Data Scientist Lead

JPMorgan Chase · Banking · Tampa, FL +1 · Commercial & Investment Bank

Lead a team of data scientists in building and deploying advanced AI solutions for image classification, text categorization, and intelligent data extraction from scanned documents within the healthcare provider team at JPMorgan Chase. The role involves managing the full ML lifecycle, from prototyping to production deployment on AWS EKS, with a focus on multimodal document understanding.

What you'd actually do

  1. Lead and mentor a team of data scientists in designing and executing advanced analytics and modeling projects focused on image classification, text categorization, and intelligent data extraction from scanned document images. Foster a culture of curiosity, analytical rigor, and continuous learning by developing team members in deep learning, computer vision, NLP, and document AI techniques.
  2. Define and drive the analytical strategy for document understanding use cases, identifying the optimal combination of computer vision, NLP, and multimodal approaches.
  3. Build and fine-tune multimodal document understanding and text categorization models. Leverage the interplay of textual content, spatial layout, and visual features to extract structured fields and key-value pairs from complex scanned documents, while enabling automated categorization, routing, metadata tagging, and entity extraction.
  4. Design rigorous experimentation and data quality frameworks, including A/B testing, cross-validation strategies, and statistical significance testing, to evaluate model performance and hyperparameter tuning. Establish best practices for annotation quality management, training data curation, active learning strategies, and ground truth validation to ensure high-quality labeled datasets.
  5. Design, manage, and optimize the workflows involved in preparing data for machine learning model training, and select the statistical or deep learning models best positioned to achieve business results.
  6. Develop and deploy models using Python and AWS SageMaker, managing the full lifecycle from exploratory data analysis and prototyping through production deployment, monitoring, and performance tracking. Collaborate with data engineers and ML engineers to ensure seamless integration of analytical models into production document processing pipelines and data workflows.

Skills

Required

  • Python
  • PyTorch
  • TensorFlow
  • Hugging Face Transformers
  • AWS SageMaker
  • Amazon Bedrock
  • CNN architectures
  • transformer architectures
  • OCR technologies
  • multimodal document understanding
  • image classification
  • text categorization
  • intelligent data extraction
  • ML lifecycle management
  • production deployment
  • AWS EKS
  • computer vision
  • NLP
  • document AI
  • statistics
  • mathematics
  • programming
  • probability
  • mathematical modeling
  • experimental design
  • model evaluation
  • data analysis
  • visualization
  • scikit-learn
  • OpenCV
  • pandas
  • NumPy
  • matplotlib
  • seaborn
  • transfer learning
  • feature extraction
  • spatial layout
  • visual features
  • structured fields
  • key-value pairs
  • automated categorization
  • routing
  • metadata tagging
  • entity extraction
  • A/B testing
  • cross-validation
  • statistical significance testing
  • hyperparameter tuning
  • annotation quality management
  • training data curation
  • active learning strategies
  • ground truth validation
  • data preparation
  • deep learning models
  • exploratory data analysis
  • prototyping
  • monitoring
  • performance tracking
  • data engineers
  • ML engineers
  • document processing pipelines
  • data workflows
  • SQL
  • Oracle databases
  • document metadata
  • extracted content
  • Java
  • Groovy
  • enterprise application codebases
  • annotation tools
  • supervised learning

Nice to have

  • Domain expertise in the healthcare industry

What the JD emphasized

  • deep proficiency in Python, PyTorch, TensorFlow, Hugging Face Transformers, AWS SageMaker/Bedrock
  • hands-on experience with CNN/transformer architectures, OCR technologies, and multimodal document understanding models
  • managing the full ML lifecycle
  • production deployment on AWS EKS
  • deep learning, computer vision, NLP, and document AI techniques
  • multimodal document understanding
  • text categorization models
  • OCR technologies
  • image preprocessing
  • AWS SageMaker
  • Amazon Bedrock
  • containerized deployments on AWS EKS

Other signals

  • leading a team
  • building advanced solutions
  • managing the full ML lifecycle
  • production deployment