Data Scientist Lead - Vice President

JPMorgan Chase · Banking · Plano, TX +1 · Corporate Sector

The Lead Data Scientist at JPMorgan Chase builds and deploys secure, scalable, production-grade AI/ML solutions, with a focus on generative AI applications in regulated environments. The role spans applied research, model development, MLOps, and mentoring.

What you'd actually do

  1. Perform data exploration and analysis to assess distributions, data quality issues, leakage risks, missingness, bias, and anomalies, and define data readiness criteria.
  2. Conduct applied research to evaluate modeling approaches (classical machine learning, deep learning, and generative AI where relevant), and document findings, trade-offs, and recommendations.
  3. Build baseline models and iteratively improve performance through feature engineering, error analysis, and interpretability techniques.
  4. Design and deploy generative AI applications, including fine-tuning, Retrieval-Augmented Generation (RAG) systems, and agentic AI frameworks.
  5. Build and maintain automated machine learning workflows for training, evaluation, packaging, deployment, and monitoring with a focus on reliability and reproducibility.
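
To make the first duty concrete, here is a minimal sketch of what a "data readiness" check might look like. It is an illustration only, not taken from the posting: the function names, the `max_missing` threshold, and the sample rows are all hypothetical, and it uses plain Python rather than pandas or PySpark so that it stays self-contained.

```python
def missing_rate(rows, column):
    """Fraction of rows where `column` is None or absent."""
    if not rows:
        return 0.0
    missing = sum(1 for row in rows if row.get(column) is None)
    return missing / len(rows)


def split_overlap(train_rows, test_rows, key):
    """Keys present in both splits, a common source of leakage."""
    train_ids = {row[key] for row in train_rows}
    return sorted({row[key] for row in test_rows} & train_ids)


def readiness_report(train_rows, test_rows, columns, key, max_missing=0.3):
    """Summarize the checks against a hypothetical missingness threshold."""
    rates = {c: missing_rate(train_rows, c) for c in columns}
    over = sorted(c for c, r in rates.items() if r > max_missing)
    leaked = split_overlap(train_rows, test_rows, key)
    return {
        "missing_rates": rates,
        "columns_over_threshold": over,
        "leaked_ids": leaked,
        # Ready only if no column is too sparse and no record leaks across splits.
        "ready": not over and not leaked,
    }


# Hypothetical sample data for illustration.
train = [
    {"id": 1, "income": 52000, "age": 34},
    {"id": 2, "income": None, "age": 41},
    {"id": 3, "income": 61000, "age": None},
    {"id": 4, "income": None, "age": 29},
]
test = [
    {"id": 4, "income": 48000, "age": 29},
    {"id": 5, "income": 75000, "age": 50},
]

report = readiness_report(train, test, columns=["income", "age"], key="id")
print(report)
```

In practice the thresholds and the set of checks (bias, anomalies, distribution shift) would depend on the dataset and on the team's own readiness criteria.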

Skills

Required

  • Python for data science and modeling
  • PyTorch, TensorFlow, PyTorch Lightning, or scikit-learn
  • AWS cloud development
  • Natural Language Processing (NLP)
  • Large Language Models (LLMs)
  • Prompt Engineering
  • Embeddings
  • Retrieval Patterns
  • API development (FastAPI)
  • Containerized ML service deployment (Docker, Kubernetes, ECS, EKS)
  • AWS services (S3, IAM, CloudWatch, ECS, SageMaker, Bedrock)
  • Infrastructure-as-code (Terraform)
  • Data exploration and validation (PySpark, pandas, Dask)

Nice to have

  • Delivering AI/ML solutions in a highly regulated environment
  • AWS certification
  • LLM evaluation methods (quality, safety, guardrails, reliability)
  • Model serving patterns
  • Distributed compute platforms (EMR, Databricks)

What the JD emphasized

  • secure, scalable, and reliable machine learning solutions
  • production-grade AI systems
  • regulated environments
  • strong documentation and operational readiness practices
  • data quality issues
  • leakage risks
  • bias
  • anomalies
  • reliability and reproducibility

Other signals

  • end-to-end AI and machine learning solutions
  • generative AI applications