Software Engineer III - AI/ML Deep Learning & GPU ML Serving

JPMorgan Chase · Banking · Palo Alto, CA +1 · Commercial & Investment Bank

Software Engineer III at JPMorgan Chase focused on AI/ML deep learning and GPU ML serving. The role involves developing, testing, and troubleshooting software solutions; writing production code; producing architecture artifacts; analyzing data; and optimizing deep learning models for production inference. Key responsibilities include deploying and managing GPU workloads in Kubernetes, building scalable, low-latency systems, and partnering with product teams. The role requires formal training or certification, 3+ years of applied experience, proficiency in Python and ML frameworks, and experience with cloud technologies (Docker, Kubernetes, EKS), ML model serving frameworks, GPU workloads in Kubernetes, and NoSQL databases. Familiarity with modern microservices architecture and experience leading large-scale system design are also needed. Preferred qualifications include an MS/PhD, Java proficiency, and experience with graph neural networks, GPU programming, model monitoring, MLOps tools, and serving large-scale models.

What you'd actually do

  1. Optimize deep learning models for production inference, including quantization and batching.
  2. Deploy and manage GPU workloads in Kubernetes environments.
  3. Build scalable, low-latency systems using web services and APIs.
  4. Write secure, high-quality production code and maintain algorithms integrated with firm systems.
  5. Analyze and visualize large, diverse data sets to drive continuous improvement of applications and systems.
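Item 1 above calls out batching as an inference optimization. As a minimal, framework-free sketch of the idea (all names here are illustrative, not from the posting), dynamic micro-batching groups single requests into one batched model call so a GPU amortizes per-call overhead:

```python
from collections import deque
from typing import Callable, List

class MicroBatcher:
    """Group single inference requests into batches to raise throughput.

    Illustrative sketch: `model_fn` stands in for any batched model call
    (e.g. a framework forward pass); class and method names are hypothetical.
    """

    def __init__(self, model_fn: Callable[[List[float]], List[float]], max_batch: int = 8):
        self.model_fn = model_fn
        self.max_batch = max_batch
        self.queue: deque = deque()

    def submit(self, x: float) -> None:
        """Enqueue one request for the next batch."""
        self.queue.append(x)

    def flush(self) -> List[float]:
        """Run the model on up to `max_batch` queued inputs and return outputs."""
        batch = [self.queue.popleft() for _ in range(min(self.max_batch, len(self.queue)))]
        return self.model_fn(batch) if batch else []

# Usage: a stand-in "model" that doubles each input, fed three queued requests.
batcher = MicroBatcher(lambda xs: [2 * x for x in xs])
for v in (1.0, 2.0, 3.0):
    batcher.submit(v)
print(batcher.flush())  # [2.0, 4.0, 6.0]
```

Production serving frameworks named in this posting (e.g. Triton Inference Server) implement the same pattern with time- and size-based batching triggers.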

Skills

Required

  • Python
  • ML frameworks (TensorFlow, PyTorch)
  • Cloud technologies (Docker, Kubernetes, EKS)
  • ML model serving frameworks (TorchServe, TensorFlow Serving, Triton Inference Server)
  • GPU workloads in Kubernetes
  • Scalable, low-latency systems
  • NoSQL databases (Cassandra)
  • GPU resource management
  • Microservices architecture
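Two of the required skills, GPU workloads in Kubernetes and GPU resource management, come down to requesting the `nvidia.com/gpu` extended resource (exposed by the NVIDIA device plugin) in a pod spec. A minimal sketch building such a manifest as a plain dict; the image and names are placeholders, not from the posting:

```python
def gpu_pod_spec(name: str, image: str, gpus: int = 1) -> dict:
    """Build a minimal Kubernetes Pod manifest requesting NVIDIA GPUs.

    Uses the standard `nvidia.com/gpu` extended resource; container name
    and image are illustrative placeholders.
    """
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "containers": [
                {
                    "name": name,
                    "image": image,
                    # GPUs are specified under limits and cannot be overcommitted;
                    # Kubernetes schedules the pod only onto a node with free GPUs.
                    "resources": {"limits": {"nvidia.com/gpu": str(gpus)}},
                }
            ]
        },
    }

spec = gpu_pod_spec("triton-server", "nvcr.io/nvidia/tritonserver:latest", gpus=2)
print(spec["spec"]["containers"][0]["resources"]["limits"]["nvidia.com/gpu"])  # prints "2"
```

The same dict could be serialized to YAML for `kubectl apply` or submitted through a Kubernetes client library.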

Nice to have

  • MS/PhD in Computer Science or Machine Learning
  • Java
  • Graph neural networks
  • GPU programming (CUDA)
  • Model monitoring
  • A/B testing
  • ML observability tools
  • MLOps tools and practices (MLflow, Kubeflow, SageMaker)
  • Serving large-scale models
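The A/B testing item above typically means comparing two model variants' rates (conversions, errors) in production. A common primitive is the pooled two-proportion z-test; a stdlib-only sketch with illustrative counts (function name and numbers are assumptions, not from the posting):

```python
import math

def two_proportion_z(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Z statistic comparing success rates of two model variants.

    Standard pooled two-proportion z-test; positive values mean
    variant B's rate exceeds variant A's. Counts are illustrative.
    """
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Pool both samples to estimate the common rate under the null hypothesis.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / se

# Variant B converts 120/1000 vs. A's 100/1000.
z = two_proportion_z(100, 1000, 120, 1000)
print(round(z, 2))  # 1.43 -- below 1.96, so not significant at the 5% level
```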

What the JD emphasized

  • optimize deep learning models for production inference
  • deploy and manage GPU workloads in Kubernetes
  • build scalable, low-latency systems
