Sr Principal Software Engineer - LLM Engineering

JPMorgan Chase · Banking · Palo Alto, CA +1 · Commercial & Investment Bank

Senior Principal Software Engineer focused on LLM Engineering within JPMorgan Chase's Trust & Safety Fraud Prevention team. The role involves architecting, building, and optimizing model serving solutions, particularly for LLMs and GNNs, across cloud and on-premises environments. Key responsibilities include defining MLOps/LLMOps strategies, driving inference optimization for high throughput and low latency, creating reusable ML engineering frameworks, and ensuring observability, reliability, and cost efficiency in production AI workloads.

What you'd actually do

  1. Advises and leads on the strategy, architecture, and development of model-serving solutions for different model architectures, including LLMs and GNNs, across cloud and on‑premises environments, aligning initiatives to business outcomes.
  2. Defines and implements MLOps and LLMOps strategies for end‑to‑end model lifecycle management, including training, versioning, deployment, monitoring, and governance.
  3. Drives optimization of model inference for high throughput and low latency using quantization, model parallelism, intelligent batching, and hardware acceleration across all model architectures.
  4. Creates durable, reusable software and platform frameworks to standardize ML engineering services, enabling scale across teams and functions.
  5. Establishes best practices for automation, CI/CD, and infrastructure‑as‑code using containerization and orchestration technologies.
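The quantization lever named in item 3 can be illustrated with a minimal sketch: symmetric per-tensor int8 quantization in pure Python. The function names are illustrative, not any library's API; production serving stacks use fused kernels (e.g., in vLLM or TensorRT-LLM), not Python loops.

```python
# Illustrative sketch only: symmetric per-tensor int8 quantization,
# one of the inference-optimization techniques listed in item 3.

def quantize_int8(weights):
    """Map floats onto the int8 range [-127, 127] with one shared scale."""
    scale = max(abs(w) for w in weights) / 127.0 or 1.0  # guard all-zero input
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize_int8(q, scale):
    """Recover approximate float weights from int8 codes."""
    return [v * scale for v in q]

weights = [0.02, -1.5, 0.75, 3.0, -0.1]
q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)
# Round-trip error per weight is at most half a quantization step (scale / 2).
max_err = max(abs(a - b) for a, b in zip(weights, restored))
```

The memory saving is the point: each weight shrinks from 32 bits to 8, which cuts bandwidth and footprint at serving time, at the cost of the bounded round-off error computed above.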

Skills

Required

  • software engineering concepts
  • AI/ML engineering
  • LLMs
  • GNNs
  • model architectures
  • GPT
  • Llama
  • Falcon
  • Mistral
  • architecting and deploying LLM & GNN solutions on AWS
  • SageMaker
  • Bedrock
  • EKS
  • Azure ML
  • GCP Vertex AI
  • building LLM and GNN serving platforms
  • large-scale environments
  • building LLM inference engines
  • Triton Inference Server
  • vLLM
  • autoscaling
  • caching
  • throughput optimization
  • Python
  • optimization techniques
  • deep learning frameworks
  • PyTorch
  • TensorFlow
  • Hugging Face Transformers
  • LLMOps/MLOps
  • MLflow
  • SageMaker Pipelines
  • Kubeflow
  • inference optimization
  • distributed systems
  • high-throughput
  • low-latency applications
  • system design
  • application development
  • testing
  • operational stability
  • enterprise AI platforms
  • SRE collaboration
  • observability
  • incident response
  • SLIs/SLOs for LLM services
  • communication skills
  • influence technical and non-technical stakeholders

Nice to have

  • Master’s or PhD in Computer Science, Engineering, or a related field (or equivalent experience)
  • cloud-native experience
  • containerization (Docker)
  • orchestration (Kubernetes)
  • infrastructure-as-code (Terraform, CloudFormation)
  • security, compliance, and governance for AI/ML deployments in regulated environments
  • trust and safety
  • fraud prevention domains
  • payments platforms
  • contributions to open-source LLM projects
  • peer-reviewed research
  • presenting at industry conferences
  • leading technical communities
  • hardware acceleration strategies
  • GPUs
  • TPUs
  • specialized inference runtimes
  • Java-based applications

What the JD emphasized

  • 8+ years of AI/ML engineering experience with significant expertise in LLMs, GNNs, and other model architectures (e.g., GPT, Llama, Falcon, Mistral).
  • Demonstrated success architecting and deploying LLM & GNN solutions on AWS (e.g., SageMaker, Bedrock, EKS) at enterprise scale; experience with Azure ML or GCP Vertex AI.
  • Experience building LLM and GNN serving platforms in large‑scale environments typical of major tech firms.
  • Hands‑on experience building LLM inference engines using Triton Inference Server and vLLM, including autoscaling, caching, and throughput optimization.
  • Expertise in inference optimization and distributed systems for large models focused on high‑throughput, low‑latency applications.
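The batching and throughput work these bullets describe rests on a simple policy: dispatch a batch when it is full or when its oldest request has waited long enough. A hedged, toy sketch in pure Python (all names illustrative; engines such as vLLM and Triton implement far richer continuous, token-level batching):

```python
# Toy sketch of a size-or-deadline batching policy, the idea behind the
# "intelligent batching" and throughput-optimization work described above.

def form_batches(requests, max_batch_size=4, max_wait_ms=50):
    """Group (arrival_ms, request_id) pairs into batches.

    A batch is dispatched when it is full, or when the next request arrived
    more than max_wait_ms after the batch's oldest member, so early requests
    never wait indefinitely for stragglers.
    """
    batches, batch = [], []
    for arrival, rid in requests:
        if batch and (len(batch) == max_batch_size
                      or arrival - batch[0][0] > max_wait_ms):
            batches.append([r for _, r in batch])
            batch = []
        batch.append((arrival, rid))
    if batch:
        batches.append([r for _, r in batch])
    return batches

# Five requests: four in a burst, then one straggler 200 ms later.
reqs = [(0, "a"), (5, "b"), (8, "c"), (10, "d"), (210, "e")]
batches = form_batches(reqs)
# → [["a", "b", "c", "d"], ["e"]]
```

Larger batches amortize per-step kernel-launch and weight-read costs (throughput), while the wait deadline caps queueing delay (latency); tuning that trade-off is the core of the optimization work the JD calls out.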

Other signals

  • LLM Engineering
  • Model Serving
  • MLOps/LLMOps
  • Inference Optimization