Senior Lead Software Engineer - AI Platform Engineer

JPMorgan Chase · Banking · San Francisco, CA +1 · Corporate Sector

Senior Lead Software Engineer role focused on building and optimizing AI/ML infrastructure platforms within a financial services company. The role involves architecting cloud infrastructure, managing ML workloads, CI/CD pipelines, and collaborating with AI teams to meet computational needs. Requires strong software engineering, Kubernetes, cloud, and foundational ML knowledge.

What you'd actually do

  1. Provide technical guidance and direction to support business objectives, collaborating with technical teams, contractors, and vendors.
  2. Develop secure, high-quality production code, and review and debug code written by others.
  3. Influence product design, application functionality, and technical operations through informed decision-making.
  4. Advocate for firmwide frameworks, tools, and practices within the Software Development Life Cycle.
  5. Promote a culture of diversity, equity, inclusion, and respect within the team.

Skills

Required

  • software engineering concepts
  • system design
  • application development
  • testing
  • operational stability
  • Kubernetes
  • containerization (Docker)
  • Python
  • Go
  • Java
  • C#
  • cloud computing delivery models (IaaS, PaaS, SaaS)
  • deployment models (Public, Private, Hybrid Cloud)
  • machine learning concepts
  • transformer architecture
  • ML training
  • inference
  • Infrastructure as Code
  • cloud component architecture
  • Microservices
  • IaaS
  • Storage
  • Security
  • routing/switching technologies

Nice to have

  • NVIDIA GPU infrastructure software (e.g., DCGM, BCM, Dynamo Inference)
  • Prometheus
  • Grafana
  • MLOps
  • MLflow
  • high performance computing
  • ML frameworks (e.g., vLLM, Ray.io, Slurm)
  • network architecture
  • database programming (SQL/NoSQL)
  • data modeling
  • cloud data services
  • big data processing tools
  • Linux environments (scripting and administration)

What the JD emphasized

  • 5+ years applied experience
  • Strong experience with Kubernetes and containerization (Docker), including cluster operations and production troubleshooting.
  • Foundational understanding of machine learning concepts, including transformer architecture, ML training, and inference.
  • Experience with Infrastructure as Code.

Other signals

  • Architect and deploy secure, scalable cloud infrastructure platforms optimized for AI and machine learning workloads.
  • Collaborate with AI teams to translate computational needs into infrastructure requirements.
  • Design and implement continuous integration and delivery pipelines for machine learning workloads.