Senior Lead Software Engineer - AI Platform Engineer

JPMorgan Chase · Banking · Palo Alto, CA +1 · Corporate Sector

Senior Lead Software Engineer focused on building and optimizing AI/ML infrastructure platforms within a large enterprise. The role involves architecting cloud infrastructure, implementing CI/CD for ML workloads, and collaborating with AI teams to meet their computational needs. Requires strong software engineering and cloud skills, plus foundational ML knowledge.

What you'd actually do

  1. Architect and deploy secure, scalable cloud infrastructure platforms optimized for AI and machine learning workloads.
  2. Collaborate with AI teams to translate computational needs into infrastructure requirements.
  3. Monitor, manage, and optimize cloud resources for performance and cost efficiency.
  4. Design and implement continuous integration and delivery pipelines for machine learning workloads.
  5. Develop automation scripts and infrastructure as code to streamline deployment and management tasks.

Skills

Required

  • software engineering concepts
  • system design
  • application development
  • testing
  • operational stability
  • Python
  • Go
  • Java
  • C#
  • Computer Science
  • Computer Engineering
  • Mathematics
  • cloud computing delivery models
  • cloud deployment models
  • machine learning concepts
  • transformer architecture
  • ML training
  • inference
  • solutions design and engineering
  • containerization
  • Docker
  • Kubernetes
  • AWS
  • Azure
  • GCP
  • Infrastructure as Code
  • cloud component architecture
  • Microservices
  • Containers
  • IaaS
  • Storage
  • Security
  • routing/switching technologies

Nice to have

  • NVIDIA GPU infrastructure software
  • DCGM
  • BCM
  • Triton Inference Server
  • PyTorch
  • TensorBoard
  • Prometheus
  • Grafana
  • ML Ops
  • MLflow
  • high performance computing
  • vLLM
  • Ray.io
  • Slurm
  • network architecture
  • database programming
  • SQL
  • NoSQL
  • data modeling
  • cloud data services
  • big data processing tools
  • Linux environments

What the JD emphasized

  • AI and machine learning workloads
  • machine learning workloads
  • machine learning concepts
  • NVIDIA GPU infrastructure software
  • machine learning frameworks
  • ML Ops