Lead Software Engineer - AI Platform Engineer

JPMorgan Chase · Banking · Palo Alto, CA +1 · Corporate Sector

A Lead Software Engineer role focused on building and optimizing AI/ML infrastructure platforms. The work covers cloud deployment, CI/CD pipelines for ML workloads, and collaboration with AI teams to meet their computational needs. It also involves developing secure, scalable infrastructure, monitoring resources, and implementing automation.

What you'd actually do

  1. Develop and deploy cloud infrastructure platforms that are secure, scalable, and optimized for AI and machine learning workloads.
  2. Collaborate with AI teams to understand computational needs and translate these into infrastructure requirements.
  3. Design and implement continuous integration and delivery pipelines for machine learning workloads.
  4. Develop automation scripts and infrastructure as code to streamline deployment and management tasks.
  5. Monitor, manage, and optimize cloud resources to maximize performance and minimize costs.

Skills

Required

  • software engineering concepts
  • system design
  • application development
  • testing
  • operational stability
  • Python
  • Go
  • Java
  • C#
  • automation
  • continuous delivery
  • Software Development Life Cycle
  • cloud
  • artificial intelligence
  • machine learning
  • mobile
  • machine learning concepts
  • transformer architecture
  • ML training
  • inference
  • solutions design
  • engineering
  • containerization
  • Docker
  • Kubernetes
  • AWS
  • Azure
  • GCP
  • Infrastructure as Code
  • cloud component architecture
  • microservices
  • containers
  • IaaS
  • storage
  • security
  • routing/switching technologies

Nice to have

  • NVIDIA GPU infrastructure software
  • DCGM
  • BCM
  • Triton Inference Server
  • PyTorch
  • TensorBoard
  • Prometheus
  • Grafana
  • MLOps
  • MLflow
  • high performance computing
  • ML frameworks
  • vLLM
  • Ray.io
  • Slurm
  • network architecture
  • database programming
  • SQL
  • NoSQL
  • data modeling
  • cloud data services
  • big data processing tools
  • Linux environments
  • scripting
  • administration

What the JD emphasized

  • Foundational understanding of machine learning concepts, including transformer architecture, ML training, and inference.
  • Experience in solutions design and engineering, containerization (Docker, Kubernetes), and cloud service providers (AWS, Azure, GCP).
  • Experience with Infrastructure as Code.
