Lead Software Engineer - AWS - Lead AI/ML Platform Engineer

JPMorgan Chase · Banking · London, United Kingdom · Corporate Sector

Lead Software Engineer focused on building and deploying AI/ML platform infrastructure on AWS for model deployment at scale within a financial institution. The role involves developing APIs for automated retraining, scheduling, endpoint deployments, and autoscaling, as well as implementing model monitoring solutions, with a specific emphasis on LLM monitoring and automated issue correction. The engineer will also engage with clients for support and contribute to disaster recovery and multi-region capabilities.

What you'd actually do

  1. Build and deploy infrastructure solutions for seamless integration of control plane and client accounts
  2. Develop and implement APIs for platform functionalities such as automated retraining, scheduling, endpoint deployments, and autoscaling
  3. Design robust features to support a growing internal customer base, including multi-region and disaster recovery capabilities
  4. Architect and implement model monitoring solutions, with emphasis on LLM monitoring and automated issue correction
  5. Engage with clients to identify strategic solutions and provide deployment and debugging support

Skills

Required

  • Knowledge of AWS services and cloud-based infrastructure
  • Experience building resilient software platforms
  • Proficiency in architecting software solutions at scale
  • Ability to design solutions with strategic insight

Nice to have

  • Familiarity with monitoring tools, especially for AI/ML model monitoring
  • Proficiency in Golang
  • Experience with AWS SageMaker for model training and deployment
  • Familiarity with Kubernetes and managing deployments to EKS
  • Knowledge of networking concepts such as Virtual Private Clouds and DNS
  • Experience working with LLMs
  • Experience with Terraform or other Infrastructure as Code tools
  • Experience in API development and design

What the JD emphasized

  • model monitoring solutions
  • LLM monitoring
  • automated issue correction

Other signals

  • model deployment at scale
  • cloud-native solutions
  • streamline production workflows
  • scale our platform
  • deployment and debugging support
  • platform capabilities
  • managed environments for platform operations