Senior Lead Software Engineering - AI/ML Engineer

JPMorgan Chase · Banking · London, United Kingdom · Corporate Sector

This role is for a Senior Lead Software Engineering - AI/ML Engineer focused on Site Reliability Engineering for AI/ML Data Platforms at JPMorgan Chase in London. The role involves building scalable and resilient data solutions, incident management, root cause analysis, and implementing automation. While it involves AI/ML data platforms and uses Python/PySpark for AI/ML modeling, the core function is SRE and platform engineering, not direct AI model development or research.

What you'd actually do

  1. Demonstrate expertise in application development and support across technologies such as Databricks, Snowflake, AWS, and Kubernetes
  2. Coordinate incident management coverage to ensure effective resolution of application issues
  3. Collaborate with cross-functional teams to perform root cause analysis and implement production changes
  4. Develop and support AI/ML solutions for troubleshooting and incident resolution
  5. Mentor and guide team members to foster growth and drive strategic change

Skills

Required

  • site reliability culture and principles
  • running production incident calls
  • observability
  • SLI/SLO/SLA and Error Budgets
  • Python or PySpark for AI/ML modeling
  • building automation tools
  • system design
  • resiliency
  • testing
  • operational stability
  • disaster recovery
  • risk controls
  • compliance with departmental and company-wide standards

Nice to have

  • SRE or production support role with AWS Cloud, Databricks, Snowflake
  • AWS and Databricks certifications
  • AI/ML troubleshooting and incident resolution
  • budgetary and staffing optimization
  • mentoring and guiding team members
  • communication and interpersonal skills
  • driving strategic change

What the JD emphasized

  • site reliability culture and principles
  • running production incident calls
  • observability
  • SLI/SLO/SLA and Error Budgets
  • Python or PySpark for AI/ML modeling
  • system design, resiliency, testing, operational stability, and disaster recovery