SRE III - Data & AWS

JPMorgan Chase · Banking · London, United Kingdom · Corporate Sector

Site Reliability Engineer focused on managing and supporting an AWS Databricks platform for data and analytics teams, including data engineering, Data Science/ML, and application teams. Responsibilities include platform design, setup, monitoring, automation, and incident response, with a strong emphasis on SRE best practices and operational excellence.

What you'd actually do

  1. Maintains a managed AWS Databricks platform, and provides engineering and operational support for the platform to application teams.
  2. Performs platform design, set-up and configuration, workspace administration, and resource monitoring, and provides engineering support to data engineering, Data Science/ML, and application/integration teams.
  3. Leads evaluation sessions with external vendors, startups, and internal teams to drive outcomes-oriented probing of architectural designs, technical credentials, and applicability for use within existing systems and information architecture.
  4. Drives continuous improvement in system observability, alerting, and capacity planning.
  5. Collaborates with engineering and data teams to optimize infrastructure and deployment processes, focusing on automation and operational excellence.

Skills

Required

  • Formal training or certification on software engineering concepts and 10+ years applied experience.
  • Extensive experience with AWS Databricks platform administration and engineering support is a MUST.
  • Strong understanding of SRE principles, including SLIs, SLOs, error budgets, and incident management.
  • Experience with monitoring tools, automation frameworks, and CI/CD pipelines.
  • Proficient in Python application development, including automated unit testing.
  • Experience with Terraform development and an understanding of Terraform Enterprise.
  • Experience in delivering system design, application development, testing, and operational stability.
  • Knowledge of big data distributed compute frameworks such as Spark, Glue, and MapReduce.
  • Excellent troubleshooting, analytical, and communication skills.

Nice to have

  • Experience with data pipelines using Spark.
  • Exposure to AWS & Databricks Platform administration.
  • Knowledge of containerization (Docker, Kubernetes) and orchestration.
  • Familiarity with distributed systems and large-scale data processing.

What the JD emphasized

  • Extensive experience with AWS Databricks platform administration and engineering support is a MUST.