Site Reliability Engineer III

JPMorgan Chase · Banking · Hyderabad, Telangana, India · Consumer & Community Banking

Site Reliability Engineer III at JPMorgan Chase, focused on modernizing complex systems and on configuring, monitoring, and optimizing applications and infrastructure. The role involves guiding design, implementing CI/CD pipelines and infrastructure as code, and ensuring availability, reliability, and scalability. Requires strong experience in data warehousing (Oracle/Snowflake), SQL/PLSQL, Python/Java for data handling, cloud environments, SRE principles, and observability tools.

What you'd actually do

  1. Guides and assists others in building designs at the appropriate level and in gaining consensus from peers where appropriate
  2. Collaborates with other software engineers and teams to design and implement deployment approaches using automated continuous integration and continuous delivery pipelines
  3. Collaborates with other software engineers and teams to design, develop, test, and implement availability, reliability, and scalability solutions in their applications
  4. Implements infrastructure, configuration, and network as code for the applications and platforms in your remit
  5. Collaborates with technical experts, key stakeholders, and team members to resolve complex problems

Skills

Required

  • Formal training or certification in software engineering concepts
  • 5+ years of applied experience
  • Minimum 9 years of overall experience, including at least 7 years as a software engineer and/or site reliability engineer focused on data warehousing (Oracle/Snowflake), SQL/PLSQL, and large data movement in cloud environments
  • Advanced proficiency in Python and/or Java for large-scale data handling and migration
  • Hands-on experience with platforms and applications hosted on public, private, or hybrid cloud infrastructures
  • Formal training or certification in site reliability engineering (SRE) concepts, with at least 3 years of applied SRE experience
  • Expertise in observability practices, including white-box and black-box monitoring, SLO alerting, and telemetry collection using tools such as Grafana, Dynatrace, Prometheus, Datadog, and Splunk
  • Strong understanding of site reliability culture and principles, with practical experience implementing SRE within applications or platforms
  • Proficient knowledge of software applications and technical processes in areas such as Cloud, Artificial Intelligence, and Machine Learning
  • Experience with continuous integration and continuous delivery (CI/CD) tools, including Jenkins, GitLab, and Terraform
  • Skilled in containerization and orchestration technologies such as Docker, Kubernetes, and ECS
  • Familiarity with troubleshooting common networking technologies and issues
  • Demonstrated ability to work collaboratively in large teams, communicate effectively, address roadblocks proactively, and implement innovative solutions while staying current with emerging technologies

Nice to have

  • Proficiency in one or more technology domains; may be a cross-domain expert able to solve complex, mission-critical problems within a business or across the firm
  • Adept in the development of automated tools, systems, and services in multiple technology domains
  • Working knowledge of infrastructure components (e.g., routers, load balancers, cloud products, container systems, compute, storage, and networks)
  • Excellent debugging and troubleshooting skills
  • Proficiency in service-level changes to a system and troubleshooting components
  • Experience using monitoring and log analysis tools to manage operations

What the JD emphasized

  • At least 7 years as a software engineer and/or site reliability engineer focused on data warehousing (Oracle/Snowflake), SQL/PLSQL, and large data movement in cloud environments
  • At least 3 years of applied SRE experience