Site Reliability Engineer III - Kafka Platform

JPMorgan Chase · Banking · Jersey City, NJ +1 · Corporate Sector

Site Reliability Engineer III focused on the Kafka Platform at JPMorgan Chase. This role involves configuring, maintaining, monitoring, and optimizing applications and their infrastructure, with a strong emphasis on Kafka technology, distributed systems, cloud platforms, CI/CD pipelines, and observability. The engineer will implement infrastructure as code, resolve complex problems, and contribute to technical documentation, while also participating in on-call rotations. Required skills include proficiency in SRE principles, programming languages like Java or Python, observability tools, public cloud platforms, and the Kafka ecosystem.

What you'd actually do

  1. Demonstrate deep knowledge of Kafka technology, the Kafka Connect framework, and distributed systems technologies, with the ability to operate in and migrate across public and private clouds.
  2. Collaborate with other software engineers and teams to design and implement deployment approaches using automated continuous integration and continuous delivery pipelines.
  3. Collaborate with other software engineers and teams to design, develop, test, and implement availability, reliability, and scalability solutions in their applications.
  4. Implement infrastructure, configuration, and network as code for the applications and platforms in your remit.
  5. Engage in periodic on-call rotation shifts, providing client support and ensuring thorough monitoring of the platform.

Skills

Required

  • Formal training or certification in computer science and reliability concepts, plus 3+ years of applied experience.
  • Proficiency in site reliability culture and principles, and familiarity with how to implement site reliability within an application or platform.
  • Proficiency in at least one programming language, such as Java/Spring Boot or Python.
  • Proficient knowledge of software applications and technical processes within a given technical discipline (e.g., cloud, artificial intelligence, Android).
  • Experience in observability, including white-box and black-box monitoring, service level objective alerting, and telemetry collection, using tools such as Grafana, Dynatrace, Prometheus, Datadog, and Splunk.
  • Experience with public cloud platforms like AWS, GCP or Azure.
  • Experience with Kafka ecosystem products: Kafka, Kafka Connect, Kafka Streams.
  • Experience with continuous integration and continuous delivery tools like Jenkins, GitLab, or Terraform.
  • Familiarity with containers and container orchestration technologies such as ECS, Kubernetes, and Docker.
  • Familiarity with troubleshooting common networking technologies and issues.

Nice to have

  • Familiarity with running Apache Flink.
  • Understanding of authentication and authorization technologies (e.g., OAuth, Kerberos).
  • Experience with AWS cloud services and Kubernetes platform orchestration.
  • Ability to contribute to large and collaborative teams by presenting information in a logical and timely manner with compelling language and limited supervision.
  • Ability to proactively recognize roadblocks, and a demonstrated interest in learning technology that facilitates innovation.
  • Ability to identify new technologies and relevant solutions to ensure design constraints are met by the software team.
  • Ability to initiate and implement ideas to solve business problems.

What the JD emphasized

  • Kafka technology
  • distributed systems
  • public and private clouds
  • automated continuous integration and continuous delivery pipelines
  • availability, reliability, scalability
  • infrastructure, configuration, and network as code
  • service level indicators
  • service level objectives
  • technical documentation
  • site reliability engineering best practices
  • on-call rotation shifts
  • Java/Spring Boot, python
  • observability
  • Grafana, Dynatrace, Prometheus, Datadog, Splunk
  • AWS, GCP or Azure
  • Kafka ecosystem products
  • Jenkins, GitLab, or Terraform
  • ECS, Kubernetes, and Docker