Site Reliability Engineer - Senior Associate (troubleshooting & Python)

JPMorgan Chase · Banking · Ciudad Autónoma de Buenos Aires, Argentina · Corporate Sector

Senior Site Reliability Engineer focused on troubleshooting, Python, and maintaining and optimizing applications and infrastructure within a Cyber & Technology Controls group. Responsibilities include designing deployment approaches, implementing infrastructure as code, monitoring service levels, and leading incident response.

What you'd actually do

  1. Guides and assists others in building designs at the appropriate level and in gaining consensus from peers where needed
  2. Collaborates with other software engineers and teams to design and implement deployment approaches using automated continuous integration and continuous delivery pipelines
  3. Collaborates with other software engineers and teams to design, develop, test, and implement availability, reliability, and scalability solutions in their applications
  4. Implements infrastructure, configuration, and network as code for the applications and platforms within their remit
  5. Leads major incident response, root cause analysis, and blameless postmortems

Skills

Required

  • Python
  • Java/Spring Boot
  • Cloud
  • Observability tools (Grafana, Dynatrace, Prometheus, Datadog, Splunk)
  • CI/CD and infrastructure-as-code tools (Jenkins, GitLab, Terraform)
  • Containers and container orchestration (Docker, ECS, Kubernetes)
  • SLO/SLI definition
  • Chaos engineering
  • Disaster recovery planning
  • Network troubleshooting

Nice to have

  • AWS / Azure / GCP
  • Terraform
  • GitHub
  • DevOps automation
  • Jira, Confluence, ServiceNow, Netcool
  • Team leadership and mentoring

What the JD emphasized

  • 5+ years of experience supporting products or infrastructure
  • Proficiency in site reliability culture and principles, and familiarity with implementing site reliability within an application or platform
  • Proficiency in at least one programming language, such as Python or Java (Spring Boot)
  • Proficient knowledge of software applications and technical processes within a technical discipline (e.g., cloud, artificial intelligence, Android)
  • Experience with observability, including white-box and black-box monitoring, service level objective alerting, and telemetry collection using tools like Grafana, Dynatrace, Prometheus, Datadog, and Splunk
  • Experience with continuous integration and continuous delivery tools like Jenkins, GitLab, or Terraform
  • Familiarity with containers and container orchestration, such as Docker, ECS, and Kubernetes
  • Experience with SLO/SLI definition, chaos engineering (Gremlin, Chaos Monkey), and disaster recovery planning