Sr Site Reliability Engineer

Visa · Fintech · Brazil · Remote

Sr Site Reliability Engineer role focused on owning and evolving a containerized platform, including Kubernetes clusters, cloud infrastructure, networking, and service mesh. The role emphasizes SRE principles, Infrastructure-as-Code, GitOps, and operational excellence for critical workloads.

What you'd actually do

  1. Own the end-to-end lifecycle (design, provisioning, upgrades, maintenance, and decommissioning) of core platform components, including:
  2. Design platform components to be resilient by default, applying SRE principles such as:
  3. Lead the design and implementation of infrastructure bootstrap orchestration, including:
  4. Drive Infrastructure-as-Code and GitOps-first practices to ensure:
  5. Apply and promote SRE operational excellence practices, including:

Skills

Required

  • public cloud platforms (AWS preferred, Azure also considered)
  • operating and administering Kubernetes at scale in production environments
  • container orchestration platforms
  • cloud architecture fundamentals (networking, IAM/security concepts, and reliability patterns)
  • Infrastructure as Code (Terraform preferred) and automation-first workflows
  • GitOps practices and CI/CD pipelines
  • troubleshooting skills for distributed systems, including root-cause analysis and reliability improvements
  • observability concepts and practices (monitoring, logging, alerting, tracing)

Nice to have

  • service mesh technologies (Istio preferred; App Mesh or Linkerd also considered)
  • working with critical or mission-critical systems
  • applying SRE principles (operational readiness, incident management, runbooks, toil reduction)
  • AWS certifications

What the JD emphasized

  • Kubernetes at scale
  • critical workloads
  • SRE principles
  • Infrastructure-as-Code
  • GitOps-first practices
  • security, compliance, and internal control requirements
  • mission-critical systems