Senior Site Reliability Engineer

Visa · Fintech · Bengaluru, India, IN

Senior Site Reliability Engineer responsible for supporting the design, deployment, and operation of scalable, cloud-native infrastructure primarily on Amazon Web Services (AWS), with a focus on Kubernetes (EKS). The role involves Infrastructure as Code (IaC) using Terraform, CI/CD pipelines, observability solutions, and incident response.

What you'd actually do

  1. Support the design, deployment, and operation of cloud-native infrastructure on AWS, with a focus on Kubernetes (EKS)
  2. Contribute to Infrastructure as Code (IaC) implementations using Terraform to ensure consistent and reproducible environments
  3. Participate in the management and operation of Kubernetes clusters, including application deployments, scaling, monitoring, and troubleshooting
  4. Assist in building and maintaining CI/CD pipelines to enable safe and efficient software delivery
  5. Contribute to observability solutions (metrics, logging, tracing) to monitor system health and performance

Skills

Required

  • 2+ years of relevant work experience and a Bachelor’s degree, OR 5+ years of relevant work experience

Nice to have

  • 3 or more years of work experience with a Bachelor’s degree, or more than 2 years of work experience with an advanced degree (e.g., Master’s, MBA, JD, MD)
  • 3–5 years of experience in Site Reliability Engineering, DevOps, or Cloud Engineering roles
  • Hands-on experience with Amazon Web Services (AWS), including core services such as compute, networking, and storage
  • Practical experience with Kubernetes (EKS or similar), including application deployment and basic cluster operations
  • Experience supporting production environments with high availability requirements
  • Familiarity with microservices architectures and containerized applications
  • Understanding of cloud security best practices and identity/access management
  • Exposure to SRE concepts such as SLIs, SLOs, and error budgets
  • Experience working in distributed or global teams
  • Experience with Infrastructure as Code tools, particularly Terraform
  • Experience with CI/CD pipelines and related tools (e.g., GitHub Actions, Jenkins, or similar)
  • Solid understanding of Linux systems and networking fundamentals
  • Experience with monitoring and observability tools (e.g., Prometheus, Grafana, Datadog, CloudWatch)
  • Exposure to incident management, system troubleshooting, and root cause analysis
  • Basic scripting or programming knowledge (e.g., Python, Bash)