Staff Site Reliability Engineer, Data Platform

Visa · Fintech · Austin, TX

Staff Site Reliability Engineer responsible for designing, building, and evolving cloud-native, containerized infrastructure on Microsoft Azure to power data products and services. Focuses on platform maturity, supporting cross-functional squads, leading technical initiatives, and ensuring availability, security, scalability, and reliability of the data ecosystem. Requires deep expertise in Azure cloud architecture, infrastructure implementation, systems design, networking, databases, and modern data technologies, with hands-on experience in complex technology adoption, infrastructure automation, and high-scale distributed systems.

What you'd actually do

  1. Design, build, and evolve cloud-native, containerized infrastructure on Microsoft Azure that powers our data products and services.
  2. Advance our platform maturity by supporting cross-functional squads, leading complex technical initiatives, and ensuring the availability, security, scalability, and reliability of our data ecosystem.
  3. Bring deep expertise in Azure cloud architecture, Azure infrastructure implementation, systems design, networking, databases, and modern data technologies.
  4. Contribute hands-on experience with complex technology adoption, infrastructure automation, and high-scale distributed systems, with a strong emphasis on building and operating secure, resilient, and scalable solutions in Microsoft Azure environments.
  5. Architect, implement, and optimize Azure-based platforms and services, including cloud networking, compute, storage, identity and access management, observability, and container orchestration.

Skills

Required

  • Azure cloud architecture
  • Azure infrastructure implementation
  • systems design
  • networking
  • databases
  • modern data technologies
  • infrastructure automation
  • high-scale distributed systems
  • secure, resilient, and scalable solutions
  • cloud networking
  • compute
  • storage
  • identity and access management
  • observability
  • container orchestration
  • enterprise-grade cloud solutions
  • hybrid-cloud patterns
  • reliability
  • security
  • operational excellence
  • Infrastructure as Code (Terraform)
  • Kubernetes
  • CI/CD systems
  • automation using Bash, Python, or Ansible-like tools
  • software engineering practices (version control, testing, code reviews, design patterns)
  • on-call processes
  • incident management
  • post-incident reviews

Nice to have

  • AWS
  • Bachelor’s degree in Computer Science, Engineering, or related field
  • Advanced degree (e.g. Master’s, MBA, JD, MD)
  • PhD

What the JD emphasized

  • Advanced experience designing and operating large-scale, cloud-native infrastructure (AWS preferred)
  • Strong hands-on proficiency with Infrastructure as Code (Terraform)
  • Deep understanding of Kubernetes and container orchestration
  • Strong competencies in systems design, networking, distributed systems, and reliability engineering principles (SLOs, error budgets, incident response)
  • Experience with observability stacks (Prometheus, Grafana, OpenTelemetry, ELK/EFK, Datadog, or similar)
  • Proven ability to lead complex, cross-functional technical initiatives from design to production rollout
  • Demonstrated experience driving technology adoption and platform modernization across multiple teams