Site Reliability Engineer

Visa · Fintech · Cambridge, United Kingdom

Visa is seeking a Site Reliability Engineer to operate and improve its ARIC Risk Hub SaaS platform, ensuring reliability, scalability, and security. The role involves deploying, monitoring, troubleshooting, and supporting the platform, collaborating with engineering and data science teams, and providing second-line operational support to customers. Experience with cloud infrastructure, Linux, scripting, and production-grade services is required; experience with Infrastructure as Code, Kubernetes, and observability tools is preferred.

What you'd actually do

  1. Operate and support production deployments of ARIC Risk Hub SaaS, including deploying, maintaining, monitoring, upgrading, and troubleshooting platform and application components.
  2. Build software and systems to manage platform infrastructure and applications.
  3. Continuously evaluate and improve technology and operational processes to increase quality, reduce costs, and improve time‑to‑market.
  4. Participate in service resilience and failure testing, including predictable and unpredictable failure scenarios.
  5. Provide second‑line operational support for SaaS customers, ensuring timely and high‑quality issue resolution.

Skills

Required

  • Experience administering cloud infrastructure or supporting cloud applications (preferably AWS).
  • Working knowledge of Linux, shell scripting, and command‑line tools.
  • Ability to write or maintain code in at least one high‑level programming language (e.g., Python).
  • Understanding of networking fundamentals (e.g., DNS, routing, firewalls).
  • Familiarity with source control systems (e.g., Git).
  • Exposure to CI/CD concepts and pipelines.
  • Familiarity with monitoring, metrics, and alerting systems.
  • Experience operating and supporting production‑grade services.
  • Ability to write clear technical documentation and follow defined operational processes.

Nice to have

  • Infrastructure as Code and configuration management experience (e.g., Terraform, SaltStack, Ansible).
  • Experience with containerization (Docker) and Kubernetes (deploying or operating services).
  • Exposure to service mesh technologies (e.g., Istio).
  • Experience building or operating cloud‑native or serverless applications.
  • Familiarity with observability, data, and secrets‑management tooling such as Prometheus, Grafana, MongoDB, Elasticsearch, Kafka, and HashiCorp Vault.
  • Understanding of application and data security fundamentals (authentication, authorization, encryption, TLS).
  • Awareness of regulatory and compliance standards (e.g., PCI DSS, SOC 2, ISO 27001).