Lead, Site Reliability Engineer

Mastercard · Fintech · O'Fallon, MO +1 · Engineering

Lead Site Reliability Engineer responsible for operational resilience through observability, automation, and platform engineering, with a focus on F5 load balancer platforms. Drives reliability by identifying risks, improving telemetry, enhancing automation, and mentoring engineers.

What you'd actually do

  1. Serve as the subject matter expert for load balancer platforms, with a primary focus on F5 technologies, and improve platform reliability, scalability, and operability across the enterprise
  2. Drive proactive reliability engineering by identifying systemic risks, recurring failure patterns, and architectural opportunities to strengthen resilience and performance
  3. Lead observability improvements through telemetry, dashboards, alerting, and monitoring practices using tools such as Splunk and Dynatrace
  4. Develop and enhance automation, CI/CD integrations, and DevOps practices that reduce manual effort and improve operational efficiency
  5. Partner with Architecture, Load Balancer Engineering, Operations, and global SRE teams to influence standards, roadmaps, troubleshooting approaches, and end-to-end system design
  6. Act as a senior escalation point for complex incidents, lead root cause analysis, and mentor engineers through shared documentation, runbooks, and best practices

Skills

Required

  • Site Reliability Engineering
  • platform engineering
  • infrastructure operations
  • F5 load balancer platforms
  • observability solutions (logs, metrics, traces)
  • telemetry
  • Python, Go, Bash, or similar scripting languages
  • Linux/Unix systems
  • networking
  • cloud and hybrid infrastructure
  • highly available system design
  • DevOps practices
  • CI/CD pipelines
  • automation
  • container-based deployments
  • troubleshooting
  • root cause analysis