Lead Site Reliability Engineer

at Mastercard · Fintech · O'Fallon, MO +1 · Engineering

Lead Site Reliability Engineer responsible for driving SRE and DevOps maturity, shaping reliability strategy, defining standards, and elevating operational excellence across critical platforms. Focuses on resilience, scalability, and customer trust, partnering with engineering, architecture, and security teams. Ensures platforms are highly available, observable, self-healing, secure, compliant, and operated through repeatable processes. Influences architecture, design, and delivery to embed reliability and operability, with a shift-left operational mindset. Owns availability, latency, performance, and reliability objectives, leads incident response, and champions blameless postmortems. Drives CI/CD strategy and automation adoption, and defines standards for monitoring and alerting. Partners with security and compliance teams to embed controls and ensure regulatory requirements are met. Mentors engineers and contributes to best practices.

What you'd actually do

  1. Act as a Lead-level technical authority for reliability, operability, and production readiness across multiple platforms or programs.
  2. Own and evolve availability, latency, performance, and reliability objectives for critical systems.
  3. Provide leadership for CI/CD strategy, ensuring pipelines support automated validation, risk-based gating, and safe, repeatable deployments.
  4. Define and promote standards for monitoring, alerting, SLOs, and telemetry.
  5. Partner with security, risk, and compliance teams to embed controls, auditability, and regulatory requirements into platform design and operations.

Skills

Required

  • distributed systems
  • reliability engineering
  • production operations
  • algorithms
  • data structures
  • system design
  • automation
  • troubleshooting
  • Python
  • Go
  • Bash
  • DevOps tooling
  • observability tooling
  • CI/CD pipelines

Nice to have

  • Certificate Management
  • PKI
  • Authentication

What the JD emphasized

  • operational risk
  • reliability
  • scalability
  • customer trust
  • resilience
  • automation
  • observability
  • compliance
  • auditable
  • risk management
  • controls
  • regulatory requirements

Job Title:

Lead Site Reliability Engineer

Overview:

The Mastercard Business Operations (BizOps) organization is seeking a Lead BizOps Engineer to serve as a technical authority and operational architect across critical platforms. This role is designed for a senior individual contributor who thrives on system-level thinking, drives SRE and DevOps maturity at scale, and influences outcomes across programs and portfolios.

As a Lead BizOps Engineer, you will operate beyond a single application or team, shaping reliability strategy, defining standards, and elevating operational excellence across Mastercard’s most business-critical services. You will partner deeply with product engineering, architecture, security, and leadership to ensure platforms are designed, delivered, and operated with resilience, scalability, and customer trust at their core.

If this describes you, you’ll feel at home here:

  • You proactively design out operational risk rather than reacting to it.
  • You influence without authority and lead through technical credibility and data.
  • You see CI/CD, automation, observability, and reliability as foundational engineering disciplines, not tooling exercises.

BizOps is at the forefront of Mastercard’s Operational Resilience evolution, driving modern tooling, standardized practices, and consistent operating models across the enterprise.

Mission

BizOps acts as the production readiness and operational resilience steward for Mastercard platforms. As a Lead BizOps Engineer, your mission is to embed reliability, operability, and compliance into platform design and delivery, ensuring services are:

  • Highly available, resilient, and performant
  • Observable, self-healing, and automation-driven
  • Secure, compliant, and auditable by design
  • Operated through repeatable, scalable, low-toil processes

You will provide continuous feedback loops into engineering and product teams, ensuring lessons learned from production meaningfully improve future designs and customer experience.

What We Do in BizOps

We deliver this mission through:

  • Deep incident ownership, with rigorous root-cause analysis tied to business impact
  • A shift-left operational mindset, influencing architecture and design before code reaches production
  • Enterprise-grade risk management, controls, and compliance oversight
  • Standardized and streamlined support models that reduce friction for partners
  • Bridging product intent and operational reality to deliver reliable, customer-centric platforms

At the Lead level, you are expected to shape these practices, not just execute them.

Key Responsibilities

Technical Leadership & Architecture

  • Act as a Lead-level technical authority for reliability, operability, and production readiness across multiple platforms or programs.
  • Influence system architecture, design patterns, and platform standards to improve resiliency, scalability, and fault tolerance.
  • Partner with engineering and architecture teams during pre-production and roadmap phases to guide capacity planning, failure modeling, and launch readiness.
  • Challenge designs constructively, advocating for operational simplicity, automation, and sustainable on-call models.

Operational Excellence & Reliability

  • Own and evolve availability, latency, performance, and reliability objectives for critical systems.
  • Lead complex production events and cross-platform investigations, reducing MTTR through systemic fixes, not workarounds.
  • Champion blameless postmortems, ensuring remediation actions translate into measurable reliability improvements.
  • Identify recurring failure patterns and drive engineering-led elimination of toil.

DevOps, CI/CD & Automation

  • Provide leadership for CI/CD strategy, ensuring pipelines support automated validation, risk-based gating, and safe, repeatable deployments.
  • Drive adoption of automation-first practices across build, deploy, test, recovery, and compliance workflows.
  • Influence DevOps standards across teams, enabling consistent, high-quality software delivery at scale.

Observability & Self-Healing Systems

  • Define and promote standards for monitoring, alerting, SLOs, and telemetry.
  • Enable proactive detection, predictive alerting, and self-healing capabilities across platforms.
  • Ensure observability is treated as a first-class architectural requirement, not an afterthought.

Risk, Controls & Compliance

  • Partner with security, risk, and compliance teams to embed controls, auditability, and regulatory requirements into platform design and operations.
  • Ensure operational practices meet Mastercard’s enterprise risk and compliance expectations across all environments.

Influence, Mentorship & Thought Leadership

  • Mentor senior and junior engineers, raising the technical bar across the BizOps community.
  • Contribute to guild initiatives, standards, whitepapers, and best-practice guidance.
  • Influence leaders and peers through data, experience, and clear technical narratives.
  • Represent BizOps in cross-organizational forums as a trusted advisor on reliability and operations.

Qualifications

Required

  • Bachelor’s degree in Computer Science, Engineering, or a related technical discipline, or equivalent practical experience.
  • Deep expertise in distributed systems, reliability engineering, and production operations.
  • Strong foundation in algorithms, data structures, system design, and automation.
  • Advanced troubleshooting skills across the full technology stack.
  • Proven ability to drive decisions and outcomes in high-pressure, high-impact environments.
  • Solid understanding of databases, blob stores such as S3, and load balancers.
  • Proficiency in one or more of: Python, Go, Bash.
  • Extensive hands-on experience with DevOps and observability tooling, such as the following (or similar tools):
      • Git / Bitbucket
      • Jenkins / XLR
      • Chef, Ansible
      • Splunk
      • Dynatrace
  • Demonstrated success building and scaling CI/CD pipelines with minimal manual intervention.

Preferred / Deep Expertise

  • Certificate Management, PKI, Authentication & Authorization
  • LDAP, Active Directory Services, Access Provisioning
  • Controls, Audit, and Compliance frameworks
  • SOAP and REST APIs and integration patterns

All About You

  • You are strategic yet hands-on, able to zoom out to enterprise impact and zoom in to code or configs.
  • You operate with calm authority under pressure.
  • You challenge the status quo with respect and data.
  • You value sustainability, clarity, and engineering excellence.
  • You are motivated by impact, scale, and long-term improvement, not just delivery.

To find US Salary Ranges, visit People Place. Under the Compensation tab, select "Salary Structures." Within the text of "Salary Structures," click on the link "salary structures 2025," through which you will be able to access the salary ranges for each Mastercard job family. For more information regarding US benefits, visit People Place and review the Benefits tab and the Time Off & Leave tab.