Service Management Reliability Engineer

Mastercard · Fintech · O'Fallon, MO +1 · Engineering

Mastercard is seeking a Senior Bizops Engineer-1 to serve as a Business Operations Site Reliability Engineer (SRE) / Operational Readiness Architect. The role focuses on ensuring platform stability, health, and resilience by fostering developer ownership, supporting operational design, automation, capacity planning, and monitoring. Key responsibilities include establishing operational standards, leading triage and root-cause analysis with a focus on business impact, engaging early in the development lifecycle, driving risk management and compliance, and supporting the application CI/CD pipeline and DevOps automation. The role also involves aligning product priorities with operational needs, optimizing mean time to recover, and working with a global team.

What you'd actually do

  1. Foster developer ownership and empower teams to build resilient, fault‑tolerant, scalable products.
  2. Support developers during the build phase with operational design, automation, capacity planning, and monitoring.
  3. Establish and enforce operational standards while promoting an agile, learning‑focused culture.
  4. Lead triage and root‑cause analysis with a focus on business impact and blameless post‑mortems.
  5. Engage early in the development lifecycle to proactively manage production and change activities.

Skills

Required

  • BS in Computer Science or related technical field, or equivalent practical experience.
  • Curiosity and appetite for automation, new technologies, and scalable architectures.
  • Strong problem‑solving skills, communication abilities, ownership, and drive.
  • Interest in large‑scale distributed systems design, analysis, and troubleshooting.
  • Ability to work in diverse, matrix‑based, geographically distributed teams.
  • Balance between long‑term system health and short‑term fixes.
  • Ability to collaborate cross‑functionally with clear understanding of expected system behavior and monitoring needs.
  • Experience with industry‑standard CI/CD tools such as Git/Bitbucket, Jenkins, Maven, Artifactory, and Chef.
  • Experience designing and implementing an efficient CI/CD flow that moves code from dev to prod with high quality and minimal manual effort (desired).
  • Ability to work shifts and weekends when needed, based on team members' rotations and schedules.

Nice to have

  • Experience with algorithms, data structures, scripting, pipeline management, and software design.
  • Experience working across development, operations, and product teams.
  • Prior SRE experience.
  • Expertise in RDBMS such as PostgreSQL and Oracle.
  • Proficiency in SQL, PL/SQL, and PostgreSQL features.
  • Strong understanding of database architecture, performance tuning, and query optimization.
  • Experience with monitoring tools (e.g., Splunk, Dynatrace).
  • Experience in production support and ITIL processes.
  • Experience with CI/CD tools: Git/Bitbucket, Jenkins, Maven, Artifactory, Groovy, Chef.
  • Understanding of: client‑server relationships, network concepts (Layers 1–3), Stack

What the JD emphasized

  • production readiness steward
  • platform stability, health, and resilience
  • operational design, automation, capacity planning, and monitoring
  • operational standards
  • agile, learning-focused culture
  • triage and root-cause analysis
  • business impact
  • blameless post-mortems
  • development lifecycle
  • production and change activities
  • risk management, compliance, and mitigation
  • product and customer priorities
  • operational needs
  • application CI/CD pipeline
  • DevOps automation
  • incident response
  • holistic approach to problem solving
  • mean time to recover
  • global team
  • application health, performance, and capacity
  • system design consulting
  • capacity planning
  • launch reviews
  • monitoring and alerting strategies
  • zero-downtime deployments
  • ITSM practices
  • operational gaps
  • resiliency concerns
  • large-scale distributed systems design, analysis, and troubleshooting
  • system health
  • short-term fixes
  • cross-functionally
  • system behavior
  • monitoring needs
  • CI/CD tools
  • software design
  • production support
  • ITIL processes
  • database architecture
  • performance tuning
  • query optimization
  • monitoring tools
  • client-server relationships
  • network concepts