Business Operations Site Reliability Engineer

Mastercard · Fintech · Mexico City, Mexico · Engineering

Mastercard is seeking a Business Operations Site Reliability Engineer to ensure the stability and health of its platform, focusing on production readiness, developer ownership, and operational excellence. The role involves engaging in the full lifecycle of services, from design to refinement, with a strong emphasis on automation, capacity planning, monitoring, and incident response. Key responsibilities include shifting left to manage production proactively, mitigating risks, and aligning operational needs with product and customer priorities. The ideal candidate will have experience in security and/or enterprise monitoring, strong UNIX/Linux and DevOps skills, and a systematic problem-solving approach, preferably with experience in the banking/payment industry.

What you'd actually do

  1. Engage in and improve the whole lifecycle of services, from inception and design through deployment, operation, and refinement.
  2. Analyze ITSM activities of the platform and provide a feedback loop to development teams on operational gaps or resiliency concerns.
  3. Support services before they go live through activities such as operational design consulting, capacity planning, and launch reviews.
  4. Maintain services once they are live by measuring and monitoring availability, latency, and overall system health.
  5. Scale systems sustainably through mechanisms like automation and evolve systems by pushing for changes that improve reliability and velocity.

Skills

Required

  • Experience within a Security and/or Enterprise Monitoring context is required.
  • 6–10 years of hands-on experience in UNIX/Linux systems, scripting and automation, Oracle and SQL databases, DevOps practices, and CI/CD pipelines.
  • Strong knowledge of operating systems, platforms, and infrastructure components.
  • Experience working through others to solve complex business problems and effect change.
  • Systematic problem-solving approach, coupled with strong communication skills and a sense of ownership and drive.
  • Ability to challenge current practices to promote efficiencies and deliver positive results.
  • We support many different stakeholders. Experience in dealing with difficult situations and making decisions with a sense of urgency is needed.
  • Interest in designing, analyzing and troubleshooting large-scale distributed systems.
  • Strong project management skills and success in managing large-scale cross-functional teams.
  • Experience in working across development, operations, and product teams to prioritize needs and to build relationships is a must.
  • Verbal and written fluency in English and Spanish is a must.

Nice to have

  • Experience in one or more of the following is preferred: C, C++, Java, Python, Go, Perl or Ruby.
  • Experience in the banking, payment, or finance industry is preferred, especially in the Mexican market.
  • Knowledge of cloud platforms, preferably AWS, is preferred.

What the JD emphasized

  • production readiness steward
  • developer run ownership
  • operational design
  • automation
  • capacity planning
  • monitoring
  • fault-tolerant
  • scalable products
  • agile and learning culture
  • triage
  • root cause
  • business impact
  • shift left
  • proactive management
  • risk management
  • compliance
  • streamlining
  • standardizing
  • centralizing points of interaction
  • stakeholder communication
  • Product and Customer Focused priorities
  • Operational needs
  • run state
  • customer experience
  • lifecycle of services
  • deployment
  • operation
  • refinement
  • ITSM activities
  • operational gaps
  • resiliency concerns
  • launch reviews
  • availability
  • latency
  • system health
  • scale systems sustainably
  • reliability
  • velocity
  • CI/CD pipeline
  • DevOps automation
  • best practices
  • incident response
  • blameless postmortems
  • holistic approach
  • technology stack
  • optimize mean time to recover
  • global team
  • mentor junior resources
  • Security
  • Enterprise Monitoring Context
  • UNIX/Linux systems
  • scripting
  • Oracle and SQL databases
  • DevOps practices
  • CI/CD pipelines
  • operating systems
  • platforms
  • infrastructure components
  • complex business problems
  • effect change
  • Systematic problem-solving approach
  • communication skills
  • ownership and drive
  • challenge current practices
  • efficiencies
  • positive results
  • difficult situations
  • making decisions
  • sense of urgency
  • large-scale distributed systems
  • project management skills
  • managing large-scale cross-functional teams
  • development, operations, and product teams
  • prioritize needs
  • build relationships
  • banking/payment/finance industry
  • Mexican market
  • cloud platforms
  • AWS
  • English/Spanish
  • site reliability engineers
  • appetite for change
  • push the boundaries
  • managing service levels
  • critical security services
  • information security
  • security policies and practices
  • confidentiality and integrity of the information
  • suspected information security incidents