Business Operations Site Reliability Engineer

Mastercard · Fintech · Mexico City, Mexico · Engineering

Mastercard is seeking a Business Operations Site Reliability Engineer to ensure the stability and health of its platform, focusing on production readiness, developer ownership, and operational excellence. The role involves engaging in the full lifecycle of services, from design to refinement, with a strong emphasis on automation, capacity planning, monitoring, and incident response. Key responsibilities include shifting left to proactively manage production, mitigating risks, and aligning operational needs with product and customer priorities. The ideal candidate will have experience in security and/or enterprise monitoring, strong UNIX/Linux and DevOps skills, and a systematic problem-solving approach, preferably with experience in the banking/payment industry.

What you'd actually do

Engage in and improve the whole lifecycle of services—from inception and design, through deployment, operation, and refinement.
Analyze ITSM activities of the platform and provide a feedback loop to development teams on operational gaps or resiliency concerns.
Support services before they go live through activities such as operational design consulting, capacity planning and launch reviews.
Maintain services once they are live by measuring and monitoring availability, latency, and overall system health.
Scale systems sustainably through mechanisms like automation and evolve systems by pushing for changes that improve reliability and velocity.

Skills

Required

Experience within a Security and/or Enterprise Monitoring context is required.
6–10 years of hands-on experience in UNIX/Linux systems, scripting and automation, Oracle and SQL databases, DevOps practices, and CI/CD pipelines.
Strong knowledge of operating systems, platforms, and infrastructure components.
Experience working through others to solve complex business problems and effect change.
Systematic problem-solving approach, coupled with strong communication skills and a sense of ownership and drive.
Ability to challenge current practices to promote efficiencies and deliver positive results.
We support many different stakeholders. Experience in dealing with difficult situations and making decisions with a sense of urgency is needed.
Interest in designing, analyzing and troubleshooting large-scale distributed systems.
Strong project management skills and success in managing large-scale cross-functional teams.
Experience in working across development, operations, and product teams to prioritize needs and to build relationships is a must.
Verbal and written English and Spanish are a must.

Nice to have

Experience in one or more of the following is preferred: C, C++, Java, Python, Go, Perl or Ruby.
Experience in the banking/payment/finance industry is preferred, especially in the Mexican market.
Knowledge of cloud platforms is preferred, ideally AWS.

What the JD emphasized

production readiness steward
developer run ownership
operational design
automation
capacity planning
monitoring
fault-tolerant
scalable products
agile and learning culture
triage
root cause
business impact
shift left
proactive management
risk management
compliance
streamlining
standardizing
centralizing points of interaction
stakeholder communication
Product and Customer Focused priorities
Operational needs
run state
customer experience
lifecycle of services
deployment
operation
refinement
ITSM activities
operational gaps
resiliency concerns
launch reviews
availability
latency
system health
scale systems sustainably
reliability
velocity
CI/CD pipeline
DevOps automation
best practices
incident response
blameless postmortems
holistic approach
technology stack
optimize mean time to recover
global team
mentor junior resources
Security
Enterprise Monitoring Context
UNIX/Linux systems
scripting
Oracle and SQL databases
DevOps practices
CI/CD pipelines
operating systems
platforms
infrastructure components
complex business problems
effect change
Systematic problem-solving approach
communication skills
ownership and drive
challenge current practices
efficiencies
positive results
difficult situations
making decisions
sense of urgency
large-scale distributed systems
project management skills
managing large-scale cross-functional teams
development, operations, and product teams
prioritize needs
build relationships
banking/ payment/ finance industry
Mexican market
cloud platforms
AWS
English/ Spanish
site reliability engineers
appetite for change
push the boundaries
managing service levels
critical security services
information security
security policies and practices
confidentiality and integrity of the information
suspected information security incidents

Our Purpose

Mastercard powers economies and empowers people in 200+ countries and territories worldwide. Together with our customers, we’re helping build a sustainable economy where everyone can prosper. We support a wide range of digital payments choices, making transactions secure, simple, smart and accessible. Our technology and innovation, partnerships and networks combine to deliver a unique set of products and services that help people, businesses and governments realize their greatest potential.

Title and Summary

Business Operations Site Reliability Engineer

Overview: The role of the Business Operations organization is to be the production readiness steward for Mastercard products. As Business Operations SREs, we are responsible for ensuring that our platform is stable and healthy. We break down barriers to running our products by fostering developer run ownership and empowering developers to build resilient products. During the application build phase, we support our developers in software run principles, including operational design, automation, capacity planning, and monitoring, that lead to fault-tolerant, scalable products. We see the big picture and help create and enforce operations standards while facilitating an agile and learning culture.

We accomplish this transformation by supporting daily operations with a hyper focus on triage and then root cause, grounded in an understanding of the business impact of our products. The goal of every Biz Ops team is to shift left, becoming more proactive and involved earlier in the development process, and to proactively manage production and change activities to maximize customer experience and increase the overall value of supported applications. Biz Ops teams also focus on risk management by tying all our activities together with an overarching responsibility for compliance and risk mitigation across all our environments. Biz Ops also focuses on streamlining and standardizing traditional application-specific support activities and on centralizing points of interaction for both internal and external partners by communicating effectively with all key stakeholders.

Ultimately, the role of Biz Ops is to align Product and Customer Focused priorities with Operational needs. We regularly review our run state not only from an internal perspective but also by giving our development partners a feedback loop on how we can improve the customer experience of our applications.

Key Responsibilities

• Engage in and improve the whole lifecycle of services, from inception and design through deployment, operation, and refinement.
• Analyze ITSM activities of the platform and provide a feedback loop to development teams on operational gaps or resiliency concerns.
• Support services before they go live through activities such as operational design consulting, capacity planning and launch reviews.
• Maintain services once they are live by measuring and monitoring availability, latency, and overall system health.
• Scale systems sustainably through mechanisms like automation and evolve systems by pushing for changes that improve reliability and velocity.
• Support the application CI/CD pipeline for promoting software into higher environments through validation and operational gating, and lead Mastercard in DevOps automation and best practices.
• Practice sustainable incident response and blameless postmortems.
• Take a holistic approach to problem solving by connecting the dots during a production event across the various technology stacks that make up the platform, to optimize mean time to recover.
• Work with a global team spread across tech hubs in multiple geographies and time zones.
• Share knowledge and mentor junior resources.

All about you

• Bachelor’s degree in computer science or related technical field involving coding (e.g., physics or mathematics), or equivalent practical experience.
• Experience within a Security and/or Enterprise Monitoring context is required.
• 6–10 years of hands-on experience in UNIX/Linux systems, scripting and automation, Oracle and SQL databases, DevOps practices, and CI/CD pipelines.
• Experience in one or more of the following is preferred: C, C++, Java, Python, Go, Perl or Ruby.
• Strong knowledge of operating systems, platforms, and infrastructure components.
• Experience working through others to solve complex business problems and effect change.
• Systematic problem-solving approach, coupled with strong communication skills and a sense of ownership and drive.
• Ability to challenge current practices to promote efficiencies and deliver positive results.
• We support many different stakeholders. Experience in dealing with difficult situations and making decisions with a sense of urgency is needed.
• Interest in designing, analyzing and troubleshooting large-scale distributed systems.
• Strong project management skills and success in managing large-scale cross-functional teams.
• Experience in working across development, operations, and product teams to prioritize needs and to build relationships is a must.
• Experience in the banking/payment/finance industry is preferred, especially in the Mexican market.
• Knowledge of cloud platforms is preferred, ideally AWS.
• Verbal and written English and Spanish are a must.

We are seeking site reliability engineers with an appetite for change and who can push the boundaries of what can be completed through automation, while managing service levels for some of Mastercard’s most critical security services.

Corporate Security Responsibility

All activities involving access to Mastercard assets, information, and networks come with an inherent risk to the organization. Therefore, every person working for, or on behalf of, Mastercard is responsible for information security and must:

Abide by Mastercard’s security policies and practices;
Ensure the confidentiality and integrity of the information being accessed;
Report any suspected information security violation or breach; and
Complete all periodic mandatory security trainings in accordance with Mastercard’s guidelines.