What you'd actually do

Engage in and improve the whole lifecycle of services—from inception and design, through deployment, operation, and refinement.

Analyze ITSM activities of the platform and provide a feedback loop to development teams on operational gaps or resiliency concerns.

Support services before they go live through activities such as system design consulting, capacity planning and launch reviews.

Maintain services once they are live by measuring and monitoring availability, latency, and overall system health.

Scale systems sustainably through mechanisms like automation and evolve systems by pushing for changes that improve reliability and velocity.

Skills

Required

UNIX/Linux systems
scripting and automation
DevOps practices
CI/CD pipelines
operating systems
platforms
infrastructure components
ITSM processes (Change and Problem Management)
observability and monitoring tools (Splunk, Dynatrace)
analytical skills
problem-solving skills
planning skills
communication skills
ability to work independently
collaboration skills
relationship-building skills
customer service skills

Nice to have

C
C++
Java
Python
Go
Perl
Ruby
Knowledge of artificial intelligence use cases and implementation

What the JD emphasized

production readiness owner

reliability

scalability

performance

availability

capacity

observability

self-healing

deployment automation

operational excellence

production event

mean time to recover

production readiness

operational gaps

resiliency concerns

system design consulting

capacity planning

launch reviews

system health

scale systems sustainably

evolve systems

reliability

velocity

incident response

blameless postmortems

holistic approach

connecting the dots

technology stack

optimize mean time to recover

global team

tech hubs

multiple geographies

time zones

share knowledge

mentor junior resources

hands-on experience

UNIX/Linux systems

scripting and automation

DevOps practices

CI/CD pipelines

operating systems

platforms

infrastructure components

ITSM processes

Change and Problem Management

observability and monitoring tools

analytical

problem-solving

planning skills

manage multiple priorities

work effectively under pressure

communication skills

work independently

minimal supervision

collaboration

relationship-building

customer service skills

Our Purpose

Mastercard powers economies and empowers people in 200+ countries and territories worldwide. Together with our customers, we’re helping build a sustainable economy where everyone can prosper. We support a wide range of digital payments choices, making transactions secure, simple, smart and accessible. Our technology and innovation, partnerships and networks combine to deliver a unique set of products and services that help people, businesses and governments realize their greatest potential.

Title and Summary

Senior Site Reliability Engineer

The Xborder team is looking for a Senior Site Reliability Engineer who can help us solve problems, implement automation, and leverage best practices.

· Are you a born problem solver who loves to figure out how something works?
· Are you a detail-oriented individual who enjoys complex problem solving?
· Do you love determining the correct actions required to fix a problem?
· Do you have a low tolerance for manual work and look to automate everything you can?

Overview

Business Operations is leading the Site Reliability Engineering (SRE) transformation at Mastercard through our tooling and by being an advocate for change and standards throughout the development, quality, release, and product organizations. We need team members with an appetite for change and for pushing the boundaries of what can be done with automation. Experience in working across development, operations, and product teams to prioritize needs and to build relationships is a must.

The Senior Site Reliability Engineer (SRE) acts as the production readiness owner for the platform, ensuring that systems are designed, built, and operated with reliability, scalability, and performance at their core. This role partners closely with engineering teams across the software lifecycle, from design and development to deployment and production, embedding SRE principles and operational excellence into every stage of delivery. The SRE ensures that key operational capabilities such as availability, capacity, performance, observability, self-healing, and deployment automation are proactively integrated into systems.

Key Responsibilities

• Engage in and improve the whole lifecycle of services, from inception and design through deployment, operation, and refinement.
• Analyze ITSM activities of the platform and provide a feedback loop to development teams on operational gaps or resiliency concerns.
• Support services before they go live through activities such as system design consulting, capacity planning, and launch reviews.
• Maintain services once they are live by measuring and monitoring availability, latency, and overall system health.
• Scale systems sustainably through mechanisms like automation, and evolve systems by pushing for changes that improve reliability and velocity.
• Practice sustainable incident response and blameless postmortems.
• Take a holistic approach to problem solving by connecting the dots during a production event across the various technology stacks that make up the platform, to optimize mean time to recover.
• Work with a global team spread across tech hubs in multiple geographies and time zones.
• Share knowledge and mentor junior resources.

All about you

• Bachelor’s degree in Computer Science, Information Technology, or a related technical field (e.g., Engineering, Physics, Mathematics), or equivalent practical experience.
• 6–10 years of hands-on experience in UNIX/Linux systems, scripting and automation, Oracle and SQL databases, DevOps practices, and CI/CD pipelines.
• Experience in one or more of the following is preferred: C, C++, Java, Python, Go, Perl, or Ruby.
• Strong knowledge of operating systems, platforms, and infrastructure components.
• Knowledge of artificial intelligence use cases and implementation.
• Solid understanding of ITSM processes (Change and Problem Management).
• Experience with observability and monitoring tools such as Splunk and Dynatrace.
• Strong analytical, problem-solving, and planning skills.
• Ability to manage multiple priorities and work effectively under pressure.
• Strong communication skills (both written and verbal).
• Proven ability to work independently with minimal supervision.
• Strong collaboration, relationship-building, and customer service skills.

Corporate Security Responsibility

All activities involving access to Mastercard assets, information, and networks come with an inherent risk to the organization. Therefore, every person working for, or on behalf of, Mastercard is responsible for information security and must:

Abide by Mastercard’s security policies and practices;
Ensure the confidentiality and integrity of the information being accessed;
Report any suspected information security violation or breach, and
Complete all periodic mandatory security trainings in accordance with Mastercard’s guidelines.
