What you'd actually do

Foster developer ownership and empower teams to build resilient, fault‑tolerant, scalable products.

Support developers during the build phase with operational design, automation, capacity planning, and monitoring.

Establish and enforce operational standards while promoting an agile, learning‑focused culture.

Lead triage and root‑cause analysis with a focus on business impact and blameless post‑mortems.

Engage early in the development lifecycle to proactively manage production and change activities.

Skills

Required

BS in Computer Science or related technical field, or equivalent practical experience.
Curiosity and appetite for automation, new technologies, and scalable architectures.
Strong problem‑solving skills, communication abilities, ownership, and drive.
Interest in large‑scale distributed systems design, analysis, and troubleshooting.
Ability to work in diverse, matrix‑based, geographically distributed teams.
Ability to balance long‑term system health against short‑term fixes.
Ability to collaborate cross‑functionally with clear understanding of expected system behavior and monitoring needs.
Experience with industry‑standard CI/CD tools such as Git/Bitbucket, Jenkins, Maven, Artifactory, and Chef.
Experience designing and implementing an effective and efficient CI/CD flow that gets code from dev to prod with high quality and minimal manual effort is desired.
Ability to work shifts and weekends when needed, based on team rotations and schedules.

Nice to have

Experience with algorithms, data structures, scripting, pipeline management, and software design.
Experience working across development, operations, and product teams.
Prior SRE experience.
Expertise in RDBMS such as PostgreSQL and Oracle.
Proficiency in SQL, PL/SQL, and PostgreSQL features.
Strong understanding of database architecture, performance tuning, and query optimization.
Experience with monitoring tools (e.g., Splunk, Dynatrace).
Experience in production support and ITIL processes.
Experience with CI/CD tools: Git/Bitbucket, Jenkins, Maven, Artifactory, Groovy, Chef.
Understanding of: Client‑server relationships, Network concepts (Layer 1–3), Stack

What the JD emphasized

production readiness steward

platform stability, health, and resilience

operational design, automation, capacity planning, and monitoring

operational standards

agile, learning-focused culture

triage and root-cause analysis

business impact

blameless post-mortems

development lifecycle

production and change activities

risk management, compliance, and mitigation

product and customer priorities

operational needs

application CI/CD pipeline

DevOps automation

incident response

holistic approach to problem solving

mean time to recover

global team

application health, performance, and capacity

system design consulting

capacity planning

launch reviews

monitoring and alerting strategies

zero-downtime deployments

ITSM practices

operational gaps

resiliency concerns

large-scale distributed systems design, analysis, and troubleshooting

system health

short-term fixes

cross-functionally

system behavior

monitoring needs

CI/CD tools

software design

production support

ITIL processes

database architecture

performance tuning

query optimization

monitoring tools

client-server relationships

network concepts

Job Title:

Service Management Reliability Engineer

Overview:

Who is Mastercard?

At Mastercard Technology, we work to connect and power an inclusive, digital economy that benefits everyone, everywhere, by making transactions safe, simple, smart, and accessible. Using secure data and networks, partnerships, and passion, our innovations and solutions help individuals, financial institutions, governments, and businesses realize their greatest potential. Our decency quotient, or DQ, drives our culture and everything we do inside and outside of our company. We cultivate a culture of inclusion for all employees that respects their individual strengths, views, and experiences. We believe that our differences enable us to be a better team – one that makes better decisions, drives innovation, and delivers better business results.

Technology at Mastercard

What we create today will define tomorrow. Revolutionary technologies that reshape the digital economy to be more connected and inclusive than ever before. Safer, faster, more sustainable. And we need the best people to do it. Technologists who are energized by the challenges of a truly global network. With the talent and vision to create the critical systems and products that power global commerce and connect people everywhere to the vital goods and services they need every day. Working at Mastercard means being part of a unique culture. Inclusive and diverse, a rich collaboration of ideas and perspectives. A place that celebrates your strengths, values your experiences and offers you the flexibility to shape a career across disciplines and continents. And the opportunity to work alongside experts and leaders at every level of the business, improving what exists, and inventing what’s next.

About the Role

The Business Operations (Biz Ops) team is seeking a Service Management Reliability Engineer (SMRE). The role of the Business Operations Organization is to be the production readiness steward for Mastercard products. As Biz Ops SMREs, we are responsible for assuring highly reliable service functionality by creating and maintaining service management strategies, tools, and service-level objectives to deliver reliable, zero-touch solutions for application and infrastructure products and services. We see the big picture and help create and enforce operations standards while facilitating an agile and learning culture. SMREs primarily support the development of documentation, analysis, and improvement of processes and strategies, enabling program teams to realize goals of proactive engagement in the development lifecycle, thought leadership within service management strategies, reliability of tools and service-level objectives, and automated solutions. We help program teams adhere to regulatory and risk compliance policies and are often involved in federated roles such as change, incident, problem, or service request managers, communication leads, process leads, program managers, and/or others. Ultimately, the role of Business Operations is to align Product and Customer-focused priorities with Operational needs by providing continuous feedback throughout the lifecycle.

Role Qualifications

The ideal candidate will have experience in many of these areas:

• Bachelor’s degree in Information Systems, Information Technology, Computer Science, Engineering, or equivalent work experience.
• Curiosity to ask the right questions to identify root causes and solve upstream challenges, and a bias toward action with pervasive ownership over your domain, the problem space, and the mission before you.
• SMREs employ critical thinking daily across the problems they solve, the relationships they manage, and the space they support.
• Awareness of the risks associated with the programs you support across Biz Ops processes and practices, seeking to “do no harm” while still taking proactive, thoughtful risks in innovation and always ensuring alignment with Mastercard’s regulatory and risk-based requirements.
• Ability to integrate theory and principles with organizational practices and precedents.
• Intermediate knowledge of a program or set of services, including an understanding of the customer journey and primary business drivers for the program.
• Provides guidance to less experienced team members on defined procedures and may supervise or coordinate work across individual contributors.
• Appetite for change and pushing the boundaries of what can be done with automation. Curiosity about new technology, infrastructure, and practices to scale our architecture and prepare for future growth.
• Systematic problem-solving approach, coupled with strong communication skills and a sense of ownership and drive.
• Interest in working with large-scale distributed systems, assessing their data, and understanding their common challenges.
• Willingness and ability to learn, to take on challenging opportunities, and to work as a member of a matrix-based, diverse, and geographically distributed project team.
• Ability to balance doing things right with fixing things quickly. Flexible and pragmatic, while working towards improving the long-term health of the system.
• Comfortable collaborating with cross-functional teams to ensure Biz Ops processes, procedures, and levels of maturity support production resilience.

Great to Have / Preferred Knowledge and Experience

• Experience in an SRE role or related field.
• Experience with monitoring tools such as Splunk and Dynatrace.
• Possess a basic understanding of the five ITIL practices: Service Strategy, Service Design, Continual Service Improvement, Service Operations, and Service Transition, and apply these practices to enhance service quality, efficiency, and alignment with business needs.
• Understand and ensure the availability, security, capacity, and continuity requirements of an IT team.
• Demonstrate the ability to enable cross-department collaboration to bring IT Teams and Development Teams together through streamlined product management approaches.
• Plan and implement solutions to seamlessly deliver IT services while minimizing disruptions.
• Demonstrate working knowledge of key ITSM practices, which include Incident Management, Change Management, Problem Management, and Service Request Management.
• Work to enforce change management processes and standards while simultaneously working to improve the process.
• Knowledge of Service Request Management processes, which are facilitated through work order tickets and are subject to accepted and documented request fulfillment process guidelines as defined by ITIL, as well as potential unique handling procedures applied by various support groups. Further, create, implement, and govern the accepted Service Request Management processes.
• Manage ITSM practices, IT Asset Management (ITAM), and Knowledge Management effectively using the ticketing system.
• Streamline product management approaches to enhance communication and alignment.
• Maintain the quality of incident and problem tickets by effectively reviewing, updating, and managing data fidelity within the tickets, including essential KPI data that reflects service resiliency and redundancy.
• Partner with the incident process engineers and establish an ongoing dialogue around incident process requirements and incident data fidelity.

To find US Salary Ranges, visit People Place. Under the Compensation tab, select "Salary Structures." Within the text of "Salary Structures," click on the link "salary structures 2025," through which you will be able to access the salary ranges for each Mastercard job family. For more information regarding US benefits, visit People Place and review the Benefits tab and the Time Off & Leave tab.
