Senior Site Reliability Engineer (SRE & Platform Reliability) at Affirm

Affirm is reinventing credit to make it more honest and friendly, giving consumers the flexibility to buy now and pay later without any hidden fees or compounding interest.

Site Reliability Engineering at Affirm is a small yet crucial team that helps our Engineering partners “Operate What They Own” with excellence and protect their customers’ experience. SRE accomplishes this by defining frameworks and best practices for operating applications, building tooling, and providing training and consulting. Some of the many SRE responsibilities are:

Providing data and visibility to teams and leadership on application performance
Guiding the development of SLOs
Driving the Incident Management and Analysis process
Steering the implementation of Change Management and Deployment practices
Engaging in service and architectural conversations
Recommending observability and alerting configurations

The SRE team benefits from experience across many domains including:

infrastructure, platform, and distributed systems
capacity management, load and chaos testing
automation, observability, and configuration management
development and product experience

The SRE team is seeking motivated software and systems engineers with the experience to build, iterate on, and expand incident lifecycle, reliability, and resilience practices throughout Affirm's Engineering organization and beyond.

What You'll Do:

You will be responsible for owning and delivering quarterly goals for your team, leading engineers on your team through ambiguity to solve open-ended problems, and ensuring that everyone is supported throughout delivery.
You will support your peers and stakeholders in the product development lifecycle by collaborating with infrastructure, product management, developer experience, and analytics: participating in ideation, articulating technical constraints, and partnering on decisions that properly consider risks and trade-offs.
You will proactively identify technical solutions and operational processes that strengthen incident readiness, response, and post-incident analysis.
You will support the operations and availability of your team’s artifacts by creating and monitoring metrics, escalating when needed, and supporting “keep the lights on” & on-call efforts.
You will foster a culture of quality and ownership on your team by setting or improving code review and design standards for your team, and advocating for them beyond your team through your writing and tech talks.
You will help develop talent on your team by providing feedback and guidance, and leading by example.
On-Call Rotation - Participation in an on-call rotation is a requirement for this role.

What We Look For:

You have 4+ years of experience designing, developing and launching backend systems at scale using scripting and development languages like Bash, Python or Kotlin.
You have a track record of developing highly available distributed systems using technologies like AWS, MySQL and Kubernetes.
You have meaningful experience contributing to or driving parts of the Incident Lifecycle process, enabling actionable insights that improve the quality culture, reliability, resilience, and system performance.
You have 4+ years of experience working in a Site Reliability or Production Engineering team.
You demonstrate curiosity with empathy, and strong opinions loosely held.
You have experience defining a technical plan for the delivery of a significant feature or system component with an elegant, simple and extensible design. You write high-quality code that is easily understood and used by others.
You have experience in making impactful changes in a large code base, and have developed a suite of tools and practices that enable you and your team to do so safely.
Your experience demonstrates that you take ownership of your growth, proactively seeking feedback from your team, your manager, and your stakeholders.
You have strong verbal and written communication skills that support effective collaboration with our global engineering team.

Compensation & Benefits

Base Pay Grade - N

Equity Grade - 4

Employees new to Affirm typically come in at the start of the pay range. Affirm focuses on providing a simple and transparent pay structure which is based on a variety of factors, including location, experience and job-related skills.

Base pay is part of a total compensation package that may include monthly stipends for health, wellness and tech spending, and benefits (including 100% subsidized medical coverage, dental and vision for you and your dependents). In addition, employees may be eligible for equity rewards offered by Affirm Holdings, Inc. (parent company).

ESP base pay range per year: **€85,000 - €115,000**

Additional benefits include:

Flexible Spending Wallets for tech, food and lifestyle
Away Days - wellness days to take off work and recharge
Learning & Development programs
Parental benefit
Employee Resource & Community Groups

Location - Remote Spain

We are able to offer visa sponsorship for this role, **but do require that someone is based in Spain for the role.**

#LI-Remote

Affirm is proud to be a remote-first company! The majority of our roles are remote and you can work almost anywhere within the country of employment. Affirmers in proximal roles have the flexibility to work remotely, but will occasionally be required to work out of their assigned Affirm office. A limited number of roles remain office-based due to the nature of their job responsibilities.

We’re extremely proud to offer competitive benefits that are anchored to our core value that people come first. Some key highlights of our benefits package include:

Health care coverage - Affirm covers all premiums for all levels of coverage for you and your dependents
Flexible Spending Wallets - generous stipends for spending on Technology, Food, various Lifestyle needs, and family forming expenses
Time off - competitive vacation and holiday schedules allowing you to take time off to rest and recharge
ESPP - An employee stock purchase plan enabling you to buy shares of Affirm at a discount

We believe It’s On Us to provide an inclusive interview experience for all, including people with disabilities. We are happy to provide reasonable accommodations to candidates in need of individualized support during the hiring process.

[For U.S. positions that could be performed in Los Angeles or San Francisco] Pursuant to the San Francisco Fair Chance Ordinance and Los Angeles Fair Chance Initiative for Hiring Ordinance, Affirm will consider for employment qualified applicants with arrest and conviction records.

By clicking "Submit Application," you acknowledge that you have read Affirm's Global Candidate Privacy Notice and hereby freely and unambiguously give informed consent to the collection, processing, use, and storage of your personal information as described therein.
