Principal Site Reliability Engineer at Upstart

What you'd actually do

Lead the definition, advocacy, and adoption of SRE principles across engineering teams

Partner with leadership to shape long-term reliability, resiliency, and observability strategies

Champion distributed tracing, real user monitoring (RUM), and key performance metrics such as Largest Contentful Paint (LCP) to improve system visibility and user experience

Build and scale self-healing systems to minimize manual intervention and reduce downtime

Drive enterprise-wide improvements to incident response processes, including those related to Machine Learning systems

Skills

Required

Python
Go
JavaScript/TypeScript
Infrastructure as Code (Terraform, CDK, CloudFormation, etc.)
observability
distributed tracing
RUM
LCP
performance monitoring tools (e.g., Datadog, Prometheus)
on-call and incident management
automation
building self-healing systems
LLM/GenAI to improve SRE efficiency and processes
program management skills

Nice to have

service mesh
Full stack development skills
building or extending observability platforms
Development Productivity or Quality Platforms
high-scale SaaS
microservice-oriented cloud environments

About Upstart

At Upstart, we’re united by a mission that matters: to radically reduce the cost and complexity of borrowing for all Americans. Every day, we bring creativity, experimentation, and advanced AI to reshape access to credit, helping millions move forward financially with clarity and confidence.

As the leading AI lending marketplace, we partner with banks and credit unions to expand access to affordable credit through technology that’s both radically intelligent and deeply human. Our platform runs over one million predictions per borrower using more than 1,800 signals, powering smarter, fairer decisions for millions of customers. But the numbers only hint at the impact. Every idea, every voice, and every contribution moves us closer to a world where credit never stands between people and their financial progress.

We’re proudly digital-first, giving most Upstarters the flexibility to do their best work from wherever they thrive, alongside teammates across 80+ cities in the US and Canada. Digital-first doesn’t mean distant. We’re intentional about in-person connection through team onsites, planning sessions, and moments that spark creativity and trust. And whether you choose to work primarily from home or collaborate in person from one of our offices in Columbus, Austin, the Bay Area, or New York City (opening Summer 2026), you’ll have the support to work in the way that works best for you.

If you’re energized by tackling meaningful problems, excited to innovate with purpose, and motivated by work that truly matters, we’d love to hear from you.

**The Team:**

Upstart’s Site Reliability Engineering (SRE) team owns the reliability, resiliency, and observability of Upstart’s production systems. We build automation, tooling, and frameworks to ensure our infrastructure is healthy, scalable, and able to support a seamless experience for both engineers and customers. Our scope includes defining Upstart’s technology operations risk strategy, implementing disaster recovery planning, and setting company-wide reliability standards.

As a **Principal Engineer** on the **SRE team** at Upstart, you will serve as a thought leader and SRE evangelist, driving adoption of best practices, mentoring engineers across the organization, and influencing both technical and business decisions. Your impact will extend beyond SRE into cross-functional collaboration with Product Engineering, DevEx, Development Productivity (Quality), DevOps, Data Engineering, and Machine Learning teams to elevate operational excellence across the company.

How you’ll make an impact

Lead the definition, advocacy, and adoption of SRE principles across engineering teams
Partner with leadership to shape long-term reliability, resiliency, and observability strategies
Champion distributed tracing, real user monitoring (RUM), and key performance metrics such as Largest Contentful Paint (LCP) to improve system visibility and user experience
Build and scale self-healing systems to minimize manual intervention and reduce downtime
Drive enterprise-wide improvements to incident response processes, including those related to Machine Learning systems
Collaborate closely with Development Productivity and Quality teams to improve engineering velocity without sacrificing reliability
Influence technical and operational roadmaps through data-driven insights and hands-on technical contributions
Own and deliver cross-functional initiatives from concept through execution, applying program management skills to align stakeholders and achieve results

**Minimum Qualifications**

Bachelor’s degree in Computer Science, Engineering, Mathematics, or a related field (or its equivalent), plus 8 years of experience
Combined experience with both Software Engineering and Site Reliability Engineering, with a balanced background in both disciplines
Proven track record as an SRE thought leader and evangelist, driving adoption of reliability best practices across organizations
Strong communication and mentoring skills to influence engineers across disciplines
Proficiency in Python, Go, JavaScript/TypeScript
Proficiency with Infrastructure as Code (Terraform, CDK, CloudFormation, etc.)
Experience building internal tooling from scratch in agile development environments
Expertise with observability, distributed tracing, RUM, LCP, and performance monitoring tools (e.g., Datadog, Prometheus)
Experience with on-call and incident management, including large-scale or ML-related incidents
Strong background in automation and building self-healing systems
Hands-on experience with LLM/GenAI to improve SRE efficiency and processes
Program management skills, including the ability to propose innovative solutions, influence leadership, improve processes, and drive cross-functional projects to completion

Preferred Qualifications

Experience with service mesh
Full stack development skills
Experience building or extending observability platforms
Background in Development Productivity or Quality Platforms
Experience in high-scale SaaS, microservice-oriented cloud environments

**Position location:** This role is available in the following locations: Remote-US, Remote-Canada

**Time zone requirements:** The team operates on East and West Coast time zones.

**Travel requirements:** As a digital-first company, the majority of your work can be accomplished remotely. The majority of our employees can live and work anywhere in the U.S. but are encouraged to still spend high-quality time collaborating in person via regular onsites. The cadence of in-person sessions varies by team and role; most teams meet once or twice per quarter for 2-4 consecutive days at a time.

At Upstart, your base pay is one part of your total compensation package. The anticipated base salary for this position is within the range below. Your actual base pay will depend on your geographic location: in keeping with our “digital first” philosophy, Upstart uses compensation regions that vary by location. Individual pay is also determined by job-related skills, experience, and relevant education or training. Your recruiter can share more about the specific salary range for your preferred location during the hiring process.

In addition, Upstart provides employees with target bonuses, equity compensation, and generous benefits packages (including medical, dental, vision, and 401k).

United States | Remote - Anticipated Base Salary Range

$195,300–$270,400 USD

Canada | Remote - Anticipated Base Salary Range

$182,800–$230,000 CAD

What you'll love

At Upstart, our benefits are designed to support your health, financial well-being, family, and personal growth. Here’s what you can expect:

Competitive compensation, including base pay, bonus opportunities, and annual equity grants that vest quarterly
Retirement benefits to help you plan for the future, including a 401(k) or Group Retirement Savings Plan with a company match of $2 for every $1 contributed, up to $15,000 annually (USD in the US, CAD in Canada)
Employee Stock Purchase Plan (ESPP) with discounted stock purchase options for eligible employees (US only)
Comprehensive health coverage designed to support you and your family, including medical, dental, vision, and wellness resources in the US and supplemental health coverage in Canada
Health Savings Account contributions from Upstart for eligible plans (US only)
Income protection benefits, including life insurance and disability coverage for added financial security
Paid time off, sick leave, and company holidays, in line with local requirements
Paid family and parental leave to support caregiving and major life moments (duration varies by country)
Family-centered benefits to support fertility, parenthood, and caregiving needs
Employee Assistance Program (EAP) offering mental health support and life-centered resources
Financial wellness resources, including access to financial planning tools and a financial concierge service (US only)
Annual wellness allowance to support your physical and emotional well-being and personal development, based on what matters most to you
Annual productivity allowance to invest in relevant tools and resources you need to do your best work, no matter where you work from
Connection and community through team events, all-company updates, and employee resource groups (ERGs)
Onsite perks, including catered lunches and fully stocked micro-kitchens when working from one of our offices in the Bay Area, Austin, Columbus, or New York City (opening Summer 2026!)

For roles based in Canada, please note that we are not currently able to hire in Quebec.

Upstart is a proud Equal Opportunity Employer. Just as we are dedicated to improving access to affordable credit for all, we are committed to inclusive and fair hiring practices.

If you require reasonable accommodation in completing an application, interviewing, completing any pre-employment testing, or otherwise participating in the employee selection process, please email candidate_accommodations@upstart.com

https://www.upstart.com/candidate_privacy_policy
