Principal Architect, Site Reliability Engineering

Peloton · Consumer · United States · Remote · Software

This role is for a Principal Architect of Site Reliability Engineering (SRE) for Internal Systems at Peloton. The focus is on ensuring the resilience, observability, and scalability of the company's global SaaS ecosystem, specifically for Order-to-Cash, Procure-to-Pay, and Record-to-Report lifecycles. The role involves leading a team of SREs, architecting observability, defining SLOs, owning incident response, and championing Infrastructure as Code. While the company uses AI tools, this role is primarily focused on the reliability and infrastructure of enterprise systems, not the direct development or research of AI/ML models.

What you'd actually do

  1. Lead, mentor, and grow a team of SREs. Conduct 1:1s, define career growth paths, and foster a culture of high accountability and psychological safety.
  2. Transition the team from reactive support to proactive engineering. Align the team’s quarterly goals with broader Finance and Supply Chain digital transformation initiatives.
  3. Architect observability across complex business paths (e.g., ensuring a customer order flows from e-commerce through supply chain into the financial ledger).
  4. Partner with business owners to define and track Service Level Objectives (SLOs) and Error Budgets for critical SaaS integrations.
  5. Own the Major Incident Response process for corporate systems. Ensure "War Rooms" are efficient and result in actionable improvements.

Skills

Required

  • SRE
  • DevOps
  • Production Engineering
  • People Management
  • Order-to-Cash
  • Procure-to-Pay
  • Enterprise SaaS management (NetSuite, SAP, Workday, Salesforce)
  • Networking (SD-WAN, VPNs)
  • Identity Management (IAM)
  • Endpoint Management
  • Datadog
  • Splunk
  • New Relic
  • Prometheus
  • Python
  • Go
  • Terraform

Nice to have

  • AI tools

What the JD emphasized

  • 8+ years in SRE, DevOps, or Production Engineering, with 2+ years of direct people management experience
  • Deep understanding of Order-to-Cash or Procure-to-Pay cycles
  • Management of enterprise ecosystems (NetSuite, SAP, Workday, Salesforce)
  • Proven ability to communicate technical risk to non-technical stakeholders (CFO, General Counsel, Head of People)