Principal Site Reliability Engineer - CTJ - Secret

Microsoft · Big Tech · Redmond, WA +1 · Site Reliability Engineering

This Principal Site Reliability Engineer role supports Microsoft Substrate, a foundational cloud platform that powers critical services such as Exchange Online and M365 Copilot. The role sets technical and operational direction for reliability and influences architecture, strategy, and engineering practices across teams, particularly in regulated environments. Responsibilities include defining reliability strategy, leading incident response, architecting automation and observability solutions, and driving architectural decisions for reliability, security, and compliance.

What you'd actually do

  1. Define and drive reliability strategy, SLO frameworks, and operational best practices across Substrate workloads in highly regulated environments.
  2. Serve as an actively engaged senior on-call engineer (OCE), participating in on-call rotations and leading incident response for Substrate services in regulated environments.
  3. Provide hands-on leadership during the most complex or high-impact incidents, setting technical direction and response strategy.
  4. Drive high-quality post-incident reviews that result in durable, systemic engineering improvements across teams.
  5. Architect and deliver large-scale automation, observability, and self-healing solutions.

Skills

Required

  • One of the following combinations of education and technical experience in software engineering, network engineering, or systems administration:
      ◦ Doctorate Degree in Computer Science, Information Technology, or related field AND 2+ years of technical experience
      ◦ Master's Degree in Computer Science, Information Technology, or related field AND 3+ years of technical experience
      ◦ Bachelor's Degree in Computer Science, Information Technology, or related field AND 5+ years of technical experience
      ◦ Equivalent experience

Nice to have

  • One of the following combinations of education and technical experience in software engineering, network engineering, or systems administration:
      ◦ Doctorate Degree in Computer Science, Information Technology, or related field AND 5+ years of technical experience
      ◦ Master's Degree in Computer Science, Information Technology, or related field AND 8+ years of technical experience
      ◦ Bachelor's Degree in Computer Science, Information Technology, or related field AND 12+ years of technical experience
      ◦ Equivalent experience
  • 7+ years of technical experience working with large-scale cloud or distributed systems.
  • 3+ years of people management experience.
  • Experience operating or supporting services in regulated, sovereign, or compliance-sensitive environments.

What the JD emphasized

  • highly regulated environments
  • regulated environments
  • regulated, sovereign, or compliance-sensitive environments