Site Reliability Engineer - CTJ - Secret

Microsoft · Big Tech · Redmond, WA +1 · Site Reliability Engineering

This role is for a Site Reliability Engineer (SRE) who will own reliability and operational outcomes for specific components or services within the Microsoft Substrate platform. The platform powers critical services such as Exchange Online and M365 Copilot, and operates at global scale in highly regulated environments. Responsibilities include diagnosing and resolving production issues, designing and implementing automation, and collaborating with engineering teams to ensure reliability and operability. The role requires strong software engineering fundamentals and experience with large-scale cloud or distributed systems.

What you'd actually do

  1. Own reliability and operational health for one or more Substrate components or services in highly regulated environments.
  2. Serve as an actively engaged on-call engineer (OCE), participating in an on-call rotation and independently responding to incidents for owned services.
  3. Respond to, diagnose, and resolve production incidents with minimal supervision.
  4. Design and implement automation to reduce operational toil and improve service stability.
  5. Develop and maintain monitoring, alerting, and telemetry to support SLOs and operational metrics.

Skills

Required

  • software engineering
  • network engineering
  • systems administration
  • technical experience

Nice to have

  • large-scale cloud or distributed systems

What the JD emphasized

  • highly regulated environments
  • Security Clearance Requirements
  • Microsoft Cloud Background Check