Principal Software Engineer Manager

Microsoft · Big Tech · Redmond, WA +1 · Software Engineering

This role is for a Principal Engineering Manager at Microsoft, leading the architecture, delivery, and live-site excellence of RDX deployment services. The team focuses on ensuring the safe, compliant, and data-driven deployment of Microsoft 365 client updates across global enterprise environments. This involves building and operating large-scale distributed services for release orchestration, rollout governance, and automated recovery, with an emphasis on telemetry, staged rollouts, and automated safeguards to minimize customer impact and regressions. The role requires strong experience in distributed systems, production deployment safety, and influencing engineering outcomes across multiple product groups.

What you'd actually do

  1. Own CPS and MASP service architecture supporting Office client release and deployment workflows.
  2. Drive reliability, scalability, and availability of deployment services used across M365 app teams.
  3. Enable Safe-to-Change release infrastructure through staged rollouts and automated safeguards.
  4. Deliver automation-first rollback and remediation capabilities to minimize customer impact.
  5. Define telemetry pipelines and data signals used for release gating, validation, and rollback.

Skills

Required

  • Bachelor's Degree in Computer Science or related technical field AND 6+ years of technical engineering experience coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python.
  • Experience with large-scale distributed systems and service architecture.
  • Experience in deployment safety.

Nice to have

  • Bachelor’s or Master’s degree in Computer Science, Engineering or related field (or equivalent experience).
  • 12+ years of experience building and operating production software or services.
  • 5+ years of engineering management experience.

What the JD emphasized

  • large-scale distributed services
  • production deployment safety
  • release orchestration
  • rollout governance
  • automated recovery
  • staged rollouts
  • automated safeguards
  • telemetry pipelines
  • usage data
  • reliability data
  • performance data
  • deployment observability
  • predictive deployment health signals
  • deployment quality SLAs
  • recovery SLAs