Principal Software Engineering Manager

Microsoft · Big Tech · Redmond, WA +1 · Software Engineering

Principal Software Engineering Manager to lead a team focused on control plane automations for capacity buildout of M365 Copilot inference services. This role involves technical leadership in capacity planning, custom model deployment automation, and replacing manual workflows with automated systems to ensure low-latency, highly available AI experiences at massive GPU scale.

What you'd actually do

  1. Lead and grow a team of software engineers building control plane services and automations across the capacity buildout area.
  2. Drive technical design and execution for capacity automation — intake, planning, deployment, fleet health, and control plane components — prioritizing the highest-impact work for Copilot capacity.
  3. Replace manual, ticket-driven capacity workflows with automated, data-driven systems; reduce time from capacity request to production traffic for priority workloads.
  4. Own live-site, reliability, and operational excellence for the services your team builds; establish SLAs, metrics, and on-call practices.
  5. Partner with peer engineering managers on adjacent capacity areas, and with partner teams across M365 Core, AI Core, Azure, and Microsoft Research to align on dependencies and unblock execution.

Skills

Required

  • Bachelor's Degree in Computer Science or related technical field AND 6+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python
  • Ability to meet Microsoft, customer and/or government security screening requirements

Nice to have

  • Master's Degree in Computer Science or related technical field AND 8+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python
  • Bachelor's Degree in Computer Science or related technical field AND 12+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python
  • 4+ years of people management experience.
  • Experience as an engineering manager leading IC (individual contributor) teams building distributed systems, platform services, or cloud infrastructure at scale.
  • Technical depth — able to participate in design reviews, debug live-site issues, and raise the engineering bar through code and design feedback.
  • Track record shipping production services with live-site and on-call ownership.
  • Experience building automation and tooling that replaces manual operational work.
  • Ability to work across team and org boundaries to align on dependencies, surface trade-offs, and drive execution.
  • Hiring, coaching, and people-development track record.
  • Ability to take an ambiguous charter and turn it into a focused roadmap with clear priorities.
  • Experience with AI/ML infrastructure, GPU fleets, or large-scale inference or training systems.
  • Experience with capacity planning, fleet management, or supply/demand optimization at scale.
  • Familiarity with Azure, M365, or AI workload cost models (COGS, utilization, throughput).
  • Background building control planes, orchestration platforms, or automation systems from 0→1.
  • Experience hiring and growing IC teams in a high-growth platform org.

What the JD emphasized

  • capacity buildout
  • control plane
  • automation

Other signals

  • massive GPU scale
  • low-latency, highly available Copilot experiences
  • control plane automations for capacity buildout
  • replace manual, ticket-driven capacity workflows with automated, data-driven systems