Principal Software Engineering Manager - Substrate Efficiency

Microsoft · Big Tech · Redmond, WA +1 · Software Engineering

This role leads a team focused on optimizing the inference efficiency of the M365 Copilot platform, which operates at massive GPU scale. The goal is to maximize throughput per GPU, reduce cost per query, and improve runtime performance for large-scale AI experiences.

What you'd actually do

  1. Build and lead a high-performing engineering team focused on inference runtime efficiency and model execution performance.
  2. Define and drive strategy to improve throughput per GPU through runtime optimizations.
  3. Increase engineering agility, enabling faster experimentation, iteration, and rollout of performance improvements.
  4. Partner across M365 Core, AI Core, Azure, and Microsoft Research to co-design and productionize advanced inference optimizations.
  5. Establish metrics, telemetry, and experimentation frameworks to measure efficiency gains and guide investment decisions.

Skills

Required

  • Bachelor's Degree in Computer Science or related technical field
  • 6+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python

Nice to have

  • Master's Degree in Computer Science or related technical field AND 8+ years technical engineering experience
  • Bachelor's Degree in Computer Science or related technical field AND 12+ years technical engineering experience
  • 4+ years people management experience
  • Experience leading engineering teams building backend or distributed systems
  • Hands-on experience improving system throughput, performance, and resource utilization across large-scale infrastructure
  • Systems thinking, with the ability to identify and optimize bottlenecks across execution, scaling, and resource management
  • Experience driving system-level improvements in areas such as workload execution, scheduling, batching, or infrastructure efficiency
  • Experience with developing AI/ML inference systems or GPU-based workloads
  • Familiarity with inference or training runtime optimization techniques
  • Experience improving resource-efficiency metrics (e.g., throughput per resource, cost per query) in large-scale systems
  • Ability to translate technical insights into clear engineering priorities and execution plans
  • Ability to collaborate across teams to align on goals and execution

What the JD emphasized

  • maximizing throughput per GPU
  • inference engine efficiency
  • optimizing model execution and runtime performance
  • improving throughput per GPU
  • reducing cost per query
  • unlocking capacity without additional hardware investment
  • live-site performance, reliability, and operational excellence for inference engines at scale

Other signals

  • LLM inference platform
  • GPU scale
  • low-latency
  • high-availability
  • performance
  • scalability
  • efficiency
  • throughput per GPU
  • cost per query