Principal Software Engineering Manager - AI Frameworks

Microsoft · Big Tech · Redmond, WA +2 · Software Engineering

This role manages a team focused on optimizing the AI software serving stack, including runtimes, libraries, and APIs, for large-scale model training and inference. The team benchmarks and optimizes LLMs across various hardware, aiming to improve performance, reduce hardware footprint, and enhance Azure's capex efficiency.

What you'd actually do

  1. Lead and develop a team of engineers working across multiple layers of the AI software stack to enable large-scale training and inference.
  2. Set technical vision and execution strategy for model performance benchmarking, optimization, and deployment across GPUs and Microsoft hardware.
  3. Drive performance outcomes by prioritizing and overseeing efforts to benchmark, profile, debug, and optimize training and inference workloads.
  4. Own performance health by establishing mechanisms to monitor regressions, measure impact, and continuously improve time-to-deploy and hardware efficiency.
  5. Partner cross-functionally with research, product, infrastructure, and hardware teams to deliver scalable, production-ready AI performance improvements.

Skills

Required

  • Bachelor's Degree in Computer Science or related technical field
  • 6+ years of technical engineering experience coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python
  • software engineering principles
  • computer architecture
  • GPU architecture
  • hardware acceleration for neural networks

Nice to have

  • Master's Degree in Computer Science or related technical field AND 10+ years of software engineering experience, including 6+ years in engineering management
  • Bachelor's Degree in Computer Science or related technical field AND 12+ years of software engineering experience, including 6+ years in engineering management
  • leading teams responsible for end-to-end performance analysis and optimization of LLMs, AI systems, or HPC workloads
  • GPU profiling and performance analysis tools
  • leading cross-team initiatives
  • aligning stakeholders
  • translating research or platform capabilities into scalable, production-ready solutions
  • people leadership skills, including hiring, coaching, performance management, and career development
  • building high-performing, inclusive teams
  • AI / ML infrastructure
  • DNN or LLM training and/or inference systems
  • PyTorch
  • TensorFlow
  • ONNX Runtime
  • GPU software stacks
  • CUDA
  • ROCm
  • Triton

What the JD emphasized

  • performance optimization
  • large-scale model training and inference
  • hardware efficiency
  • GPU architecture
  • performance analysis and optimization

Other signals

  • large-scale model training and inference
  • performance optimization
  • hardware acceleration
  • GPU architecture