Principal Software Engineer - Performance Tooling

Microsoft · Big Tech · Redmond, WA +2 · Software Engineering

The Principal Software Engineer - Performance Tooling role focuses on optimizing the performance of AI models, particularly LLMs, across various hardware platforms (GPUs, CPUs) and software layers. This involves benchmarking, debugging, profiling, and optimizing for large-scale training and inference to reduce deployment time and hardware footprint, contributing to the efficiency of AI services like Azure OpenAI.

What you'd actually do

  1. Work across multiple layers of the AI software stack (abstractions, programming models, compilers, runtimes, libraries, and APIs) to enable large-scale model training and inference.
  2. Benchmark OpenAI and other LLMs for performance on Graphics Processing Units (GPUs) and Microsoft hardware.
  3. Debug, profile, and optimize performance for training and inference workloads on Central Processing Units (CPUs) and GPUs.
  4. Monitor performance regressions and drive continuous improvements to reduce time-to-deploy and hardware footprint.
  5. Collaborate across teams of researchers and engineers to deliver scalable, production-ready AI performance improvements.

Skills

Required

  • C++
  • Python
  • Computer Science fundamentals
  • Software engineering principles

Nice to have

  • Master's Degree
  • PyTorch
  • TensorFlow
  • ONNX Runtime
  • CUDA
  • ROCm
  • Triton
  • GPU architecture
  • Hardware neural net acceleration
  • GPU profiling tools
  • Cross-team collaboration
  • Project leadership

What the JD emphasized

  • performance
  • optimize performance
  • performance debugging and optimization
  • performance analysis and optimization

Other signals

  • inference performance
  • LLMs
  • large-scale training and inference
  • performance optimization