Senior Software Engineer - Performance Tooling

Microsoft · Big Tech · Redmond, WA +2 · Software Engineering

Senior Software Engineer focused on performance tooling for AI frameworks, enabling large-scale training and inference of LLMs on various hardware. The role involves benchmarking, debugging, profiling, and optimizing performance for models like OpenAI's LLMs, aiming to reduce deployment time and hardware footprint.

What you'd actually do

  1. Work across multiple layers of the AI software stack (abstractions, programming models, compilers, runtimes, libraries, and APIs) to enable large-scale model training and inference.
  2. Benchmark OpenAI and other LLMs for performance on GPUs and Microsoft hardware.
  3. Debug, profile, and optimize performance for training/inference workloads on CPUs and GPUs.
  4. Monitor performance regressions and drive continuous improvements to reduce time-to-deploy and hardware footprint.
  5. Collaborate across teams of researchers and engineers to deliver scalable, production-ready AI performance improvements.

Skills

Required

  • Bachelor's Degree in Computer Science or related technical field AND 4+ years technical engineering experience with coding in languages including, but not limited to, C++ or Python, OR equivalent experience.
  • Ability to meet Microsoft, customer, and/or government security screening requirements.

Nice to have

  • Master's Degree in Computer Science or related technical field AND 6+ years technical engineering experience with coding in languages including, but not limited to, C++ or Python, OR Bachelor's Degree in Computer Science or related technical field AND 8+ years technical engineering experience with coding in languages including, but not limited to, C++ or Python, OR equivalent experience.
  • 4+ years' practical experience working on high-performance applications, including performance debugging and optimization on CPUs/GPUs.
  • Experience in DNN/LLM inference, experience in one or more DL frameworks such as PyTorch, TensorFlow, or ONNX Runtime, and familiarity with CUDA, ROCm, or Triton.
  • Technical background and solid foundation in software engineering principles, computer architecture, GPU architecture, and hardware neural network acceleration.
  • Experience in end-to-end performance analysis and optimization of state-of-the-art LLMs and HPC applications, including proficiency with GPU profiling tools.
  • Cross-team collaboration skills and the desire to collaborate in a team of researchers and developers.
  • Ability to lead projects independently.

What the JD emphasized

  • performance debugging and optimization on CPUs/GPUs
  • DNN/LLM inference
  • performance analysis and optimization of state-of-the-art LLMs

Other signals

  • LLM inference performance
  • GPU and Microsoft hardware optimization
  • large-scale training and inference