Senior Software Engineer - Performance

Microsoft · Big Tech · Mountain View, CA +1 · Software Engineering

Senior Software Engineer focused on optimizing the inference performance of large language models (LLMs), such as those from OpenAI, running on a range of hardware including GPUs and custom Microsoft silicon. The role involves benchmarking, debugging, and optimizing performance to enable efficient deployment at scale for major Microsoft products and Azure services.

What you'd actually do

  1. Identify and drive improvements to end-to-end inference performance of OpenAI and other state-of-the-art LLMs
  2. Measure and benchmark performance on Nvidia/AMD GPUs and first-party Microsoft silicon
  3. Optimize and monitor the performance of LLMs, and build software tooling that surfaces performance opportunities from the model level down to the systems and silicon levels, improving customer experience and reducing the footprint of the computing fleet
  4. Enable fast time to market for LLMs and other models, and their deployment at scale, by building software tools that speed up porting models to new Nvidia and AMD GPUs
  5. Design, implement, and test functions or components for our AI/DNN/LLM frameworks and tools

Skills

Required

  • C
  • C++
  • C#
  • Java
  • JavaScript
  • Python
  • software design and development skills
  • solving technical problems
  • building a full end-to-end AI stack

Nice to have

  • Computer architecture
  • GPU architecture
  • HW neural net acceleration
  • end-to-end performance analysis and optimization of state-of-the-art LLMs
  • GPU profiling tools
  • DNN/LLM inference
  • PyTorch
  • TensorFlow
  • ONNX Runtime
  • CUDA
  • ROCm
  • Triton
  • Cross-team collaboration

What the JD emphasized

  • end-to-end inference performance
  • LLM frameworks and tools
  • performance opportunities

Other signals

  • inference performance
  • LLM serving
  • GPU optimization
  • Azure OpenAI