Senior Researcher - GPU Performance

Microsoft · Big Tech · Redmond, WA +1 · Research Sciences

Applied Research role focused on hardware/software co-design for GPU kernel optimization, aimed at improving the efficiency of Large Language Model and Generative AI inference. Involves designing, implementing, and optimizing GPU kernels, researching novel optimization techniques, and profiling kernel performance.

What you'd actually do

  1. Design, implement, and optimize GPU kernels for complex computational workloads such as AI inference.
  2. Research and develop novel optimization techniques for generation of GPU kernels.
  3. Profile and analyze kernel performance using advanced diagnostic tools.
  4. Generate automated solutions for kernel optimization and tuning.
  5. Collaborate with other researchers to improve model performance.

Skills

Required

  • Doctorate in relevant field or equivalent experience
  • 2+ years of experience in GPU architecture, memory hierarchies, parallel computing and algorithm optimization
  • 2+ years of experience in GPU programming, including performance profiling and optimization tools
  • Strong C++ programming skills

Nice to have

  • 5+ years of experience in GPU programming and optimization
  • Expert knowledge of CUDA, ROCm, Triton, PTX, CUTLASS, or similar GPU programming frameworks
  • Experience with machine learning frameworks (PyTorch, TensorFlow)
  • Familiarity with compiler optimization techniques and background in auto-tuning and automated code generation
  • Publication record in relevant conferences or journals

What the JD emphasized

  • GPU architecture
  • kernel-level optimizations
  • Large Language Models
  • Generative AI experiences
  • end-to-end AI stack

Other signals

  • GPU performance optimization
  • Large Language Models
  • Generative AI efficiency