Research Intern - AI/ML Numerics & Efficiency

Microsoft · Big Tech · Redmond, WA +1 · Applied Sciences

Research Intern role focusing on ML systems, numeric precision, data types, and compute technologies for AI workloads at Azure scale. The role involves investigating model efficiency through low-precision formats, quantization, ML kernel development, and benchmarking. It aims to inform decisions on compute platforms, acceleration strategies, and system-level optimizations for training and inference of large-scale models.

What you'd actually do

  1. Contribute to research and exploration in advanced machine learning (ML) systems, focusing on the numerics, data types, and compute technologies that drive the next generation of Artificial Intelligence (AI) workloads at Azure scale.
  2. Collaborate across Azure teams to investigate cutting-edge approaches to model efficiency, ranging from low-precision formats, quantization strategies, and ML kernel development to benchmarking and analyzing emerging model architectures and hardware capabilities.
  3. Play a critical role in evaluating, prototyping, and analyzing new algorithmic and numerical techniques that improve the performance, cost, and efficiency of training and inference for large-scale models.
  4. Develop expertise in ML systems, emerging data types, kernel optimization, and performance modeling while gaining hands-on experience with the latest Azure AI and hardware technologies.
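
To give a flavor of the low-precision and quantization work items 2 and 3 describe, here is a minimal sketch of symmetric per-tensor int8 quantization. The function names and scheme are illustrative assumptions, not the team's actual methods:

```python
import numpy as np

def quantize_int8(x: np.ndarray) -> tuple[np.ndarray, float]:
    """Symmetric per-tensor int8 quantization: map floats into [-127, 127]."""
    scale = float(np.max(np.abs(x))) / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover a float approximation of the original tensor."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal(1024).astype(np.float32)
q, s = quantize_int8(w)
w_hat = dequantize(q, s)
# Round-to-nearest bounds the per-element error by half a quantization step.
print(float(np.max(np.abs(w - w_hat))))
```

Evaluating formats like this — measuring the error a given bit width introduces and how it propagates through training or inference — is the kind of benchmarking and analysis the role calls out.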

Skills

Required

  • Python
  • C++
  • machine learning systems

Nice to have

  • transformer-based model architectures
  • attention mechanisms
  • KV cache behavior
  • PyTorch
  • Hugging Face Transformers
  • SGLang
  • vLLM
  • TensorRT-LLM
  • GPU programming
  • CUDA
  • Triton
  • profiling
  • performance analysis
  • low-precision numerics
  • quantization methods
  • hardware–software co-design
  • ML systems
  • model optimization
  • kernel development
  • numerical computing
  • analytical skills
  • problem-solving skills
  • computational performance

What the JD emphasized

  • model efficiency
  • training and inference
  • large-scale models
  • ML systems
  • kernel development
  • performance modeling
  • low-precision formats
  • quantization strategies

Other signals

  • research
  • ML systems
  • model efficiency
  • Azure scale
  • training and inference