Research Intern - LLM Performance Optimization

Microsoft · Big Tech · Redmond, WA +1 · Applied Sciences

Research Intern role focused on optimizing the performance of Large Language Models (LLMs), covering both model architecture and inference performance. Requires current PhD student status in a STEM field and experience with LLM architecture or inference performance optimization. Preferred qualifications include experience with GPU kernel performance bottlenecks and optimizing-compiler architecture.

What you'd actually do

  1. Research Interns put inquiry and theory into practice.
  2. Alongside fellow doctoral candidates and some of the world’s best researchers, Research Interns learn, collaborate, and network for life.
  3. Research Interns not only advance their own careers, but also contribute to significant research and development advances.
  4. During the 12-week internship, Research Interns are paired with mentors and expected to collaborate with other Research Interns and researchers, present findings, and contribute to the vibrant life of the community.
  5. Research internships are available in all areas of research and are offered year-round, though they typically begin in the summer.

Skills

Required

  • Current enrollment in a PhD program in Computer Science or a related STEM field
  • Experience with Large Language Model (LLM) architecture or inference performance optimization

Nice to have

  • Ability to assess and fix kernel performance bottlenecks on GPUs or other high-performance parallel architectures
  • Experience with optimizing-compiler architecture and intermediate representations (such as LLVM IR or MLIR)
  • Ability to think unconventionally to derive creative and innovative solutions

What the JD emphasized

  • Large Language Model architecture or inference performance optimization

Other signals

  • LLM performance optimization
  • GPU performance optimization
  • compiler architecture