Research Intern - AI Frameworks (Network Systems and Tools)

Microsoft · Big Tech · Redmond, WA +1 · Applied Sciences

Research intern focused on next-generation AI systems: disaggregated inference, memory architecture, and interconnect technologies for LLM serving, with an emphasis on request scheduling and KV caching optimizations. The role involves investigating and evaluating disaggregated KV cache architectures and building a peer-to-peer (P2P) KV cache sharing architecture across serving instances.
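For orientation only, here is a minimal sketch of the kind of KV-cache-aware request scheduling the summary above refers to: route an incoming prompt to whichever serving instance already holds the longest matching prefix in its KV cache, so less prefill work is recomputed. All names here (ServingInstance, route_request) are illustrative assumptions, not taken from the role description or any Microsoft codebase.

```python
# Toy prefix-cache-aware scheduler: pick the serving instance that can
# reuse the most KV cache for a given prompt. Hypothetical sketch only.
from dataclasses import dataclass, field


@dataclass
class ServingInstance:
    name: str
    # Token-id prefixes whose KV entries this instance has already computed.
    cached_prefixes: list[tuple[int, ...]] = field(default_factory=list)

    def longest_cached_prefix(self, prompt: tuple[int, ...]) -> int:
        """Length of the longest cached prefix that matches this prompt."""
        best = 0
        for prefix in self.cached_prefixes:
            if prompt[: len(prefix)] == prefix:
                best = max(best, len(prefix))
        return best


def route_request(prompt: tuple[int, ...],
                  instances: list[ServingInstance]) -> ServingInstance:
    """Route the request to the instance with the most reusable KV cache."""
    return max(instances, key=lambda inst: inst.longest_cached_prefix(prompt))


if __name__ == "__main__":
    a = ServingInstance("gpu-0", [(1, 2, 3)])
    b = ServingInstance("gpu-1", [(1, 2, 3, 4, 5)])
    chosen = route_request((1, 2, 3, 4, 5, 6), [a, b])
    print(f"route to {chosen.name}")  # gpu-1 reuses the longer prefix
```

A production scheduler would also weigh load, queue depth, and transfer cost, but prefix overlap is the basic signal behind KV-cache-aware request scheduling.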

What you'd actually do

  1. Investigate and evaluate emerging disaggregated KV cache architectures.
  2. Implement a hierarchical storage architecture with multiple tiers.
  3. Build a peer-to-peer (P2P) KV cache sharing architecture that enables direct, high-performance cache transfer between multiple LLM serving instances without requiring centralized cache servers (an illustrative sketch follows this list).
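As referenced in items 2 and 3, the toy sketch below (an assumption-laden illustration, not the team's actual design) combines both ideas: a two-tier local KV cache (a small fast tier standing in for GPU memory and a larger slow tier standing in for host memory) with a peer-to-peer fallback that copies a missing block directly from another serving instance before recomputing it. Class and method names (TieredKVCache, peek) are hypothetical.

```python
# Toy multi-tier KV cache with direct peer-to-peer block transfer.
# Hypothetical sketch only; real systems move blocks over RDMA/NVLink, not Python dicts.
from collections import OrderedDict


class TieredKVCache:
    def __init__(self, name: str, fast_capacity: int):
        self.name = name
        self.fast = OrderedDict()   # small, fast tier (stand-in for GPU HBM)
        self.slow = {}              # larger, slower tier (stand-in for host DRAM)
        self.fast_capacity = fast_capacity
        self.peers: list["TieredKVCache"] = []

    def put(self, key: str, kv_block: bytes) -> None:
        """Insert into the fast tier, demoting the oldest entry when full."""
        self.fast[key] = kv_block
        self.fast.move_to_end(key)
        if len(self.fast) > self.fast_capacity:
            old_key, old_block = self.fast.popitem(last=False)
            self.slow[old_key] = old_block  # demote to the slower tier instead of dropping

    def get(self, key: str) -> bytes | None:
        """Look up locally first, then ask peers directly (P2P), else report a miss."""
        if key in self.fast:
            self.fast.move_to_end(key)
            return self.fast[key]
        if key in self.slow:
            block = self.slow.pop(key)
            self.put(key, block)        # promote back to the fast tier
            return block
        for peer in self.peers:
            block = peer.peek(key)
            if block is not None:
                self.put(key, block)    # direct peer-to-peer transfer, no central server
                return block
        return None                     # full miss: caller must recompute the KV block

    def peek(self, key: str) -> bytes | None:
        """Serve a peer's lookup without changing local eviction order."""
        if key in self.fast:
            return self.fast[key]
        return self.slow.get(key)


if __name__ == "__main__":
    a = TieredKVCache("inst-a", fast_capacity=2)
    b = TieredKVCache("inst-b", fast_capacity=2)
    a.peers, b.peers = [b], [a]
    b.put("prompt-123", b"kv-bytes")
    print(a.get("prompt-123") is not None)  # True: fetched P2P from inst-b
```

The point of the P2P path is that serving instances exchange cache blocks directly rather than through a centralized cache server, which is the property item 3 calls out.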

Skills

Required

  • Currently enrolled in a PhD program in Computer Science, Electrical/Computer Engineering, or a related field

Nice to have

  • Research experience in areas such as computer architecture, AI/ML systems, performance modeling, distributed systems, or hardware–software co-design
  • Programming skills in Python and C/C++, with experience building prototypes, simulators, or performance analysis tools
  • Familiarity with modern AI workloads and/or deep learning frameworks (e.g., PyTorch)
  • Demonstrated ability to define and pursue original research directions in AI systems or architecture
  • Ability to collaborate effectively with researchers across disciplines and work in cross-group, cross-cultural environments
  • Strong communication and presentation skills for conveying complex technical insights
  • Ability to think creatively and approach system and architecture challenges with unconventional or innovative solutions
  • Experience with PyTorch, CUDA, Triton, or performance-simulation tools
  • Background in large-scale system design, AI inference bottleneck analysis, or modeling cost/performance tradeoffs
  • Understanding of accelerator, memory-system, or interconnect design principles

What the JD emphasized

  • PhD program in Computer Science, Electrical/Computer Engineering, or a related field
  • Research experience in areas such as computer architecture, AI/ML systems, performance modeling, distributed systems, or hardware–software co-design
  • Programming skills in Python, C/C++ with experience building prototypes, simulators, or performance analysis tools
  • Familiarity with modern AI workloads and/or deep learning frameworks (e.g., PyTorch)
  • Demonstrated ability to define and pursue original research directions in AI systems or architecture
  • Experience with PyTorch, CUDA, Triton, or performance-simulation tools
  • Understanding of accelerator, memory-system, or interconnect design principles

Other signals

  • AI systems and architecture
  • disaggregated inference
  • memory architecture
  • interconnect technologies
  • request scheduling
  • KV caching optimizations
  • LLM serving instances