Research Intern - AI System Architecture Modeling and Performance

Microsoft · Big Tech · Hillsboro, OR +1 · Applied Sciences

Research Intern role focused on AI system architecture modeling and performance within Azure's hyperscale infrastructure. The intern will evaluate hardware/software co-design opportunities, optimize CPU, GPU, and networking infrastructure for AI accelerators, and develop methodologies for performance analysis and architectural idea evaluation.

What you'd actually do

  1. Identify system stress points, propose novel architectural ideas, and create methodologies using a combination of workload characterization, modeling, and benchmarking to evaluate their effectiveness.
  2. Evaluate opportunities to co-optimize central processing unit (CPU), graphics processing unit (GPU), and networking infrastructure for the Maia accelerator ecosystem.
  3. Be at the forefront of hardware/software co-design and have a direct impact in answering critical questions around designing an optimized AI system and evaluating real-world impact on Azure's supporting hyperscale infrastructure.

Skills

Required

  • PhD program in Computer Science or related STEM field
  • performance analysis tools and methodologies
  • optimization
  • modeling

Nice to have

  • PyTorch
  • SGLang
  • Dynamo
  • CUDA
  • Triton
  • GPU architectures
  • AI accelerator architectures
  • memory hierarchies
  • compute-communication interplay
  • kernel scheduling
  • interconnect properties
  • CPU/server architectures
  • PCIe topologies
  • accelerator/NIC/peripheral demand
  • CPU involvement in dispatching, scheduling and orchestration of input data pipelines to AI accelerators
  • benchmarking
  • profiling
  • performance bottlenecks
  • performance analysis
  • performance optimization
  • trace generation
  • event monitoring
  • instrumentation
  • roofline performance modeling
  • detailed performance simulations
  • speed vs accuracy tradeoffs
  • performance analysis methodology
  • complex system architecture what-if scenarios
  • verbal and written communication skills

What the JD emphasized

  • PhD program in Computer Science or related STEM field
  • At least 1 year of experience with performance analysis tools and methodologies, optimization and modeling

Other signals

  • AI infrastructure architecture
  • workload-optimized data flows for large-scale AI models
  • hardware/software co-design
  • evaluating real-world impact on hyperscale infrastructure
  • co-optimize CPU, GPU and networking infrastructure for the Maia accelerator ecosystem
  • identify system stress points
  • propose novel architectural ideas
  • create methodologies using workload characterization, modeling and benchmarking