Research Intern - AI Network Observability

Microsoft · Big Tech · Mountain View, CA +2 · Applied Sciences

A research internship to design and develop tools for AI datacenter network observability, focusing on high-performance tracing and analysis systems that capture packet-level behavior at speeds of up to 800 Gbps. The role involves prototyping solutions on real hardware and collaborating with engineers to improve the reliability and explainability of AI intra-datacenter networking.

What you'd actually do

  1. Contribute to the research, design, and development of tools that provide insight into multi-path network transports for large-scale artificial intelligence (AI) datacenter environments.
  2. Build high-performance tracing and analysis systems capable of capturing packet-level behavior at extremely high speeds (up to 800 Gbps).
  3. Enhance observability for next-generation transport protocols supporting AI workloads.
  4. Prototype solutions on real hardware and collaborate with engineers to improve the reliability and strengthen the explainability of AI intra-datacenter networking.

Skills

Required

  • Currently enrolled in a PhD program in Computer Science or a related STEM field

Nice to have

  • Datacenter networking and systems research
  • High-performance programming of network data paths (e.g., in C++)
  • RDMA and/or DPDK
  • RoCE, plus knowledge of TCP, UDP, IP, and Ethernet