Cambridge Residency Programme: Next-generation AI Datacentre Networking

Microsoft · Big Tech · Cambridge, United Kingdom +1 · Research Sciences

Microsoft Research Cambridge is seeking two researchers for a two-year postdoctoral programme to advance the design and evaluation of next-generation datacentre networks for AI training and inference workloads. The programme has two tracks: one focused on analytical modelling and simulation, the other on systems implementation and experimental validation on advanced hardware testbeds. The goal is to publish research findings and influence future AI infrastructure strategy.

What you'd actually do

  1. Design and analyse novel network architectures (e.g., hybrid optical-electrical, reconfigurable topologies) tailored for AI communication patterns.
  2. Develop analytical models and simulators to quantify the performance, cost, and energy trade-offs of proposed designs.
  3. Implement and evaluate network protocols, transport mechanisms, and collective communication schemes on experimental hardware testbeds featuring modern GPUs, optical circuit switches, and RDMA interconnects.
  4. Build and run communication-intensive workloads (e.g., collective algorithm benchmarks, distributed training/inference jobs) to stress-test new network designs.
  5. Publish findings at top-tier academic venues and contribute to Microsoft’s long-term AI infrastructure strategy.

Skills

Required

  • Analytical modelling
  • Simulation
  • Performance modelling
  • Systems programming in C++/CUDA/Python
  • Building or evaluating networked systems
  • Distributed systems
  • AI training/inference infrastructure

Nice to have

  • Datacentre network architectures
  • Transport protocols
  • Collective communication
  • Circuit-switched or optical networking concepts
  • AI/ML workload communication patterns
  • Building simulators
  • Evaluation frameworks
  • Experimental prototypes
  • Python
  • Scientific computing libraries (NumPy, SciPy, pandas)
  • High-performance networking (RDMA, RoCEv2, InfiniBand)
  • Transport protocol implementation
  • Congestion control
  • CUDA programming
  • NCCL
  • ML training/inference systems (PyTorch, Megatron, vLLM)
  • Building or managing hardware testbeds
  • Measurement and profiling

What the JD emphasized

  • PhD in Computer Science, Computer Engineering, Electrical Engineering, Applied Mathematics, Operations Research, or a related field.
  • Evidence of independent research, such as first-author publications, strong thesis work, or impactful prototypes.

Other signals

  • AI training and inference are fundamentally changing the communication patterns and cost envelope of cloud infrastructure
  • Advance the design and evaluation of next-generation datacentre networks for AI workloads
  • Publish findings at top-tier academic venues and contribute to Microsoft’s long-term AI infrastructure strategy