Research Intern for Supernode Solution

Intel · Semiconductors · Shanghai, China

Research Intern focusing on system innovation, cost optimization, and GPU interconnect protocols for disaggregated AI supernode architectures. The role involves exploring architectural innovations, implementing distributed memory pooling, and researching Ethernet-native GPU interconnect protocols for large-scale AI inference and training clusters. Familiarity with RDMA, Mellanox tools, and LLM inference benchmarking methodologies is required.

What you'd actually do

  1. Explore architectural innovations for disaggregated AI supernode designs, with a focus on system-level performance trade-offs, bill-of-materials (BOM) cost reduction strategies, and scalability from new product introduction (NPI) to high-volume manufacturing (HVM).
  2. Research and prototype Ethernet-native GPU interconnect protocols and distributed memory pooling mechanisms for large-scale AI inference and training clusters.

Skills

Required

  • Master's student in Electrical Engineering, Computer Engineering, Computer Science, or a related field
  • Familiarity with core RDMA operations, including one-sided read/write, Send/Receive verbs, QP management, and memory registration
  • Hands-on experience with Mellanox ConnectX-7 testing and diagnostic tools, including perftest, ibstat, mlnx_tune, and OFED utilities
  • Working knowledge of LLM inference benchmarking methodology and standard metrics: time to first token (TTFT), time between tokens (TBT), and throughput in tokens/s
  • Experience using frameworks such as vLLM, lm-evaluation-harness, or an equivalent
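
For candidates unsure what the benchmarking metrics above refer to, here is a minimal sketch of how they are commonly derived from per-token arrival timestamps for a single streamed request. The function name `latency_metrics` and its arguments are hypothetical and not tied to any particular framework, and the throughput definition used here (output tokens divided by total request time) is one common convention among several.

```python
from statistics import mean


def latency_metrics(request_start, token_times):
    """Compute TTFT, mean TBT, and tokens/s for one streamed request.

    request_start: time the request was sent, in seconds (hypothetical input).
    token_times:   arrival time of each output token, in seconds, ascending.
    """
    if not token_times:
        raise ValueError("at least one output token is required")

    # Time to first token: delay between sending the request and token 1.
    ttft = token_times[0] - request_start

    # Time between tokens: mean gap between consecutive output tokens.
    gaps = [later - earlier for earlier, later in zip(token_times, token_times[1:])]
    mean_tbt = mean(gaps) if gaps else 0.0

    # Throughput: output tokens divided by end-to-end request time.
    # Assumption: some benchmarks exclude TTFT from the denominator instead.
    total = token_times[-1] - request_start
    if total <= 0:
        raise ValueError("token timestamps must be later than request_start")
    tokens_per_s = len(token_times) / total

    return {"ttft_s": ttft, "mean_tbt_s": mean_tbt, "tokens_per_s": tokens_per_s}


if __name__ == "__main__":
    # Three tokens arriving 0.5 s, 0.75 s, and 1.0 s after the request.
    print(latency_metrics(0.0, [0.5, 0.75, 1.0]))
```

In practice these values are aggregated over many requests (for example as means or percentiles) and reported alongside the request rate and batch size used for the run.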

Other signals

  • AI supernode designs
  • large-scale AI inference and training clusters
  • LLM inference benchmarking