Tech Lead, Research Scientist - DPU & AI Infra

ByteDance · Big Tech · Seattle, WA · Infrastructure

Tech Lead, Research Scientist focused on DPU and AI infrastructure, optimizing distributed training and inference by leveraging DPUs, GPUs, and custom hardware. The role involves designing and developing high-performance network software, collaborating on software-hardware co-design, and driving end-to-end performance optimization.

What you'd actually do

  1. Explore AI/ML infrastructure acceleration, leveraging DPUs, GPUs, and custom hardware to optimize distributed training and inference.
  2. Design and develop DPU network software with a focus on high performance, low latency, and reliability.
  3. Collaborate with hardware teams to build software-hardware co-design solutions for networking and storage acceleration.
  4. Drive end-to-end performance optimization, from OS kernels and drivers to user-space runtime systems.
  5. Contribute to architecture design, technical proposals, and long-term research directions.

Skills

Required

  • C/C++ development and debugging
  • Linux systems development
  • Solid understanding of compute, network architecture, and operating systems
  • Experience in at least one of: software-hardware co-design, distributed systems, high-performance networking, or AI/ML systems

Nice to have

  • Ph.D. in a related field, with research training and publications
  • Network virtualization (OVS, SR-IOV, eBPF)
  • DPDK and high-performance user-space networking
  • Hardware acceleration experience (FPGA/ASIC/GPU/CUDA)
  • NCCL collectives
  • AI communication patterns and parallelization techniques
  • Inference KV cache systems
  • Data preprocessing systems

What the JD emphasized

  • AI/ML infrastructure acceleration
  • distributed training and inference
  • software-hardware co-design
  • high performance
  • low latency

Other signals

  • GPU virtualization and scheduling