Research Scientist - DPU & AI Infra

ByteDance · Big Tech · San Jose, CA · Infrastructure

Research Scientist role focused on DPU and AI infrastructure, aiming to accelerate distributed training and inference by co-designing software and hardware. The role explores AI/ML infrastructure acceleration using DPUs, GPUs, and custom hardware.

What you'd actually do

  1. Design and develop DPU network software with a focus on high performance, low latency, and reliability.
  2. Collaborate with hardware teams to build software-hardware co-design solutions for networking and storage acceleration.
  3. Explore AI/ML infrastructure acceleration, leveraging DPUs, GPUs, and custom hardware to optimize distributed training and inference.
  4. Drive end-to-end performance optimization, from OS kernels and drivers to user-space runtime systems.
  5. Contribute to architecture design, technical proposals, and long-term research directions.

Skills

Required

  • C/C++ development and debugging
  • Linux systems development
  • Computer architecture, networking, and operating systems
  • Experience in at least one of: software-hardware co-design, distributed systems, high-performance networking, or AI/ML systems

Nice to have

  • Ph.D. in a related field, with research training and publications
  • Network virtualization (OVS, SR-IOV, eBPF)
  • DPDK and high-performance user-space networking
  • Hardware acceleration experience (FPGA/ASIC/GPU/CUDA)
  • NCCL collectives
  • AI communication patterns and parallelization techniques
  • Inference KV-cache systems
  • Data preprocessing systems

What the JD emphasized

  • Strong research record and publications
  • AI/ML systems
  • Software-hardware co-design
  • AI communication patterns

Other signals

  • AI/ML infrastructure acceleration
  • Distributed training and inference
  • DPU, GPU, and custom hardware