Cloud Acceleration Engineer – DPU & AI Infra

ByteDance · Big Tech · San Jose, CA · Infrastructure

ByteDance is seeking a Cloud Acceleration Engineer to focus on DPU and AI infrastructure. The role involves designing and developing high-performance DPU network software, collaborating on software-hardware co-design, and exploring AI/ML infrastructure acceleration for distributed training and inference. The position requires strong C/C++ and Linux systems development skills, with a background in areas like software-hardware co-design, distributed systems, networking, or AI/ML systems.

What you'd actually do

  1. Design and develop DPU network software with a focus on high performance, low latency, and reliability (see the sketch after this list).
  2. Collaborate with hardware teams to build software-hardware co-design solutions for networking and storage acceleration.
  3. Explore AI/ML infrastructure acceleration, leveraging DPUs, GPUs, and custom hardware to optimize distributed training and inference.
  4. Drive end-to-end performance optimization, from OS kernels and drivers to user-space runtime systems.
  5. Contribute to architecture design, technical proposals, and long-term research directions.
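
To make the day-to-day concrete, below is a minimal sketch of the kind of user-space packet path this role touches: a DPDK-style RX busy-poll loop. This is illustrative, not from the posting; device and queue setup are omitted, and the port/queue ids are assumptions.

```c
/*
 * Minimal DPDK-style RX polling loop (sketch). Assumes
 * rte_eth_dev_configure(), RX/TX queue setup, and rte_eth_dev_start()
 * have already run; port 0 / queue 0 are illustrative placeholders.
 */
#include <stdint.h>
#include <stdlib.h>
#include <rte_eal.h>
#include <rte_debug.h>
#include <rte_ethdev.h>
#include <rte_mbuf.h>

#define BURST_SIZE 32

int main(int argc, char **argv)
{
    /* Bring up DPDK's Environment Abstraction Layer. */
    if (rte_eal_init(argc, argv) < 0)
        rte_exit(EXIT_FAILURE, "EAL init failed\n");

    struct rte_mbuf *bufs[BURST_SIZE];
    const uint16_t port_id = 0; /* assumption: first configured port */

    /* Busy-poll the NIC: no interrupts, no syscalls on the hot path. */
    for (;;) {
        uint16_t nb_rx = rte_eth_rx_burst(port_id, 0, bufs, BURST_SIZE);
        for (uint16_t i = 0; i < nb_rx; i++) {
            /* Packet inspection/forwarding would go here. */
            rte_pktmbuf_free(bufs[i]);
        }
    }
}
```

Polling rather than interrupt-driven I/O is the standard trade-off in this space: it burns a core to keep per-packet latency low and predictable, which is the "high performance, low latency" emphasis in the responsibilities above.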

Skills

Required

  • C/C++ development
  • Linux systems development
  • Compute architecture understanding
  • Network architecture understanding
  • Operating systems understanding
  • Software-hardware co-design OR distributed systems OR high-performance networking OR AI/ML systems

Nice to have

  • Ph.D. in related fields
  • Network virtualization (OVS, SR-IOV, eBPF)
  • DPDK
  • High-performance user-space networking
  • Hardware acceleration experience (FPGA/ASIC/GPU/CUDA)
  • NCCL collectives (see the all-reduce sketch after this list)
  • AI communication patterns
  • Parallelization techniques
  • Inference KV cache system design
  • Data preprocessing system design
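
For the NCCL/AI-communication items, a minimal single-process, multi-GPU all-reduce sketch is below; it shows the collective pattern underlying distributed training. The device cap, buffer size, and lack of error checking are simplifications for brevity, not part of the posting.

```c
/* Single-process, multi-GPU NCCL all-reduce (sketch).
 * Error handling and buffer initialization are trimmed for brevity. */
#include <stddef.h>
#include <cuda_runtime.h>
#include <nccl.h>

int main(void)
{
    int ndev = 0;
    cudaGetDeviceCount(&ndev);
    if (ndev > 8) ndev = 8; /* cap to the fixed-size arrays in this sketch */

    int devs[8];
    ncclComm_t comms[8];
    for (int i = 0; i < ndev; i++) devs[i] = i;

    /* One communicator per local GPU. */
    ncclCommInitAll(comms, ndev, devs);

    const size_t count = 1 << 20; /* assumption: 1M floats per rank */
    float *sendbuf[8], *recvbuf[8];
    cudaStream_t streams[8];
    for (int i = 0; i < ndev; i++) {
        cudaSetDevice(i);
        cudaMalloc((void **)&sendbuf[i], count * sizeof(float));
        cudaMalloc((void **)&recvbuf[i], count * sizeof(float));
        cudaStreamCreate(&streams[i]);
    }

    /* Group the per-device calls so NCCL can launch them together. */
    ncclGroupStart();
    for (int i = 0; i < ndev; i++)
        ncclAllReduce(sendbuf[i], recvbuf[i], count, ncclFloat, ncclSum,
                      comms[i], streams[i]);
    ncclGroupEnd();

    /* Wait for the collective to finish on every GPU. */
    for (int i = 0; i < ndev; i++) {
        cudaSetDevice(i);
        cudaStreamSynchronize(streams[i]);
    }
    for (int i = 0; i < ndev; i++) ncclCommDestroy(comms[i]);
    return 0;
}
```

All-reduce is the communication pattern that dominates data-parallel training (gradient averaging), which is why the JD pairs NCCL experience with DPU/GPU acceleration: offloading or overlapping this traffic is where the performance work lives.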

What the JD emphasized

  • AI/ML infrastructure acceleration
  • distributed training and inference
  • software-hardware co-design
  • high performance
  • low latency

Other signals

  • DPU acceleration for AI/ML workloads