Tech Lead, Research Scientist - DPU & AI Infra

ByteDance · Big Tech · Seattle, WA · Infrastructure

Tech Lead, Research Scientist focused on DPU and AI infrastructure, optimizing distributed training and inference by leveraging DPUs, GPUs, and custom hardware. The role involves designing and developing high-performance network software, collaborating on software-hardware co-design, and driving end-to-end performance optimization.

What you'd actually do

  1. Explore AI/ML infrastructure acceleration, leveraging DPUs, GPUs, and custom hardware to optimize distributed training and inference.
  2. Design and develop DPU network software with a focus on high performance, low latency, and reliability.
  3. Collaborate with hardware teams to build software-hardware co-design solutions for networking and storage acceleration.
  4. Drive end-to-end performance optimization, from OS kernels and drivers to user-space runtime systems.
  5. Contribute to architecture design, technical proposals, and long-term research directions.

Skills

Required

  • C/C++ development and debugging
  • Linux systems development
  • Solid understanding of compute, network architecture, and operating systems
  • Experience in at least one of: software-hardware co-design, distributed systems, high-performance networking, or AI/ML systems

Nice to have

  • Ph.D. in a related field, with research training and publications
  • Network virtualization (OVS, SR-IOV, eBPF)
  • DPDK and high-performance user-space networking
  • Hardware acceleration experience (FPGA/ASIC/GPU/CUDA)
  • NCCL collectives
  • AI communication patterns and parallelization techniques
  • Inference KV cache systems
  • Data preprocessing systems

What the JD emphasized

  • AI/ML infrastructure acceleration
  • distributed training and inference
  • software-hardware co-design
  • high performance
  • low latency

Other signals

  • GPU virtualization and scheduling