Senior/Tech Lead AI/LLM Network Software Development Engineer - San Jose

ByteDance · Big Tech · San Jose, CA · R&D

This role focuses on designing, implementing, and deploying high-speed network technologies specifically to support AI/LLM applications. Responsibilities include developing platforms for monitoring and diagnosing large-scale AI networks, researching and optimizing AI communication frameworks, network protocols, and host-network-application co-design for scalability and performance, and building next-generation AI network infrastructure.

What you'd actually do

  1. Design, implement, and deploy high-speed network technologies to support AI/LLM applications.
  2. Design and develop platforms and systems for monitoring, analyzing, and diagnosing large-scale AI/LLM networks.
  3. Research and develop high-performance AI communication frameworks and network protocol stacks, and optimize host-network-application co-design to improve the scalability, reliability, and performance of AI/LLM networks.
  4. Build next-generation AI network infrastructure that supports large-scale heterogeneous network hardware with innovative, deployable solutions.

Skills

Required

  • computer networking
  • network programming
  • C/C++
  • Python
  • Go
  • high-speed network systems
  • RDMA
  • congestion control
  • AI network optimization

Nice to have

  • developing high-performance communication frameworks (including NCCL, MPI, and RPC libraries)
  • developing software systems for AI network diagnosis and performance optimization

Other signals

  • AI/LLM applications
  • AI/LLM network
  • AI communication framework
  • AI network infrastructure