Senior Software Engineer - AI Compute Infrastructure

ByteDance · Big Tech · Seattle, WA · Infrastructure

We are looking for a Senior Software Engineer to design and build large-scale, container-based cluster management and orchestration systems for LLM inference, with a focus on performance, scalability, and cost-efficiency. The role involves architecting GPU and AI accelerator infrastructure, collaborating on inference solutions built on engines such as vLLM, SGLang, and TensorRT-LLM, and staying current with advances in AI/ML infrastructure.

What you'd actually do

  1. Design and build large-scale, container-based cluster management and orchestration systems with extreme performance, scalability, and resilience.
  2. Architect next-generation cloud-native GPU and AI accelerator infrastructure to deliver cost-efficient and secure ML platforms.
  3. Collaborate across teams to deliver world-class inference solutions using vLLM, SGLang, TensorRT-LLM, and other LLM engines.
  4. Stay current with the latest advances in open source (Kubernetes, Ray, etc.), AI/ML and LLM infrastructure, and systems research; integrate best practices into production systems.
  5. Write high-quality, production-ready code that is maintainable, testable, and scalable.
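The orchestration and request-routing work described above can be made concrete with a toy example. Below is a minimal least-loaded router sketch in Python; all names are hypothetical, and it illustrates the routing concept only, not ByteDance's system or any real engine's API:

```python
import heapq
from dataclasses import dataclass, field

@dataclass(order=True)
class Replica:
    inflight: int                       # in-flight requests; the priority key
    name: str = field(compare=False)    # replica id, excluded from ordering

class LeastLoadedRouter:
    """Route each request to the replica with the fewest in-flight requests."""

    def __init__(self, names):
        self.heap = [Replica(0, n) for n in names]
        heapq.heapify(self.heap)

    def route(self):
        # Pop the least-loaded replica, charge it one request, push it back.
        r = heapq.heappop(self.heap)
        r.inflight += 1
        heapq.heappush(self.heap, r)
        return r.name

    def complete(self, name):
        # Release one in-flight request and restore the heap invariant.
        for r in self.heap:
            if r.name == name:
                r.inflight -= 1
        heapq.heapify(self.heap)
```

With two replicas, the first two requests land on different replicas, and completing a request makes that replica the preferred target again; a production router would additionally weight by GPU memory pressure and KV-cache occupancy.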

Skills

Required

  • B.S./M.S. in Computer Science, Computer Engineering, or related fields with 3+ years of relevant experience
  • Strong understanding of large model inference, distributed and parallel systems, and high-performance networking systems
  • Hands-on experience building cloud or ML infrastructure: resource management, scheduling, request routing, monitoring, and orchestration
  • Solid knowledge of container and orchestration technologies (Docker, Kubernetes)
  • Proficiency in at least one major programming language (Go, Rust, Python, or C++)

Nice to have

  • Experience contributing to or operating large-scale cluster management systems (e.g., Kubernetes, Ray)
  • Experience with workload scheduling, GPU orchestration, scaling, and isolation in production environments
  • Hands-on experience with GPU programming (CUDA) and inference engines (vLLM, SGLang, TensorRT-LLM)
  • Familiarity with public cloud providers (AWS, Azure, GCP) and their ML platforms (SageMaker, Azure ML, Vertex AI)
  • Strong knowledge of ML systems (Ray, DeepSpeed, PyTorch) and distributed training/inference platforms
  • Excellent communication skills and the ability to collaborate across global, cross-functional teams
  • Passion for system efficiency, performance optimization, and open-source innovation
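The inference engines listed above (vLLM, SGLang, TensorRT-LLM) all rely on continuous (in-flight) batching: freed batch slots are backfilled from the queue after every decode step instead of waiting for the whole batch to drain. A toy simulation in Python, with hypothetical names and no resemblance to any engine's real API:

```python
def continuous_batching_steps(requests, max_batch):
    """Count decode steps under continuous batching.

    `requests` lists the remaining tokens each sequence must generate.
    Each step decodes one token for every sequence in the batch; finished
    sequences leave, and their slots are refilled immediately.
    """
    queue = list(requests)
    batch = []
    steps = 0
    while queue or batch:
        while queue and len(batch) < max_batch:
            batch.append(queue.pop(0))      # backfill freed slots
        batch = [r - 1 for r in batch if r > 1]  # decode one token each
        steps += 1
    return steps
```

For requests of 3, 1, and 2 tokens with a batch size of 2, continuous batching finishes in 3 steps, while static batching (waiting for the longest sequence in each batch) would take 5; that gap is the throughput win these engines are built around.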

What the JD emphasized

  • large-scale LLM inference
  • GPU-optimized orchestration systems
  • Kubernetes-native control plane
  • extreme performance, scalability, and resilience
  • cost-efficient and secure ML platforms
  • world-class inference solutions
  • LLM engines

Other signals

  • LLM inference infrastructure
  • vLLM, SGLang, TensorRT-LLM