About the Team
The Seed Infrastructures team oversees distributed training, reinforcement learning frameworks, high-performance inference, and heterogeneous hardware compilation technologies for AI foundation models.
Responsibilities
- Design, develop, and optimize high-performance inference systems for large-scale LLMs and VLMs, covering inference engines, serving frameworks, and end-to-end deployment pipelines.
- Build state-of-the-art model inference engines using advanced performance optimization techniques, including compiler-level optimizations, parallel computing, graph fusion, efficient CUDA kernel development, low-precision computation, streaming inference, speculative decoding, and optimization for high-concurrency request serving.
- Collaborate closely with research teams to identify performance bottlenecks, conduct in-depth performance analysis, and optimize large models; contribute to the development of model toolchains and the broader technical ecosystem.
Requirements
Minimum Qualifications:
- Bachelor's degree or above in Computer Science, Electrical Engineering, Software Engineering, or a related field.
- Strong proficiency in C/C++ and Python; solid foundations in algorithms, data structures, and systems programming; familiarity with containerization and server-side debugging tools.
- Hands-on experience with at least one mainstream machine learning framework (e.g., PyTorch, TensorFlow).
- Experience deploying or optimizing LLM/VLM inference at production scale, with demonstrated impact on latency, throughput, or serving cost.
- Familiarity with GPU architecture and experience optimizing compute-intensive operators (e.g., FlashAttention, GEMM, GEMV, Conv2D).
Preferred Qualifications:
- Experience with large-scale LLM serving infrastructure or equivalent production LLM deployment experience.
- Experience in GPU programming (CUDA/OpenCL) and familiarity with frameworks such as TensorRT, Triton, or CUTLASS.
- Experience in performance modeling, profiling, and optimization, or strong knowledge of CPU/GPU architectures.
- Familiarity with model/data parallelism frameworks for distributed inference.