Solutions Architect - Top AI Labs

NVIDIA · Semiconductors · Beijing, China

Solutions Architect role at NVIDIA focused on optimizing LLM inference and accelerating training, contributing to open-source frameworks such as SGLang and vLLM, and developing KV cache offloading. Requires strong programming skills and solid systems fundamentals; experience in performance analysis is a plus.

What you'd actually do

  1. Contribute to the development of open-source inference frameworks such as SGLang and vLLM, including feature and operator development, performance optimization, and model support, in collaboration with the community.
  2. Develop and optimize KV cache offloading frameworks for LLM workloads, supporting multi-level cache offloading and reuse across CPU, SSD, and remote storage to improve inference efficiency. (Team project: FlexKV)
  3. Drive R&D on compute performance in distributed training, and explore methods and technologies for performance optimization.
  4. Study computational challenges in machine learning systems, identify common needs and bottlenecks, and build example code, acceleration libraries, or frameworks accordingly.

Skills

Required

  • Master’s degree or above in computer science, mathematics, electrical engineering, automation, or related fields
  • Solid programming skills
  • Good understanding of data structures and computer systems fundamentals
  • Strong learning agility, adaptability, and the ability to analyze, define, and independently explore technical problems

Nice to have

  • Familiarity with heterogeneous computing, distributed training, parallel computing, or other areas related to high-performance computing
  • Experience in performance analysis, performance modeling, or performance optimization
  • Contributions to open-source frameworks are a plus
  • Strong ability to define new problems and explore solutions
  • Independent PhD-level research experience is preferred
  • Proficiency with AI coding tools

What the JD emphasized

  • Over 10 years of work experience in the technology industry
  • Independent PhD-level research experience

Other signals

  • LLM inference and training acceleration
  • Performance optimization
  • Open-source frameworks