Senior Deep Learning Solution Architect

NVIDIA · Semiconductors · Beijing, China +1

Senior Deep Learning Solution Architect at NVIDIA, focusing on LLM inference and training acceleration and performance optimization, and contributing to open-source frameworks such as SGLang and vLLM. The role involves developing and optimizing inference frameworks and KV cache offloading, and driving R&D on distributed training performance.

What you'd actually do

  1. Contribute to the development of open-source inference frameworks such as SGLang and vLLM, including feature and operator development, performance optimization, and model support, in collaboration with the community.
  2. Develop and optimize KV cache offloading frameworks for LLM workloads, supporting multi-level cache offloading and reuse across CPU, SSD, and remote storage to improve inference efficiency. (Team project: FlexKV)
  3. Drive R&D on compute performance in distributed training, and explore methods and technologies for performance optimization.
  4. Study computational challenges in machine learning systems, identify common needs and bottlenecks, and build example code, acceleration libraries, or frameworks accordingly.

Skills

Required

  • Over 5 years of work experience in the technology industry
  • Master’s degree or above in computer science, mathematics, electrical engineering, automation, or related fields
  • Solid programming skills
  • Good understanding of data structures and computer systems fundamentals
  • Strong learning agility and adaptability
  • Ability to analyze, define, and independently explore technical problems

Nice to have

  • Familiarity with heterogeneous computing, distributed training, parallel computing, or other areas related to high-performance computing
  • Experience in performance analysis, performance modeling, or performance optimization
  • Contributions to open-source frameworks
  • Strong ability to define new problems and explore solutions
  • Independent PhD-level research experience
  • Proficiency with AI coding tools

What the JD emphasized

  • LLM inference and training acceleration
  • performance optimization
  • open-source inference frameworks
