Solution Architect – Accelerated Computing Libraries

NVIDIA · Semiconductors · Beijing, China (+1 other location)

NVIDIA is seeking a Solution Architect to drive the adoption of its AI and accelerated computing libraries across industries. The role involves understanding customer workloads, designing solutions with NVIDIA libraries for LLM inference and training acceleration, and collaborating with product teams to improve features and performance. The candidate will also build technical assets and analyze industry trends.

What you'd actually do

  1. Drive the adoption of key NVIDIA AI and accelerated computing libraries across multiple industries by working closely with customers’ technical teams, local field teams, and global product teams.
  2. Deeply understand customers’ workloads and requirements, and map them to NVIDIA libraries, identifying functional, performance, and usability gaps.
  3. Design and validate solutions using NVIDIA libraries (e.g., for inference, training, data processing, and simulation), including proofs of concept (PoCs), benchmarks, and best-practice reference designs.
  4. Collaborate with NVIDIA product and engineering teams to prioritize and close key gaps through feature requests, performance tuning, and roadmap feedback, turning customer needs into concrete product improvements.
  5. Build and maintain technical assets (sample code, reference implementations, design guides, internal playbooks) that help scale NVIDIA libraries to more customers and use cases.

Skills

Required

  • 5+ years of experience in the technology industry in roles such as solutions architect, systems engineer, ML engineer, or software engineer
  • Master’s degree or higher in computer science, mathematics, electrical engineering, automation, or a related field
  • Solid programming skills (e.g., Python, C, C++)
  • Good grasp of data structures, algorithms, and computer systems fundamentals
  • Experience working directly with external or internal customers to understand requirements, design solutions, and drive technical adoption
  • Strong ability to analyze and define problems, quickly learn new technologies, and independently explore and validate solution options
  • Excellent communication skills
  • Proficiency in written and spoken English and Chinese

Nice to have

  • Hands-on experience with NVIDIA GPUs, CUDA, and one or more NVIDIA libraries (e.g., MCore, Dynamo, CUTLASS, NCCL) or other AI/HPC frameworks.
  • Background in high-performance computing, distributed training/inference, or large-scale system performance optimization.
  • Experience conducting performance analysis and benchmarking, defining meaningful KPIs, and driving performance tuning across hardware, libraries, and applications.
  • Contributions to open-source projects, technical blogs, or public talks in AI, ML systems, or performance optimization.
  • Demonstrated ability to define new problem spaces, propose end-to-end solution architectures, and influence cross-functional teams without direct authority.

What the JD emphasized

  • LLM inference and training acceleration

Other signals

  • customer-facing
  • technical leadership
  • performance optimization