NIM Solutions Architect

NVIDIA · Semiconductors · Beijing, China (+3 other locations)

This role focuses on deploying and optimizing large models with NVIDIA Inference Microservice (NIM) and related tools. The Solutions Architect will package optimized models (LLMs, VLMs, and others) into containers for deployment, refine NIM tools for the community, and design and implement agentic AI solutions for customer scenarios. The role requires strong programming skills and experience with inference engines and MLOps practices, with a focus on performance engineering and model optimization.

What you'd actually do

  1. Drive the implementation and deployment of NVIDIA Inference Microservice (NIM) solutions
  2. Use NVIDIA NIM Tools or the Factory Pipeline to package optimized models (including LLM, VLM, diffusion, retriever, CV, OCR, and AI4Science models) into containers that provide standardized API access for on-premises or cloud deployment
  3. Refine NIM tools for the community and help community members build performant NIMs
  4. Design and implement agentic AI solutions tailored to customer business scenarios using NIMs
  5. Deliver technical projects, demos, and client support tasks as directed by Solutions Architecture leadership

Skills

Required

  • Bachelor's or Master's degree in Computer Science, Artificial Intelligence, or a related field, with 3+ years of work experience
  • Proven experience in deploying and optimizing large language models
  • Familiarity with mainstream inference engines or frameworks (e.g., SGLang, vLLM, TensorRT, ONNX Runtime, or PyTorch)
  • Strong programming skills in Python or C++
  • Experience with DevOps/MLOps tools and practices such as Docker, Git, and CI/CD
  • Excellent problem-solving skills and ability to troubleshoot complex technical issues
  • Demonstrated ability to collaborate effectively across diverse, global teams, adapting communication styles while maintaining clear, constructive professional interactions

Nice to have

  • Experience in AI performance engineering
  • Expertise in model optimization techniques
  • Knowledge of AI workflow design and implementation; experience with cluster resource management tools
  • CUDA optimization experience; extensive experience designing and deploying large-scale HPC and enterprise computing systems

What the JD emphasized

  • Proven experience in deploying and optimizing large language models
  • Familiarity with mainstream inference engines or frameworks (e.g., SGLang, vLLM, TensorRT, ONNX Runtime, or PyTorch)
  • Experience in AI performance engineering
  • Expertise in model optimization techniques
  • Knowledge of AI workflow design and implementation

Other signals

  • Deploying and optimizing large language models
  • NVIDIA Inference Microservice (NIM) solutions
  • Packaging optimized models into containers
  • Designing and implementing agentic AI tailored to customer business scenarios