Deep Learning Solution Architect

NVIDIA · Semiconductors · Beijing, China

NVIDIA is seeking a Deep Learning Solution Architect to design and optimize production-grade generative AI solutions for enterprise customers, focusing on LLM training, RAG, and agentic inference using NVIDIA's ecosystem.

What you'd actually do

  1. Architect end-to-end solutions focused on LLM pretraining, fine-tuning, high-performance inference, RAG workflows, and agentic inference orchestration using NVIDIA’s hardware and software platforms.
  2. Collaborate with customers to understand their LLM-related business challenges and design tailored solutions aligned with the NVIDIA ecosystem.
  3. Lead LLM training, distributed optimization, and performance tuning to achieve optimal throughput, latency, and memory efficiency.
  4. Design and integrate RAG workflows and agentic inference pipelines into customer systems; provide technical guidance on best practices.
  5. Collaborate with NVIDIA engineering teams to provide feedback and support pre-sales technical activities (workshops, demos).

Skills

Required

  • Master’s or Ph.D. in Computer Science, Artificial Intelligence, or equivalent experience
  • 4+ years of hands-on experience in AI
  • Open-source LLM training, fine-tuning, and production inference optimization
  • Familiarity with mainstream LLM architectures
  • Proficiency with PyTorch and Hugging Face Transformers
  • GPU computing, cluster architecture, and distributed parallel training/inference for LLMs
  • Agentic inference design and applying AI agents to solve business challenges
  • Strong communication skills

Nice to have

  • NVIDIA’s generative AI ecosystem (TensorRT-LLM, Megatron-LM, NVIDIA NeMo)
  • LLM optimization techniques (quantization, KV cache tuning, memory footprint reduction)
  • Docker and Kubernetes
  • Multi-GPU parallelism and large-scale GPU cluster management

What the JD emphasized

  • LLM pretraining
  • fine-tuning
  • production inference optimization
  • agentic inference design
