Solution Architecture Intern, AI in Industry - 2026

NVIDIA · Semiconductors · Beijing, China

NVIDIA is seeking an AI in Industry Solution Architecture Intern to help optimize large models, develop AI workflows, and deliver advanced AI solutions. The intern will provide technical support, design and implement optimizations for AI models, and set up model training or inference to identify and resolve bottlenecks. This role involves working with various AI models and inference frameworks, conducting research, and collaborating with global teams.

What you'd actually do

  1. Provide technical support to internal developers and external customers, facilitating the adoption and implementation of NVIDIA technologies and products.
  2. Apply your experience and knowledge in accelerated computing and machine learning. Design and implement optimizations for various AI models and business scenarios.
  3. Set up model training or inference, identify bottlenecks, and verify ways to improve model efficiency. Conduct surveys and experiments on learning models, and consolidate guidelines and relevant papers.
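The bottleneck-identification step above can be sketched as a minimal per-stage timing harness. This is an illustrative, stdlib-only sketch: the three toy stages stand in for real preprocess/forward/postprocess work, and all names here are hypothetical, not from the posting.

```python
import time

def profile_stages(stages, inputs, warmup=2, iters=10):
    """Time each pipeline stage over several iterations and
    return mean latency per stage plus the slowest stage."""
    timings = {name: 0.0 for name, _ in stages}
    for i in range(warmup + iters):
        x = inputs
        for name, fn in stages:
            start = time.perf_counter()
            x = fn(x)
            if i >= warmup:  # discard warmup iterations
                timings[name] += time.perf_counter() - start
    means = {name: total / iters for name, total in timings.items()}
    bottleneck = max(means, key=means.get)
    return means, bottleneck

# Toy pipeline standing in for preprocess -> forward -> postprocess
stages = [
    ("preprocess", lambda x: [v * 2 for v in x]),
    ("forward", lambda x: [sum(x)] * len(x)),
    ("postprocess", lambda x: max(x)),
]
means, bottleneck = profile_stages(stages, list(range(1000)))
```

In practice the same pattern is applied with framework-native tools (e.g., profiler traces around model forward passes) rather than wall-clock timing of Python callables.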

Skills

Required

  • Bachelor's or Master's degree in Computer Science, AI, or a related field; or currently pursuing a PhD in ML infrastructure or data systems for ML.
  • Comfortable working in Linux, with strong programming skills in Python or C++.
  • Familiarity with AI models, including language models, video models, multi-modality models, or domain-specific models.
  • Proficiency in at least one inference framework (e.g., TensorRT/TRT-LLM, ONNX Runtime, PyTorch, vLLM, SGLang, Dynamo).
  • Excellent problem-solving skills and the ability to troubleshoot complex technical issues.
  • Demonstrated ability to collaborate effectively across diverse, global teams, adapting communication styles while maintaining clear, constructive professional interactions.

Nice to have

  • Experience optimizing critical operators such as GEMM and attention, tailored to different GPU architectures, to improve inference performance.
  • Experience conducting in-depth research on Speech LLM training and implementing audio classification.
  • Experience aligning performance with benchmark data to evaluate the accuracy of current modeling, including KV-cache and multi-modality modeling.
  • Familiarity with mainstream inference engines (e.g., vLLM, SGLang) or with disaggregated LLM inference.
  • Experience with state-of-the-art (SOTA) RL methods for reasoning models, and the ability to consolidate best practices and relevant papers.
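The KV-cache bullet above involves straightforward sizing arithmetic: the cache holds one K and one V tensor per layer, scaling with heads, head dimension, sequence length, batch size, and element width. The sketch below illustrates the estimate; the Llama-7B-like dimensions are assumptions for illustration, not from the posting.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim,
                   seq_len, batch, dtype_bytes=2):
    """Estimate KV-cache size in bytes.

    Factor of 2 accounts for the separate K and V tensors
    cached per layer; dtype_bytes=2 assumes fp16/bf16.
    """
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * batch * dtype_bytes

# Assumed Llama-7B-like dims: 32 layers, 32 KV heads, head_dim 128
size = kv_cache_bytes(32, 32, 128, 4096, 1)
print(size / 2**30, "GiB")  # -> 2.0 GiB at 4k context, batch 1
```

Estimates like this explain why techniques such as grouped-query attention (fewer KV heads) and paged or disaggregated KV-cache management matter for serving long contexts.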

What the JD emphasized

  • optimize large models
  • develop sophisticated AI workflows
  • advanced AI solutions
  • setup model training or inference
  • identify the bottlenecks
  • improve model efficiency
  • surveys and experiments on learning models
  • consolidate guidelines and relevant papers
  • optimize inference performance
  • Speech LLM training
  • audio classification
  • aligning performance with benchmark data
  • evaluate accuracy of current modeling
  • KV-cache
  • multi-modality modeling
  • mainstream inference engines
  • disaggregated LLM Inference
  • SOTA RL for reasoning model methods
  • consolidate best practices and relevant papers
