NVIDIA is seeking dynamic Solution Architects with specialized expertise in training Large Language Models (LLMs), implementing retrieval-augmented generation (RAG) workflows, and building agentic inference systems. You will leverage the full NVIDIA software and hardware ecosystem to design, optimize, and deliver production-grade generative AI solutions for enterprise customers. With competitive salaries and a generous benefits package, we are widely considered one of the world's most desirable employers. We have some of the most forward-thinking and hardworking people in the world working for us, and, thanks to outstanding growth, our best-in-class engineering teams are expanding rapidly. If you are a creative, autonomous person with a real passion for technology, we want to hear from you.
What You Will Be Doing:
- Architect end-to-end solutions focused on LLM pretraining, fine-tuning, high-performance inference, RAG workflows, and agentic inference orchestration using NVIDIA’s hardware and software platforms.
- Collaborate with customers to understand their LLM-related business challenges and design tailored solutions aligned with the NVIDIA ecosystem.
- Lead LLM training, distributed optimization, and performance tuning efforts to maximize throughput and memory efficiency while minimizing latency.
- Design and integrate RAG workflows and agentic inference pipelines into customer systems; provide technical guidance on best practices.
- Collaborate with NVIDIA engineering teams to provide feedback and support pre-sales technical activities (workshops, demos).
What We Need to See:
- Master's or Ph.D. in Computer Science, Artificial Intelligence, or a related field, or equivalent experience.
- 4+ years hands-on experience in AI, focusing on open-source LLM training, fine-tuning, and production inference optimization.
- Deep understanding of mainstream LLM architectures and proficiency in LLM customization with PyTorch and Hugging Face Transformers.
- Solid knowledge of GPU computing, cluster architecture, and distributed parallel training/inference for LLMs.
- Competency in designing agentic inference systems and applying AI agents to solve business challenges.
- Strong communication skills, with the ability to articulate complex technical concepts to both technical and non-technical stakeholders.
Ways to Stand Out from the Crowd:
- Hands-on experience with NVIDIA's generative AI ecosystem (TensorRT-LLM, Megatron-LM, NVIDIA NeMo).
- Advanced skills in LLM inference optimization (quantization, KV cache tuning, memory footprint reduction).
- Experience with Docker and Kubernetes for on-premises deployment of containerized LLM and agent workflows.
- In-depth knowledge of multi-GPU parallelism and large-scale GPU cluster management.
#deeplearning