NVIDIA is seeking dynamic Solution Architects with specialized expertise in training Large Language Models (LLMs), implementing retrieval-augmented generation (RAG) workflows, and building agentic inference systems. You will leverage the full NVIDIA software and hardware ecosystem to design, optimize, and deliver production-grade generative AI solutions for enterprise customers. With competitive salaries and a generous benefits package, we are widely considered one of the world's most desirable employers. We have some of the most forward-thinking and hardworking people in the world working for us, and, thanks to outstanding growth, our best-in-class engineering teams are expanding rapidly. If you are a creative, autonomous person with a real passion for technology, we want to hear from you.
What You Will Be Doing:
- Architect end-to-end solutions focused on LLM pretraining, fine-tuning, high-performance inference, RAG workflows, and agentic inference orchestration using NVIDIA’s hardware and software platforms.
- Collaborate with customers to understand their LLM-related business challenges and design tailored solutions aligned with the NVIDIA ecosystem.
- Lead LLM training, distributed optimization, and performance tuning efforts to maximize throughput and memory efficiency while minimizing latency.
- Design and integrate RAG workflows and agentic inference pipelines into customer systems; provide technical guidance on best practices.
- Collaborate with NVIDIA engineering teams to provide feedback and support pre-sales technical activities (workshops, demos).
What We Need to See:
- Master's or Ph.D. in Computer Science, Artificial Intelligence, or a related field, or equivalent experience.
- 4+ years hands-on experience in AI, focusing on open-source LLM training, fine-tuning, and production inference optimization.
- Deep understanding of mainstream LLM architectures and proficiency in LLM customization with PyTorch and Hugging Face Transformers.
- Solid knowledge of GPU computing, cluster architecture, and distributed parallel training/inference for LLMs.
- Competency in designing agentic inference systems and applying AI agents to solve business challenges.
- Strong communication skills, with the ability to articulate complex technical concepts to both technical and non-technical stakeholders.
Ways to Stand Out from the Crowd:
- Hands-on experience with NVIDIA's generative AI ecosystem (TensorRT-LLM, Megatron-LM, NVIDIA NeMo).
- Advanced skills in LLM inference optimization (quantization, KV cache tuning, memory footprint reduction).
- Experience with Docker and Kubernetes for on-premises deployment of containerized LLM and agent workflows.
- In-depth knowledge of multi-GPU parallelism and large-scale GPU cluster management.
#deeplearning