Senior Product Manager, AI Inference - … at NVIDIA

What you'd actually do

Core Dynamo Architecture: Drive the product strategy for Dynamo’s modular components, including the KV-aware Router, KV Block Manager (KVBM), and communication planes.

Inference Orchestration: Define requirements for sophisticated routing logic that minimizes redundant prefill and optimizes Time to First Token (TTFT) across substantial GPU clusters.

Memory & KV Cache Management: Define strategy for multi-tier KV cache offloading enabling long-context windows and high-concurrency serving without compromising user experience.

Hardware-Software Co-Design: Collaborate with engineering to ensure Dynamo extracts maximum performance from NVIDIA hardware.

Agentic Inference: Develop Agent-first capabilities (e.g. priority, output length, cache pinning) to support sophisticated, multi-turn reasoning.

Skills

Required

Product management
AI inference
distributed systems
GPU-accelerated computing
LLM inference lifecycle
KV cache mechanics
distributed serving techniques
translate low-level technical capabilities into high-level business value
Teamwork and influencing skills
Empathy and deep care for your customers
Pragmatic and data-driven project management skills

Nice to have

Agentic frameworks (LangChain, NeMo Agents)
multi-turn, stateful AI applications
LLMs and Generative AI trends
Responsible AI
MLOps
Technical background and hands-on experience building AI (and LLM) solutions as an engineer
intuition for ML models and systems evaluation
read relevant research papers

What the JD emphasized

Proven experience in AI inference, distributed systems, and GPU-accelerated computing.

Deep understanding of the LLM inference lifecycle (Prefill vs. Decode), KV cache mechanics, and distributed serving techniques, like Disaggregated Serving.

Proven track record working with Agentic frameworks (LangChain, NeMo Agents) or building multi-turn, stateful AI applications.

NVIDIA is seeking a highly technical Product Manager to own the evolution of NVIDIA Dynamo, our flagship distributed inference framework. In this role, you will define the roadmap for high-scale LLM and Generative AI serving, bridging the gap between cutting-edge hardware (Vera Rubin, LPU, and NVLink) and software optimizations, like disaggregated serving, KV aware routing, and intelligent KV cache management. We need a self-starter to continue growing the product portfolio and work with the customers to incorporate model evaluation into end-2-end LLM workflows. We're looking for the rare blend of technical and product skills and passion for groundbreaking technology. If this fits, we would love to learn more about you!

What you'll be doing:

Core Dynamo Architecture: Drive the product strategy for Dynamo’s modular components, including the KV-aware Router, KV Block Manager (KVBM), and communication planes.
Inference Orchestration: Define requirements for sophisticated routing logic that minimizes redundant prefill and optimizes Time to First Token (TTFT) across substantial GPU clusters.
Memory & KV Cache Management: Define strategy for multi-tier KV cache offloading enabling long-context windows and high-concurrency serving without compromising user experience.
Hardware-Software Co-Design: Collaborate with engineering to ensure Dynamo extracts maximum performance from NVIDIA hardware.
Agentic Inference: Develop Agent-first capabilities (e.g. priority, output length, cache pinning) to support sophisticated, multi-turn reasoning.
Ecosystem Integration: Partner with open-source communities, e.g. vLLM, SGLang, TensorRT-LLM, and internal teams (NeMo Agent Toolkit).
Product Leadership: Author product requirements documents (PRDs) and software application designs docs (SADDs). Build for ease-of-use, extensibility, modularity. Work with TPMs to align roadmaps and respond to market trends.

What we need to see:

12+ years demonstrated ability in product management at a technology company, co-founder or related technical role in a startup or equivalent experience.
Bachelors Degree in Computer Science or related field (or equivalent experience).
Proven experience in AI inference, distributed systems, and GPU-accelerated computing.
Deep understanding of the LLM inference lifecycle (Prefill vs. Decode), KV cache mechanics, and distributed serving techniques, like Disaggregated Serving.
Ability to translate low-level technical capabilities into high-level business value (reduced TCO, faster TTFT).
Teamwork and influencing skills to optimally navigate in a highly matrixed environment. At NVIDIA, your entire company is on your team!
Empathy and deep care for your customers to build products people love.
Pragmatic and data-driven project management skills to navigate software development lifecycle requirements, product release schedules, and customer desires and deliver quality software on schedule.

Ways to stand out from the crowd:

Proven track record working with Agentic frameworks (LangChain, NeMo Agents) or building multi-turn, stateful AI applications.
Knowledge of trends around LLMs and Generative AI, Responsible AI, MLOps
Technical background and hands-on experience building AI (and LLM) solutions as an engineer. We expect you to have intuition for ML models and systems evaluation and read relevant research papers to inform your product strategy and roadmap.

Widely considered to be one of the technology world’s most desirable employers, NVIDIA offers highly competitive salaries and a comprehensive benefits package. As you plan your future, see what we can offer to you and your family www.nvidiabenefits.com/

Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 208,000 USD - 327,750 USD.

You will also be eligible for equity and benefits.

Applications for this job will be accepted at least until April 13, 2026.

This posting is for an existing vacancy.

NVIDIA uses AI tools in its recruiting processes.

NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.