Senior Manager, Engineering - Enterprise AI and Automation

NVIDIA · Semiconductors · Santa Clara, CA

Senior Engineering Manager to lead the strategy and execution for NVIDIA’s agentic developer platform, focusing on building, evaluating, and improving autonomous agents. The role involves identifying gaps, driving POCs, operationalizing approaches into reusable components, and establishing governance and safety mechanisms to scale autonomous systems within NVIDIA.

What you'd actually do

  1. Track and deeply understand evolving agent development patterns across NVIDIA and the broader ecosystem
  2. Identify gaps and friction in current agent architectures, and translate insights into a platform strategy that boosts developer velocity and agent quality—backed by evaluations, benchmarking, and feedback loops
  3. Assess and integrate open source and third-party tools where they add leverage; drive clear build-vs-use decisions
  4. Architect and integrate high-performance data pipelines, RAG systems, vector databases, and GPU-optimized training and inference workflows
  5. Lead integration of the AI Data Platform into NVIDIA’s on-prem AI Factory, optimizing GPU-to-storage throughput, data locality, and distributed inference performance

Skills

Required

  • Bachelor’s degree in CS/Engineering or equivalent experience
  • 10+ years of overall software engineering experience, including 4+ years managing high-performing teams
  • Strong hands-on experience with evolving agent architectures and open-source libraries; deep expertise in LLM/agent architectures—leading POCs and integrating them into real business use cases with measurable adoption/impact
  • Ability to turn fast-moving, ambiguous problem spaces into clear platform strategy, roadmap, and outcomes
  • Proven track record building multi-team developer platforms (APIs/SDKs, reusable components, reference implementations)
  • Experience building evaluation/benchmarking systems for agent workflows (metrics, regression, feedback loops)
  • Strong judgment integrating OSS/3P tools; clear build-vs-use decision-making and integration strategy
  • Product approach for safety and governance: controls, auditability, monitoring, and risk management
  • Strong leadership and executive communication (engineering, product, security, research)

Nice to have

  • Experience implementing enterprise-grade governance for agent systems (controls, auditability, monitoring, policy enforcement) in production autonomous workflows
  • Demonstrated wins taking new/open-source agent constructs from POC to production adoption, with clear business impact (cycle time, quality, cost, reliability)
  • Built and scaled an agent platform or agent developer experience used by multiple teams (SDKs, templates, reference apps, reusable building blocks)
  • Clear point of view and real examples on build-vs-use decisions—when to adopt OSS/3P vs build internal primitives—and how to operationalize the choice
  • Deep experience with agent evaluation at scale (long-horizon tasks, tool correctness, reliability testing, automated regressions, offline/online feedback loops)

What the JD emphasized

  • deeply understanding how teams across the company build, evaluate, and improve autonomous agents
  • turning those evolving patterns into scalable platform capabilities
  • drive rapid proof-of-concepts on emerging agent constructs and ecosystem tools
  • operationalize the best approaches into reusable building blocks, integrations, and governance mechanisms
  • build a platform
  • platform helps teams safely ship more autonomous systems at NVIDIA scale

Other signals

  • leading strategy and execution for NVIDIA’s agentic developer platform