Director, System Software Engineering - Metropolis Accelerated and Inferencing Software

NVIDIA · Semiconductors · Santa Clara, CA

NVIDIA is seeking a Director of System Software Engineering to lead teams responsible for the full lifecycle of Vision AI strategy, from model onboarding to production deployment. The role focuses on transforming foundation models into real-time, GPU-accelerated video intelligence systems, scaling multimodal reasoning, and enabling agentic development workflows. Key responsibilities include architecting and operationalizing inference acceleration, driving implementations of frameworks like TensorRT and vLLM, collaborating with partners on custom models, and leading performance benchmarking. The ideal candidate has extensive experience in deep learning, GPU optimization, and leading engineering teams across embedded and enterprise platforms.

What you'd actually do

  1. Lead, encourage, and develop world-class engineering and data teams distributed across Europe, Asia, and the United States.
  2. Architect and operationalize NVIDIA’s end-to-end data Inference Acceleration strategy, powering inference and continuous performance improvements.
  3. Drive strategic implementations of TensorRT, vLLM, and other accelerated inference frameworks for Edge and Enterprise devices. Lead Accelerated Computing efforts and solutions for key Metropolis verticals. Set up Plans of Record (PORs) and guide their implementation.
  4. Collaborate with major Metropolis OEMs and Partners to architect highly accelerated and optimized deep learning models and inference pipelines for their specific requirements.
  5. Performance Benchmarking: Orchestrate efforts to achieve leading results on industry benchmarks such as MLPerf across Edge and Enterprise devices.

Skills

Required

  • Deep learning
  • GPU optimization
  • Inference
  • Systems engineering
  • Leadership
  • Embedded software
  • Multimodal AI
  • LLMs
  • VLMs

Nice to have

  • PhD in Spatial Computing & Awareness, Sim-to-Real Transfer, Human-to-Physical AI Interaction
  • Computer vision (CV)
  • GenAI models
  • Smart Spaces
  • Physical AI

What the JD emphasized

  • hands-on with deep learning
  • deep experience tuning on NVIDIA GPUs
  • consistent track record of delivering robust, low-latency inference at scale
  • technical leadership positions accountable for delivering outstanding production software within a multidimensional setting
  • Deep knowledge of GPU, CPU, and dedicated deep learning architecture fundamentals, and low-level performance optimizations using heterogeneous computing
  • Hands-on experience with VLMs, LLMs, or multimodal AI systems applied to perception, data triage, or automated labeling

Other signals

  • shipping AI products
  • scaling AI systems
  • production deployment
  • inference at scale