Principal Perception Engineer, Obstacle Foundation Models - Autonomous Vehicles

NVIDIA · Semiconductors · Santa Clara, CA

This Principal Perception Engineer role at NVIDIA for Autonomous Vehicles focuses on designing and productizing next-generation 3D obstacle perception stacks using deep learning, transformers, and multi-modal techniques. It combines technical leadership, hands-on algorithm development, production-grade model development, and data strategy with close collaboration with safety and systems teams for large-scale deployment.

What you'd actually do

  1. Own the technical vision, architecture, and roadmap for 3D obstacle perception to support end-to-end autonomous driving functionalities, leveraging state-of-the-art CNN and transformer-based architectures where appropriate.
  2. Design and develop advanced 3D perception models using multi-camera inputs and/or multi-sensor fusion (camera, radar, lidar) for obstacle detection and tracking, including opportunities to explore BEV and transformer-based 3D perception.
  3. Lead the development of efficient, production-grade deep learning models: define objectives, select architectures, guide experimentation, and establish best practices for training and evaluation, using techniques such as large-scale pretraining, distillation, and parameter-efficient fine-tuning (e.g., LoRA).
  4. Define and drive KPI frameworks to quantify perception performance; analyze large-scale real and synthetic datasets to identify failure modes and systematically improve accuracy, robustness, and efficiency, incorporating modern approaches like self-supervised and representation learning when beneficial.
  5. Lead data strategy for perception: specify data and labeling requirements, prioritize data collection and annotation, and collaborate closely with data and ground-truth teams to maximize impact, including model-assisted workflows (e.g., active learning, auto-labeling, VLMs) and advanced model-in-the-loop tooling.

Skills

Required

  • Python
  • C++
  • PyTorch
  • deep learning
  • perception systems
  • data-driven development
  • technical leadership
  • architecture
  • algorithm development
  • production-grade software development

Nice to have

  • autonomous driving
  • robotics
  • CNNs
  • transformers
  • multi-camera input
  • multi-sensor fusion
  • BEV perception
  • large-scale pretraining
  • distillation
  • parameter-efficient fine-tuning
  • LoRA
  • self-supervised learning
  • representation learning
  • model-assisted workflows
  • active learning
  • auto-labeling
  • VLMs
  • model-in-the-loop tooling
  • embedded platforms
  • real-time platforms
  • optimization for latency
  • memory optimization
  • compute constraints
  • vision-language models
  • 3D computer vision fundamentals
  • camera modeling and calibration
  • multi-view geometry
  • 3D representations
  • transformer-based 3D perception
  • BEV perception pipelines
  • CUDA development
  • GPU-accelerated components
  • custom CUDA kernels
  • publication record

What the JD emphasized

  • 15+ years of hands-on experience developing deep learning–based perception or closely related systems for complex real-world problems
  • track record of taking models from prototype to production
  • Demonstrated technical leadership as a senior or principal-level individual contributor
  • owning features or subsystems end-to-end
  • setting technical direction
  • making architectural decisions
  • coordinating across teams
  • Proven experience in data-driven development
  • close collaboration with data, labeling, and ground-truth teams on data strategy, labeling quality, and iterative model improvement
  • Strong publication record or recognized contributions in deep learning, computer vision, or autonomous systems at leading conferences/journals (e.g., CVPR, ICCV, NeurIPS, IROS)

Other signals

  • leading the design and productization of a next-generation autonomous driving perception stack
  • drive cross-functional execution
  • deeply hands-on with architecture, algorithms, and implementation
  • modern transformer-based, multi-modal, and vision-language techniques
  • production-grade deep learning models
  • large-scale pretraining, distillation, and parameter-efficient fine-tuning
  • Define and drive KPI frameworks to quantify perception performance
  • analyze large-scale real and synthetic datasets
  • Lead data strategy for perception
  • model-assisted workflows
  • partner with safety, systems, and software teams
  • stringent product requirements for safety, latency, resource usage, and software robustness
  • deployment at scale