Senior Research Engineer - Autonomous Vehicles

NVIDIA · Semiconductors · Santa Clara, CA

Senior Research Engineer role at NVIDIA focused on AI for autonomous vehicles. The role involves developing large-scale training frameworks for multimodal foundation models, optimizing GPU utilization, implementing data loaders, building simulation infrastructure, integrating new architectures, developing sim-to-real pipelines, combining LLMs with policy learning, and applying RL to fine-tune LLMs. It requires expertise in deep learning, reinforcement learning, generative modeling, distributed training systems, and GPU acceleration.

What you'd actually do

  1. Develop large-scale supervised learning and reinforcement learning training frameworks, capable of running on thousands of GPUs, to support multimodal foundation models for AVs;
  2. Optimize GPU and cluster utilization for efficient model training and fine-tuning on massive datasets;
  3. Implement scalable data loaders and preprocessors tailored for multimodal datasets, including videos, text, and sensor data;
  4. Build and optimize simulation infrastructure (based on GPU-accelerated simulators) to support the training of driving policies for AVs at scale;
  5. Collaborate with researchers to integrate cutting-edge model architectures into scalable training pipelines.

Skills

Required

  • Deep learning
  • Reinforcement learning
  • Generative modeling
  • Software engineering
  • Large-scale model training
  • Distributed training systems
  • PyTorch, JAX, or TensorFlow
  • PPO, SAC, or Q-learning
  • Reward shaping, domain randomization, curriculum learning
  • GPU acceleration
  • CUDA programming
  • Kubernetes
  • Python
  • C++
  • HPC environments
  • Job scheduling/orchestration tools

Nice to have

  • Multimodal foundation models for AVs
  • Sim-to-real transfer pipelines
  • Combining LLMs with policy learning
  • Fine-tuning multimodal LLMs
  • SLURM

What the JD emphasized

  • strong expertise in software engineering and in artificial intelligence topics
  • strong programming skills
  • solid track record of training deep learning models at scale
  • good mathematical foundation to analyze new AI algorithms
  • AI models for autonomous driving such as agent behavior models, end-to-end AV architectures, AI safety, closed-loop training approaches, and AV foundation models (VLMs, reasoning models, etc.)
  • publishing at top venues
  • working with the broader scientific community
  • Communicating with different teams and domain scientists in different areas is essential
  • aid fundamental research with the freedom and bandwidth to conduct ground-breaking publishable research
  • impact products and collaborate with teams that focus on AI products
  • Develop large-scale supervised learning and reinforcement learning training frameworks
  • Optimize GPU and cluster utilization for efficient model training and fine-tuning
  • Implement scalable data loaders and preprocessors tailored for multimodal datasets
  • Build and optimize simulation infrastructure
  • Collaborate with researchers to integrate cutting-edge model architectures into scalable training pipelines
  • Develop sim-to-real transfer pipelines
  • Apply reinforcement learning to fine-tune multimodal LLMs
  • Develop robust monitoring and debugging tools
  • 10+ years of full-time industry experience in large-scale MLOps and AI infrastructure
  • Proven experience designing and optimizing distributed training systems
  • Deep familiarity with reinforcement learning algorithms
  • Deep understanding of GPU acceleration, CUDA programming, and cluster management tools
  • Strong programming skills in Python and a high-performance language such as C++
  • Strong experience with large-scale GPU clusters, HPC environments, and job scheduling/orchestration tools

Other signals

  • develop large-scale supervised learning and reinforcement learning training frameworks
  • optimize GPU and cluster utilization for efficient model training and fine-tuning
  • implement scalable data loaders and preprocessors tailored for multimodal datasets
  • build and optimize simulation infrastructure
  • collaborate with researchers to integrate cutting-edge model architectures into scalable training pipelines
  • develop sim-to-real transfer pipelines
  • propose scalable solutions that combine LLMs with policy learning
  • apply reinforcement learning to fine-tune multimodal LLMs
  • develop robust monitoring and debugging tools