Senior Machine Learning Engineer - Physical AI and Synthetic Data Generation

NVIDIA · Semiconductors · Santa Clara, CA

NVIDIA is seeking a Senior Machine Learning Engineer to join its Physical AI team. The role focuses on architecting and developing generative pipelines for high-fidelity synthetic data using multimodal and diffusion models. Responsibilities include building and fine-tuning large-scale models, applying user controls for data synthesis, establishing quality assurance pipelines, and leading the generation of massive training datasets. The role requires deep technical knowledge of image/video synthesis, strong programming skills, and experience assessing the impact of synthetic data on model performance.

What you'd actually do

  1. Architect Generative Pipelines: Develop and implement advanced image and video generation/editing/reasoning models to produce high-fidelity synthetic data for Physical AI applications.
  2. Multimodal Development: Build and fine-tune large-scale models, including VLMs, MLLMs, and generation models, applying transformer, autoregressive, and diffusion-based architectures.
  3. Controllable Synthesis: Apply and evolve user controls during data generation to ensure precise environmental and structural control over generated data.
  4. Detailed Validation: Establish a rigorous approach to KPI evaluation and validation to ensure the quality and physical accuracy of synthetic releases.
  5. Automated Quality Assurance: Build and test an automated data QA pipeline using a mix of well-known classical computer vision algorithms and VLMs.

Skills

Required

  • BS, MS, or PhD in Computer Science, Computer Graphics, Robotics, or a related field (or equivalent experience).
  • 12+ years of experience in ML software development.
  • Deep technical knowledge of image/video synthesis, including diffusion models and state-of-the-art multimodal methods.
  • Strong hands-on skills in major DNN libraries and programming languages, including Python.
  • Hands-on experience with workflow management and databases to facilitate large-scale training and data generation.
  • Strong analytical and mathematical skills to bridge the gap between data-driven approaches and physical world constraints.
  • Experience in assessing the impact of synthetic data on model performance through metrics and systematic validation.

Nice to have

  • Experience with computer/GPU architecture to improve performance during inference and training.
  • Familiarity with simulation platforms and a deep understanding of 3D sensor modalities (camera, multi-camera, lidar, radar).
  • Experience with open source software.
  • Strong skills in optimizing code efficiency are a huge plus.

What the JD emphasized

  • Deep technical knowledge of image/video synthesis, including diffusion models and state-of-the-art multimodal methods.
  • Experience in assessing the impact of synthetic data on model performance through metrics and systematic validation.

Other signals

  • synthetic data generation
  • multimodal models
  • diffusion models
  • generative pipelines
  • physical AI