Research Scientist, Visual Data and Generative Research

Google · Big Tech · San Francisco, CA, and 2 other locations

A Research Scientist role focused on visual data and generative models: acquiring data, fine-tuning foundation models for synthetic data generation, and building automated pipelines that produce labeling and evaluation datasets. The role emphasizes computer vision and machine learning research, with the goal of improving generative media and training next-generation architectures.

What you'd actually do

  1. Design and execute high-throughput strategies to capture high-quality multi-view video and image data from thousands of unique participants and environments.
  2. Optimize specialized acquisition hardware and optical configurations to capture high-precision ground-truth visual data for complex foreground subjects and environmental backgrounds.
  3. Research and implement methods to fine-tune generative video foundation models on proprietary datasets, producing high-fidelity synthetic video training data that greatly expands the range of scenes models are exposed to.
  4. Develop automated pipelines that use production-grade models to generate high-resolution depth, segmentation, and motion labels for supervising and training next-generation research architectures.
  5. Create rigorous image and video evaluation datasets designed to measure and address "long-tail" quality issues, such as material properties and temporal stability.

Skills

Required

  • Python
  • C++
  • visual data acquisition and curation for 3D vision tasks
  • designing and training neural networks
  • transformers
  • diffusion models

Nice to have

  • generative video research
  • fine-tuning foundation vision models
  • distillation of foundation vision models
  • novel view synthesis
  • computational photography
  • specialized sensor calibration
  • active illumination
  • multi-modal sensor fusion
  • distributed training frameworks
  • machine learning data infrastructure
  • managing end-to-end visual data pipelines

What the JD emphasized

  • PhD or equivalent practical experience in computer vision, machine learning, computer graphics, or generative media.
  • Experience with visual data acquisition and curation for 3D vision tasks, such as multi-view video, point clouds, or radiance fields.
  • Experience designing and training neural networks, specifically with transformers or diffusion models.
  • Experience with generative video research, including fine-tuning or distillation of foundation vision models for novel view synthesis.
  • Proven track record of managing end-to-end visual data pipelines, from initial capture strategy to automated curation and model integration.
