What you'd actually do

Design and execute high-throughput strategies to capture high-quality multi-view video and image data from thousands of unique participants and environments.

Optimize specialized acquisition hardware and optical configurations to capture high-precision ground-truth visual data for complex foreground subjects and environmental backgrounds.

Research and implement methods to fine-tune generative video foundation models on proprietary datasets to produce high-fidelity synthetic video training data, dramatically increasing the variety of scenes that models are exposed to.

Develop automated pipelines that generate high-resolution depth, segmentation, and motion labels from production-grade models to supervise and train next-generation research architectures.

Create rigorous image and video evaluation datasets specifically designed to measure and address "long-tail" quality issues, such as material properties and temporal stability.

Skills

Required

Python
C++
visual data acquisition and curation for 3D vision tasks
designing and training neural networks
transformers
diffusion models

Nice to have

generative video research
fine-tuning foundation vision models
distillation of foundation vision models
novel view synthesis
computational photography
specialized sensor calibration
active illumination
multi-modal sensor fusion
distributed training frameworks
machine learning data infrastructure
managing end-to-end visual data pipelines

As an organization, Google maintains a portfolio of research projects driven by fundamental research, new product innovation, product contribution and infrastructure goals, while providing individuals and teams the freedom to emphasize specific types of work. As a Research Scientist, you'll set up large-scale tests and deploy promising ideas quickly and broadly, managing deadlines and deliverables while applying the latest theories to develop new and improved products, processes, or technologies. From creating experiments and prototyping implementations to designing new architectures, our research scientists work on real-world problems that span the breadth of computer science, such as machine (and deep) learning, data mining, natural language processing, hardware and software performance analysis, improving compilers for mobile platforms, as well as core search and much more.

As a Research Scientist, you'll also actively contribute to the wider research community by sharing and publishing your findings, with ideas inspired by internal projects as well as from collaborations with research programs at partner universities and technical institutes all over the world.

Labs is a group focused on incubating early-stage efforts in support of Google’s mission to organize the world’s information and make it universally accessible and useful. Our team exists to help discover and create new ways to advance our core products through exploration and the application of new technologies. We work to build new solutions that have the potential to transform how users interact with Google. Our goal is to drive innovation by developing new Google products and capabilities that deliver significant impact over longer timeframes.

Individual pay is determined by factors including job-related skills, experience, and relevant education or training.

US: $147,000 - $211,000 (USD) + 15% bonus target + equity + benefits

Learn more about benefits at Google.

Qualifications

Minimum qualifications:

PhD or equivalent practical experience in computer vision, machine learning, computer graphics, or generative media.
Experience in Python and C++.
Experience with visual data acquisition and curation for 3D vision tasks, such as multi-view video, point clouds, or radiance fields.
Experience designing and training neural networks, specifically with transformers or diffusion models.

Preferred qualifications:

Experience with generative video research, including fine-tuning or distillation of foundation vision models for novel view synthesis.
Expertise in computational photography or specialized sensor calibration, such as active illumination or multi-modal sensor fusion.
Familiarity with distributed training frameworks and machine learning data infrastructure.
Proven track record of managing end-to-end visual data pipelines, from initial capture strategy to automated curation and model integration.