Staff Software Engineer, ML Data Infrastructure at Google

What you'd actually do

Enable next-generation model architectures and training procedures.

Write and maintain large-scale data processing pipelines in C++.

Propose and secure buy-in from our clients to build new infrastructure for the evolving training data use-cases.

Reduce complexity and fragmentation in the ML training infrastructure by providing standardized, composable, and self-service infrastructure solutions.

Collaborate closely with other infrastructure teams working on recommendations quality, storage, logging and privacy. Debug data quality and infrastructure issues across the stack.

Skills

Required

Bachelor's degree or equivalent practical experience
8 years of experience programming in C++
5 years of experience testing and launching software products
5 years of experience building and developing large-scale infrastructure, distributed systems or networks, or experience with compute technologies, storage, or hardware architecture
3 years of experience with software design and architecture

Nice to have

Experience building large-scale data infrastructure, frameworks or libraries
Understanding of ML concepts, including model architecture and training
Ability to collaborate effectively across teams and functions
Solid communication skills, both broad and deep, on recommendation technology, system design, and implementation

What the JD emphasized

8 years of experience programming in C++

5 years of experience testing and launching software products

5 years of experience building and developing large-scale infrastructure, distributed systems or networks, or experience with compute technologies, storage, or hardware architecture

3 years of experience with software design and architecture

Google's software engineers develop the next-generation technologies that change how billions of users connect, explore, and interact with information and one another. Our products need to handle information at massive scale, and extend well beyond web search. We're looking for engineers who bring fresh ideas from all areas, including information retrieval, distributed computing, large-scale system design, networking and data storage, security, artificial intelligence, natural language processing, UI design and mobile; the list goes on and is growing every day. As a software engineer, you will work on a specific project critical to Google’s needs with opportunities to switch teams and projects as you and our fast-paced business grow and evolve. We need our engineers to be versatile, display leadership qualities and be enthusiastic to take on new problems across the full-stack as we continue to push technology forward.

The YouTube Discovery Data team is responsible for the data that powers personalized discovery at YouTube -- the YouTube homepage, watch page, and dozens of other surfaces that allow users to discover content on YouTube. Hundreds of engineers across YouTube use these data sources to train and serve more than a thousand ML models, including the use of LLMs for personalized discovery at YouTube scale. Some of our current products include YouTube watch history, discovery training data, discovery sessions, and the YouTube user data dump.

At YouTube, we believe that everyone deserves to have a voice, and that the world is a better place when we listen, share, and build community through our stories. We work together to give everyone the power to share their story, explore what they love, and connect with one another in the process. Working at the intersection of cutting-edge technology and boundless creativity, we move at the speed of culture with a shared goal to show people the world. We explore new ideas, solve real problems, and have fun — and we do it all together.

The US base salary range for this full-time position is $207,000-$300,000 + bonus + equity + benefits. Our salary ranges are determined by role, level, and location. Within the range, individual pay is determined by work location and additional factors, including job-related skills, experience, and relevant education or training. Your recruiter can share more about the specific salary range for your preferred location during the hiring process.

Please note that the compensation details listed in US role postings reflect the base salary only, and do not include bonus, equity, or benefits. Learn more about benefits at Google.

Responsibilities

Enable next-generation model architectures and training procedures.
Write and maintain large-scale data processing pipelines in C++.
Propose and secure buy-in from our clients to build new infrastructure for the evolving training data use-cases.
Reduce complexity and fragmentation in the ML training infrastructure by providing standardized, composable, and self-service infrastructure solutions.
Collaborate closely with other infrastructure teams working on recommendations quality, storage, logging and privacy. Debug data quality and infrastructure issues across the stack.

Qualifications

Minimum qualifications:

Bachelor's degree or equivalent practical experience.
8 years of experience programming in C++.
5 years of experience testing and launching software products.
5 years of experience building and developing large-scale infrastructure, distributed systems or networks, or experience with compute technologies, storage, or hardware architecture.
3 years of experience with software design and architecture.

Preferred qualifications:

Experience building large-scale data infrastructure, frameworks or libraries.
Understanding of ML concepts, including model architecture and training.
Ability to collaborate effectively across teams and functions.
Solid communication skills, both broad and deep, on recommendation technology, system design, and implementation.
