Staff Software Engineer, ML Data Infrastructure

Google · Big Tech · San Bruno, CA +1

Google's YouTube Discovery Data team is seeking a Staff Software Engineer to build and maintain large-scale data processing pipelines that power personalized discovery and ML models at YouTube. The role involves enabling next-generation model architectures and training procedures, reducing complexity in ML training infrastructure, and collaborating with other infrastructure teams. The ideal candidate will have extensive experience in C++ programming, large-scale infrastructure development, and a solid understanding of ML concepts.

What you'd actually do

  1. Enable next-generation model architectures and training procedures.
  2. Write and maintain large-scale data processing pipelines in C++.
  3. Propose new infrastructure for evolving training data use cases, and secure buy-in from our clients to build it.
  4. Reduce complexity and fragmentation in the ML training infrastructure by providing standardized, composable, and self-service infrastructure solutions.
  5. Collaborate closely with other infrastructure teams working on recommendations quality, storage, logging and privacy. Debug data quality and infrastructure issues across the stack.

Skills

Required

  • Bachelor's degree or equivalent practical experience
  • 8 years of experience programming in C++
  • 5 years of experience testing and launching software products
  • 5 years of experience building and developing large-scale infrastructure, distributed systems or networks, or experience with compute technologies, storage, or hardware architecture
  • 3 years of experience with software design and architecture

Nice to have

  • Experience building large-scale data infrastructure, frameworks or libraries
  • Understanding of ML concepts, including model architecture and training
  • Ability to collaborate effectively across teams and functions
  • Solid skills in communicating, both broadly and in depth, about recommendation technology, system design, and implementation

What the JD emphasized

  • 8 years of experience programming in C++
  • 5 years of experience testing and launching software products
  • 5 years of experience building and developing large-scale infrastructure, distributed systems or networks, or experience with compute technologies, storage, or hardware architecture
  • 3 years of experience with software design and architecture

Other signals

  • powers personalized discovery at YouTube
  • train and serve more than a thousand ML models
  • use of LLMs for personalized discovery at YouTube scale
  • large-scale data processing pipelines
  • ML training infrastructure