Staff Software Engineer, YouTube ML Efficiency

Google · Big Tech · San Bruno, CA +1

Staff Software Engineer focused on ML efficiency for YouTube's recommendation systems, working on model architectures, training procedures, and scaling experimentation on TPUs. The role involves optimizing ML infrastructure, reducing complexity in the ML ecosystem, and automating training, evaluation, and serving.

What you'd actually do

  1. Monitor the evolving landscape of recommendation systems, actively prototyping and benchmarking emerging modeling techniques to keep our infrastructure cutting-edge and efficient.
  2. Enable next-generation model architectures and training procedures.
  3. Scale experimentation capacity under our resource envelope.
  4. Reduce complexity and fragmentation in the ML training and serving ecosystem by providing standardized, composable, and reusable solutions.
  5. Reduce experimenter toil by introducing automation frameworks for training, evaluation, and model serving.

Skills

Required

  • software development
  • ML design
  • ML infrastructure optimization
  • model deployment
  • model evaluation
  • data processing
  • debugging
  • fine-tuning
  • large-scale recommendation systems
  • Machine Learning (ML)
  • ranking
  • personalization

Nice to have

  • ML models/algorithm design and implementation
  • collaboration
  • problem solving
  • quantitative reasoning
  • communication

What the JD emphasized

  • 8 years of experience in software development
  • 5 years of experience leading ML design and optimizing ML infrastructure (e.g., model deployment, model evaluation, data processing, debugging, fine-tuning)
  • 3 years of building large-scale recommendation systems, Machine Learning (ML), ranking, or personalization

Other signals

  • improving performance and extracting maximum efficiency for machine learning and AI workloads
  • evolving YouTube's models for next TPU generations
  • reducing complexity and fragmentation in the ML training and serving ecosystem