Senior Staff Software Engineer, Search Evaluation Infrastructure Development

Google · Big Tech · New York, NY +2

Senior Staff Software Engineer focused on developing evaluation infrastructure for generative AI features in Google Search. The role involves pioneering Evaluation-Driven Development (EDD), establishing evaluation rubrics, cultivating scalable automated rating infrastructure, and delivering continuous tracking systems to improve development speed and confidence. Key responsibilities include designing and enhancing large-scale evaluation software, providing technical leadership, managing project priorities, and ensuring the reliability, capacity, and effectiveness of automated rating systems.

What you'd actually do

  1. Design, develop, test, deploy, maintain, and enhance large-scale evaluation software and automated rating infrastructure.
  2. Provide technical leadership on high-impact projects, managing project priorities, deadlines, and deliverables.
  3. Drive technical project strategy, lead large-scale machine learning (ML) infrastructure optimization, and oversee the design and implementation of solutions across multiple specialized ML areas.
  4. Manage automated rating system reliability, capacity, and effectiveness to reduce quality iteration cycles from hours to minutes.
  5. Facilitate alignment across teams, collaborating with platform and intelligence engineering counterparts to represent evaluation infrastructure in technical forums.

Skills

Required

  • software development
  • technical project strategy
  • ML design
  • ML infrastructure
  • model deployment
  • model evaluation
  • data processing
  • debugging
  • fine-tuning
  • machine learning evaluation frameworks
  • automated rating for large-scale infrastructure

Nice to have

  • technical leadership
  • Large Language Models (LLMs)
  • agentic architectures
  • generative AI evaluation methodologies
  • infrastructure projects end to end
  • AI tools and development assistants

What the JD emphasized

  • Evaluation-Driven Development (EDD)
  • objective, metrics-based quality assessments
  • scalable automated rating infrastructure
  • continuous tracking systems
  • reduce quality iteration cycles from hours to minutes
  • large-scale evaluation software
  • automated rating infrastructure
  • ML infrastructure optimization
  • automated rating system reliability, capacity, and effectiveness
  • evaluation infrastructure
  • machine learning evaluation frameworks
  • automated rating for large-scale infrastructure
  • generative AI evaluation methodologies

Other signals

  • Evaluation-Driven Development (EDD)
  • objective, metrics-based quality assessments
  • scalable automated rating infrastructure
  • continuous tracking systems
  • reduce quality iteration cycles from hours to minutes