Software Engineer III, Skills Evaluation, Chrome

Google · Kirkland, WA

Software Engineer III role focused on building and maintaining evaluation pipelines, safety classifiers, and automated testing systems for AI skills within the Chrome product. This involves designing and implementing metrics, visualization tools, and auto-raters to ensure the quality, safety, and performance of AI workflows, with a focus on integrating with various AI models and browser surfaces.

What you'd actually do

  1. Design and implement evaluation pipelines, metrics, and visualization tools, while building automated learning-based testing systems and calibrated auto-raters to improve the efficiency and reliability of evaluation processes.
  2. Implement safety classifiers, conduct adversarial testing, and validate skill performance across various model architectures and browser surfaces, such as AI mode and Gemini in Chrome.
  3. Analyze evaluation results to identify trends and propose data-driven solutions, while collaborating with other AI teams to integrate new use cases and requirements into the framework.
  4. Write high-quality, maintainable code in line with Google's best practices, and contribute actively to code reviews and design discussions.
  5. Prioritize and manage tasks to ensure the timely delivery of solutions, while contributing to technical documentation and playbooks for the team.
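To make the first two responsibilities concrete, here is a minimal sketch of what an evaluation pipeline with an auto-rater might look like. All names (`EvalCase`, `auto_rate`, `run_pipeline`) and the token-overlap scoring rule are hypothetical illustrations, not Google's actual tooling; a production auto-rater would typically be a learned, calibrated model rather than a string-overlap heuristic.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class EvalCase:
    """One evaluation example: a prompt, a reference answer, and a model output."""
    prompt: str
    expected: str
    actual: str


def auto_rate(case: EvalCase) -> float:
    """Toy auto-rater: Jaccard token overlap between expected and actual text."""
    exp = set(case.expected.lower().split())
    act = set(case.actual.lower().split())
    if not exp and not act:
        return 1.0
    return len(exp & act) / len(exp | act)


def run_pipeline(cases: list[EvalCase], threshold: float = 0.5) -> dict:
    """Score every case and aggregate into pipeline-level metrics."""
    scores = [auto_rate(c) for c in cases]
    return {
        "mean_score": mean(scores),
        "pass_rate": sum(s >= threshold for s in scores) / len(scores),
    }


cases = [
    EvalCase("capital of France?", "Paris is the capital", "Paris is the capital"),
    EvalCase("2+2?", "four", "five"),
]
report = run_pipeline(cases)
print(report)  # one perfect match and one miss
```

In a real system, the aggregate report would feed the visualization tools and trend analysis described above, and the rater's scores would be calibrated against human judgments.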

Skills

Required

  • Bachelor’s degree or equivalent practical experience
  • 2 years of experience programming in C++, Python, or Kotlin
  • 1 year of experience with ML infrastructure (e.g., model deployment, model evaluation, optimization, data processing, debugging)

Nice to have

  • Master's degree or PhD in Computer Science or related technical fields
  • Experience designing, building, and maintaining scalable software platforms, tools, or infrastructure
  • Experience with AI model evaluation methodologies, metrics, and pipeline development
  • Experience with ML performance analysis and optimization
  • Familiarity with Chrome ecosystem
  • Strong analytical, problem-solving, and debugging skills

What the JD emphasized

  • ML infrastructure (e.g., model deployment, model evaluation, optimization, data processing, debugging)
  • AI model evaluation methodologies, metrics, and pipeline development

Other signals

  • building a premier user experience for discovering, creating, and executing Skills that are safe, high-quality, and performant
  • design and implement evaluation pipelines, metrics, and visualization tools
  • building automated learning-based testing systems and calibrated auto-raters
  • implement safety classifiers, conduct adversarial testing, and validate skill performance
  • analyze evaluation results to identify trends and propose data-driven solutions