Principal Engineering Analyst, RAI Testing

Google · Big Tech · Washington, DC +1

This role focuses on building and operationalizing scalable, automated AI testing frameworks and evaluation systems within Google's Trust & Safety Responsible AI Testing team. The goal is to empower product teams with self-service infrastructure for standard safety evaluations, while the role itself handles high-risk, bespoke evaluations for novel AI paradigms. It involves leveraging SQL and Python to embed these systems into developer pipelines and partnering with cross-functional stakeholders.

What you'd actually do

  1. Leverage SQL and Python to embed self-service frameworks and automated evaluation systems into developer pipelines, enabling product teams to run standard evaluations autonomously.
  2. Act as the operational executor for complex, high-risk, and bespoke strategic evaluations, bridging the gap between defining safety quality and enforcing it.
  3. Partner with cross-functional stakeholders—including engineering teams, policy experts, and launch leadership—to develop and own intake triage and handoff governance protocols.
  4. Develop, maintain, and execute automated quality rubrics across testing services to ensure actionable results. Drive initiatives to significantly increase the use of automated evaluations and optimize operational resource allocation.
  5. Work autonomously to identify and solve problems, and collaborate within a team to develop comprehensive solutions. The role involves sensitive content or situations and may include exposure to graphic, controversial, and/or upsetting topics or content.

Skills

Required

  • SQL
  • Python
  • data analysis
  • project management

Nice to have

  • machine learning systems
  • Master's degree in a quantitative discipline
  • R
  • C++

What the JD emphasized

  • automated evaluation systems
  • automated evaluations

Other signals

  • AI testing frameworks
  • automated evaluation systems
  • self-service infrastructure
  • high-risk, bespoke evaluations
  • novel AI paradigms