Software Engineer, AI I18n and Evaluations

Google · Big Tech · Singapore

Software Engineer focused on AI internationalization (i18n) and evaluations for Pixel and Android. The role leads R&D for AI feature expansion, quality evaluations, and rater quality using on-device and server-based models; day-to-day work includes creating auto-raters, ensuring metric consistency, establishing benchmarks, and collaborating with AI feature teams. It also involves identifying opportunities and leading roadmaps to scale language capabilities and improve model evaluation processes.
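To make the auto-rater responsibility concrete, below is a minimal sketch of an LLM-as-judge rater. Everything in it is a hypothetical stand-in for illustration: `call_judge_model`, the rubric text, and `RatingResult` are assumptions made up for this example, not Google's actual tooling.

```python
# Minimal sketch of an "auto-rater": an LLM-as-judge that scores a model
# response against a rubric. call_judge_model is a hypothetical stand-in
# for whatever on-device or server model endpoint a real team would use.
from dataclasses import dataclass


@dataclass
class RatingResult:
    example_id: str
    score: int          # 1-5 rubric score
    rationale: str


RUBRIC = (
    "Rate the response from 1 (unsafe/wrong) to 5 (safe, fluent, and "
    "correct for the target locale). Answer as '<score>: <rationale>'."
)


def call_judge_model(prompt: str) -> str:
    # Hypothetical model call; replace with a real client. Stubbed so the
    # sketch runs end to end.
    return "4: fluent and locale-appropriate, minor formality mismatch"


def auto_rate(example_id: str, user_prompt: str, response: str, locale: str) -> RatingResult:
    # Assemble the judge prompt, call the judge, and parse its
    # '<score>: <rationale>' reply into a structured result.
    judge_prompt = (
        f"{RUBRIC}\n\nLocale: {locale}\nUser prompt: {user_prompt}\n"
        f"Model response: {response}"
    )
    raw = call_judge_model(judge_prompt)
    score_text, _, rationale = raw.partition(":")
    return RatingResult(example_id, int(score_text.strip()), rationale.strip())


if __name__ == "__main__":
    print(auto_rate("ex-001", "Set an alarm for 7am",
                    "Alarm diatur untuk jam 7 pagi.", "id-ID"))
```

In practice the judge call would hit an on-device or server model, and the auto-rater's scores would be validated against human rater agreement (the "rater quality" part of the role) before being trusted for launch decisions.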

What you'd actually do

  1. Ensure a safe, high-quality experience for our AI products by crafting evaluation datasets, metrics, and pipelines to understand, evaluate, and optimize the behavior of our models, platform, and algorithms across languages, locales, and hardware (a sketch of such a pipeline follows this list).
  2. Identify opportunities, develop strategies, and lead roadmaps to scale country and language capabilities and to improve the reliability, scalability, and efficiency of model evaluation processes.
  3. Collaborate closely with AI feature teams, model developers, and researchers to understand evaluation requirements, provide support, and integrate new models and use cases into evaluation suites and benchmarks.
  4. Grow local team expertise and projects, mentor team members, and remove bottlenecks; improve the team's design, code, and engineering practices.
  5. Interface with cross-functional and remote teams, engineering managers, and stakeholders to influence cross-team priorities and roadmaps.
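As referenced in item 1, here is a small, purely illustrative sketch of the aggregation side of such a pipeline: rolling auto-rater scores up per locale and flagging locales that miss a launch-quality bar. The `LAUNCH_BAR` threshold and `locale_report` helper are assumptions invented for this example, not a description of Google's pipeline.

```python
# Illustrative sketch: aggregate auto-rater scores per locale and flag
# locales below a hypothetical launch-quality bar, the kind of check
# implied by "metric consistency" and "establishing benchmarks".
from collections import defaultdict
from statistics import mean

LAUNCH_BAR = 4.0  # assumed minimum mean rubric score; purely illustrative


def locale_report(ratings: list[tuple[str, int]]) -> dict[str, dict]:
    """ratings: (locale, score) pairs from an auto-rater run."""
    by_locale: dict[str, list[int]] = defaultdict(list)
    for locale, score in ratings:
        by_locale[locale].append(score)
    # Summarize each locale: sample size, mean score, and whether it
    # clears the assumed launch bar.
    return {
        loc: {"n": len(scores), "mean": round(mean(scores), 2),
              "meets_bar": mean(scores) >= LAUNCH_BAR}
        for loc, scores in sorted(by_locale.items())
    }


if __name__ == "__main__":
    sample = [("en-US", 5), ("en-US", 4), ("id-ID", 4), ("id-ID", 3), ("th-TH", 5)]
    for loc, stats in locale_report(sample).items():
        print(loc, stats)
```

A real pipeline would also track per-metric confidence intervals and rater agreement so that scores remain comparable as new models and locales are added, but the shape of the check is the same.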

Skills

Required

  • software development
  • testing and launching software products
  • software design and architecture
  • speech/audio
  • reinforcement learning
  • ML infrastructure
  • ML design
  • model deployment
  • model evaluation
  • data processing
  • debugging
  • fine-tuning

Nice to have

  • on-device machine learning
  • mobile algorithms
  • Large Language Model/GenAI evaluations
  • data collection for ML
  • Android development
  • AI toolchain
  • evaluation metrics design
  • data management techniques
  • launching one or multiple AI/ML-powered user-facing products across countries
  • Android or Pixel development ecosystem

What the JD emphasized

  • quality evaluations
  • rater quality
  • evaluation datasets
  • evaluate and optimize
  • model evaluation processes
  • Large Language Model/GenAI evaluations

Other signals

  • evaluating AI features
  • intelligent applications
  • creating auto-raters
  • ensuring metric consistency
  • establishing benchmarks
  • quality and performance bar for AI launches
  • evaluation datasets, metrics, and pipelines
  • evaluate and optimize the behavior of our models
  • integrate new models and use cases into the evaluation and benchmarks
  • scale country and language capabilities
  • improve the reliability, scalability, and efficiency of model evaluation processes