What you'd actually do

Execute quality iterations on reference agents, identifying potential model or agent framework improvements.

Source or facilitate creation of datasets for evaluation and training to improve model performance for Cloud customers.

Design new agentic systems and context engineering algorithms, utilizing existing models and identifying the need for new training/evaluation objectives where necessary.

Define and implement metrics that correspond to business problems.

Prototype and iterate on the solution, working closely with customers, product management, and business development.

Skills

Required

Python
Large Language Models (LLMs)
building agents

Nice to have

tokens
context
Retrieval-Augmented Generation (RAG)
function calling
data
basic statistical analysis concepts
general data science principles
core ML concepts
training algorithms
best practices for evaluation
running quality interactions
understanding business goals
defining technical metrics
implementing evaluation frameworks
designing experiments
analyzing results
performing RCAs
formulating hypotheses
conducting ablation or live experiments

What the JD emphasized

8 years of experience working with Large Language Models (LLMs) and building agents

running quality interactions, involving understanding business goals, defining technical metrics, implementing evaluation frameworks, designing experiments, analyzing results, performing RCAs, formulating hypotheses, and conducting ablation or live experiments

Google Cloud's software engineers develop the next-generation technologies that change how billions of users connect, explore, and interact with information and one another. We're looking for engineers who bring fresh ideas from all areas, including information retrieval, distributed computing, large-scale system design, networking and data storage, security, artificial intelligence, natural language processing, UI design and mobile; the list goes on and is growing every day. As a software engineer, you will work on a specific project critical to Google Cloud's needs with opportunities to switch teams and projects as you and our fast-paced business grow and evolve. You will anticipate our customer needs and be empowered to act like an owner, take action and innovate. We need our engineers to be versatile, display leadership qualities and be enthusiastic to take on new problems across the full-stack as we continue to push technology forward.

In this role, you will be responsible for the quality of next-generation AI agents and for driving improvements through continuous quality iterations, improving existing capabilities and delivering new ones to Cloud customers. You will deliver robust, production-ready agentic systems by leading the engineering of the ML quality feedback loop.

The Google Cloud AI Research team addresses AI challenges motivated by Google Cloud’s mission of bringing AI to tech, healthcare, finance, retail and many other industries. We work on a range of unique problems focused on research topics that maximize scientific and real-world impact, aiming to push the state-of-the-art in AI and share findings with the broader research community. We also collaborate with product teams to turn innovations into real-world impact that benefits our customers.

Responsibilities

Execute quality iterations on reference agents, identifying potential model or agent framework improvements.
Source or facilitate creation of datasets for evaluation and training to improve model performance for Cloud customers.
Design new agentic systems and context engineering algorithms, utilizing existing models and identifying the need for new training/evaluation objectives where necessary.
Define and implement metrics that correspond to business problems.
Prototype and iterate on the solution, working closely with customers, product management, and business development.

Qualifications

Minimum qualifications:

Bachelor's degree or equivalent practical experience
8 years of experience working with Large Language Models (LLMs) and building agents.

Preferred qualifications:

Solid understanding of essential concepts such as tokens, context, Retrieval-Augmented Generation (RAG), and function calling.
Strong understanding of data and familiarity with basic statistical analysis concepts (e.g., variance, p-value, bias) and general data science principles.
Solid understanding of core ML concepts, training algorithms, and best practices for evaluation.
Familiarity with running quality interactions, involving understanding business goals, defining technical metrics, implementing evaluation frameworks, designing experiments, analyzing results, performing RCAs, formulating hypotheses, and conducting ablation or live experiments.
Proficiency in Python.

Staff Software Engineer, AI Research
