Currently tracking 489 active AI roles, up 170% versus the prior 4 weeks. Primary focus: Agent · Engineering. Salary range $98k–$505k (avg $233k).
| Role | Stage | AI score |
|---|---|---|
| **Research Scientist, Evaluations, Security and Privacy, DeepMind**: Focused on security and privacy for AI models and agentic products, specifically Gemini. The role involves designing and evaluating novel defense mechanisms against adversarial attacks and prompt injections, translating research into practical solutions for training and inference pipelines, and collaborating with core modeling and engineering teams. Requires a PhD and experience in ML research, benchmarking, and security, with a focus on next-generation security techniques for autonomous AI systems. | Eval Gate · Agent | 9 |
| **Senior Staff Research Engineer, DeepMind**: At Google DeepMind, focused on Agent Evals and Quality for GenAI model improvement and product development. The role involves developing, evaluating, and optimizing LLM-based agents for complex, multi-step tasks. Responsibilities include constructing quantitative benchmarks and automated evaluation frameworks (e.g., LLM-as-a-judge) to measure agent capabilities in reasoning, planning, and tool use, as well as creating and optimizing data mixes from user feedback for training and fine-tuning agents. The role also requires analyzing agent behavior to identify failure modes and performance bottlenecks. | Eval Gate · Agent | 9 |
| **Software Engineer, AI i18n and Evaluations**: Focused on AI internationalization and evaluations for Pixel and Android. Responsibilities include leading R&D for AI feature expansion, quality evaluations, and rater quality using on-device and server-based models. Tasks involve creating auto-raters, ensuring metric consistency, establishing benchmarks, and collaborating with AI feature teams. The role also involves identifying opportunities and leading roadmaps to scale language capabilities and improve model evaluation processes. | Eval Gate · Post-train | 8 |
| **Senior Quality Engineer, Gemini Enterprise Quality**: This role at Google Cloud AI Research involves designing and implementing ML solutions, leveraging ML infrastructure, and driving quality assurance for AI products, particularly in specialized ML areas such as speech/audio or reinforcement learning. Requires experience in ML infrastructure, including model deployment and evaluation, and contributes to bringing AI innovations to real-world impact. | Eval Gate · Serve | 7 |
| **Senior Staff Uber Technical Lead, Observability Intelligence**: Drives the strategic shift of SRE incident response to an AI-driven paradigm within Google Cloud's monitoring systems. This role involves leading large-scale ML infrastructure optimization, defining the Observability Intelligence strategy, representing the organization in technical reviews, and partnering with Product Management to translate product needs into scalable architectural solutions. The focus is on building a cohesive, AI-powered observability ecosystem. | Eval Gate · Serve | 7 |
| **Senior Clinical Specialist, AI Evaluations**: This role focuses on evaluating AI model performance for health applications, leveraging clinical expertise to guide product development and ensure safety, quality, and efficacy. It involves applying evidence-based practices and contributing to the real-world implementation of AI health products. | Eval Gate · Agent | 7 |
| **Engineering Analyst II, Gemini and Labs**: This role focuses on defining and implementing safety strategies for generative AI systems, including developing evaluation paradigms, guiding engineering and research teams on safety mitigations such as fine-tuning and guardrails, and analyzing the AI threat landscape to create a proactive mitigation agenda. The role is critical for ensuring AI safety is a foundational component of Google's AI systems. | Eval Gate · Post-train | 7 |
| **Software Engineer III, Skills Evaluation, Chrome**: Focused on building and maintaining evaluation pipelines, safety classifiers, and automated testing systems for AI skills within the Chrome product. This involves designing and implementing metrics, visualization tools, and auto-raters to ensure the quality, safety, and performance of AI workflows, with a focus on integrating with various AI models and browser surfaces. | Eval Gate · Post-train | 7 |
| **Principal Analyst, Trust and Safety Trusted Experiences, GenAI**: This role focuses on ensuring the safe launch of Generative AI models, acting as a key advisor and strategist for cross-functional teams. It involves anticipating risks, designing testing strategies, analyzing results, and driving mitigation and post-launch monitoring, with a specific emphasis on Text Models, Model Personalization, Model Governance, and Health/Mental Health. | Eval Gate | 7 |
| **Staff Software Engineer, Agentic Data and Evals**: Focused on building and launching tools and solutions for GenAI data generation and evaluations. The role involves developing a self-service data generation platform, performing LLM/GenAI model evaluations, and fine-tuning models using techniques such as RLHF. The engineer will work cross-functionally to deliver high-quality data sets and evaluation infrastructure for various GenAI use cases. | Eval Gate · Post-train | 7 |
| **Senior Data Scientist, Core Ranking and AI Context**: Focused on Core Ranking and AI Context Engineering (CRAFT) for Google Search, AI Overview, and AI Mode products. The role involves identifying quality and metric headroom, conducting analyses, applying statistical/AI methods, developing and automating evals and measurements for iterative improvements, and partnering with engineering and product teams to drive system changes and launches. Requires a Master's degree in a quantitative field and 5 years of experience in analytics and coding, with preferred experience in consumer-facing products and evaluation methodologies. | Eval Gate · Ship | 7 |
| **Senior Strategist, Kids and Learning Trust and Safety**: This role focuses on ensuring the safety and trustworthiness of Generative AI experiences for young users, specifically in educational contexts. The Senior Strategist will develop and implement product safety strategies, analyze risks, and work with engineering and product teams to build responsible AI capabilities, including those for image, video, and agentic AI. Key responsibilities include analyzing data to identify and combat abuse, enhancing operational workflows, improving model safety, debugging escalations, and managing technical projects. | Eval Gate · Agent | 7 |
| **Staff Data Scientist, Research, Search Health**: Focused on evaluation and metrics for AI answers in Search Health, developing advanced ML/LLM methodologies to identify product opportunities and influence product and engineering directions. | Eval Gate | 7 |
| **Senior Engineering Analyst, Photos Responsible AI**: This role focuses on ensuring the safety and trustworthiness of AI features within Google Photos, specifically generative AI. The Senior Engineering Analyst will work with various teams to develop and execute comprehensive evaluations, identify emerging risks and abuse vectors, and build resilience against malicious inputs. The role involves defining testing approaches, tools, and solutions, establishing testing to discover risks, and defining program metrics and feedback loops. | Eval Gate · Post-train | 7 |
| **Technical Program Manager, Generative AI Safety**: Leads initiatives to expand content safety infrastructure, integrate safety classifiers, and build rapid response capabilities for AI abuse. The role involves partnering with cross-functional leaders to convert threat intelligence into scalable models and technical protections within the serving stack, orchestrating safety engineering teams, and managing global workflows for timely integration and evaluation of safety models for Gemini releases. It also coordinates with infrastructure teams, generative AI product groups, and foundational model researchers to integrate safety signals into primary models. | Eval Gate · Post-train | 7 |
| **Engineering Analyst, Kids and Learning Trust and Safety**: This role supports the launch of Generative AI search experiences and education efforts, focusing on responsible AI capabilities. The analyst will perform data analysis to identify and combat abuse, develop datasets and run evaluations for GenAI products, establish metrics for AI issues, and improve model safety through data analysis. Requires experience in data analysis, project management, and familiarity with ML model performance or LLMs. | Eval Gate | 5 |
| **Learning Impact Specialist, LearnX**: This role focuses on developing and implementing evaluation frameworks to assess the quality of Generative AI tools within an educational context. The specialist will leverage learning science principles to consult with product development teams and lead discussions on how GenAI can shape better learning outcomes. While not directly building AI models, the role is critical in evaluating their impact and quality in educational products. | Eval Gate | 5 |