Currently tracking 489 active AI roles, up 170% versus the prior 4 weeks. Primary focus: Agent · Engineering. Salary range $98k–$505k (avg $233k).
| Title | Stage | AI score |
|---|---|---|
| Research Scientist, Evaluations, Security and Privacy, DeepMind: Research Scientist focused on security and privacy for AI models and agentic products, specifically Gemini. The role involves designing and evaluating novel defense mechanisms against adversarial attacks and prompt injections, translating research into practical solutions for training and inference pipelines, and collaborating with core modeling and engineering teams. The position requires a PhD and experience in ML research, benchmarking, and security, with a focus on next-generation security techniques for autonomous AI systems. | Eval Gate · Agent | 9 |
| Senior Staff Research Engineer, DeepMind: Senior Staff Research Engineer at Google DeepMind focused on Agent Evals and Quality for GenAI model improvement and product development. The role involves developing, evaluating, and optimizing LLM-based agents for complex, multi-step tasks. Responsibilities include constructing quantitative benchmarks and automated evaluation frameworks (e.g., LLM-as-a-judge) to measure agent capabilities in reasoning, planning, and tool use, as well as creating and optimizing data mixes from user feedback for training and fine-tuning agents. The role also requires analyzing agent behavior to identify failure modes and performance bottlenecks. | Eval Gate · Agent | 9 |
| Senior Quality Engineer, Gemini Enterprise Quality: Senior Quality Engineer for Gemini Enterprise Quality at Google Cloud AI Research. This role involves designing and implementing ML solutions, leveraging ML infrastructure, and focusing on quality assurance for AI products, particularly in specialized ML areas like speech/audio or reinforcement learning. The role requires experience in ML infrastructure, including model deployment and evaluation, and contributes to bringing AI innovations to real-world impact. | Eval Gate · Serve | 7 |
| Senior Staff Uber Technical Lead, Observability Intelligence: Senior Staff Uber Technical Lead for Observability Intelligence, driving the strategic shift of SRE incident response to an AI-driven paradigm within Google Cloud's monitoring systems. This role involves leading large-scale ML infrastructure optimization, defining the Observability Intelligence strategy, representing the organization in technical reviews, and partnering with Product Management to translate product needs into scalable architectural solutions. The focus is on building a cohesive, AI-powered observability ecosystem. | Eval Gate · Serve | 7 |
| Senior Clinical Specialist, AI Evaluations: This role focuses on evaluating AI model performance for health applications, leveraging clinical expertise to guide product development and ensure safety, quality, and efficacy. It involves applying evidence-based practices and contributing to the real-world implementation of AI health products. | Eval Gate · Agent | 7 |
| Software Engineer III, Skills Evaluation, Chrome: Software Engineer III role focused on building and maintaining evaluation pipelines, safety classifiers, and automated testing systems for AI skills within the Chrome product. This involves designing and implementing metrics, visualization tools, and auto-raters to ensure the quality, safety, and performance of AI workflows, with a focus on integrating with various AI models and browser surfaces. | Eval Gate · Post-train | 7 |
| Principal Analyst, Trust and Safety Trusted Experiences, GenAI: This role focuses on ensuring the safe launch of Generative AI models, acting as a key advisor and strategist for cross-functional teams. It involves anticipating risks, designing testing strategies, analyzing results, and driving mitigation and post-launch monitoring, with a specific emphasis on Text Models, Model Personalization, Model Governance, and Health/Mental Health. | Eval Gate | 7 |
| Staff Software Engineer, Agentic Data and Evals: Staff Software Engineer focused on building and launching tools and solutions for GenAI data generation and evaluations. The role involves developing a self-service data generation platform, performing LLM/GenAI model evaluations, and fine-tuning models using techniques like RLHF. The engineer will work cross-functionally to deliver high-quality data sets and evaluation infrastructure for various GenAI use cases. | Eval Gate · Post-train | 7 |
| Senior Data Scientist, Core Ranking and AI Context: Senior Data Scientist role focused on Core Ranking and AI Context Engineering (CRAFT) for Google Search, AI Overview, and AI Mode products. The role involves identifying quality and metric headroom, conducting analyses, applying statistical/AI methods, developing and automating evals and measurements for iterative improvements, and partnering with engineering and product teams to drive system changes and launches. The position requires a Master's degree in a quantitative field and 5 years of experience in analytics and coding, with preferred experience in consumer-facing products and evaluation methodologies. | Eval Gate · Ship | 7 |
| Senior Strategist, Kids and Learning Trust and Safety: This role focuses on ensuring the safety and trustworthiness of Generative AI experiences for young users, specifically in educational contexts. The Senior Strategist will develop and implement product safety strategies, analyze risks, and work with engineering and product teams to build responsible AI capabilities, including those for image, video, and agentic AI. Key responsibilities include analyzing data to identify and combat abuse, enhancing operational workflows, improving model safety, debugging escalations, and managing technical projects. | Eval Gate · Agent | 7 |
| Staff Data Scientist, Research, Search Health: Research Data Scientist focused on evaluation and metrics for AI answers in Search Health, developing advanced ML/LLM methodologies to identify product opportunities and influence product/engineering directions. | Eval Gate | 7 |