Currently tracking 124 active AI roles, with 106 new openings in the last 4 weeks. Primary focus: Agent · Engineering. Salary range $46k–$850k (avg $405k).
Anthropic has 145 active AI-related job listings. The largest share of these roles focuses on agents, at 28% of the total. Engineering is the most frequent function, with 74 listings, followed by Research with 51. The company is primarily hiring in the United States, with 118 positions, and the United Kingdom, with 22. Frequent tech tags include model_serving, evals, and agent_orchestration, suggesting a focus on deployment and evaluation of AI systems. In the last 30 days, Anthropic posted 16 new AI roles, a 47% decrease compared to the previous 30-day period.
Anthropic currently has 132 active AI-related roles in our index. The most common open titles are: Applied AI Architect, Industries (2); Regional Research Economist, Economic Research (2); Research Engineer, Machine Learning (RL Velocity) (2); Research Engineer, Production Model Post-Training (2); and Staff Software Engineer, AI Reliability Engineering (2). Most positions are in Engineering and Research.
Anthropic's active AI hiring is concentrated in: agents (28%), serving infrastructure (17%), post-training (14%). These categories follow a seven-stage AI lifecycle: data, pre-training, post-training, serving infrastructure, agents, evaluation, and application.
Anthropic is hiring AI talent in: United States (106 roles), United Kingdom (20 roles), Canada (6 roles), Ireland (5 roles).
Job postings at Anthropic most frequently reference: model serving, evals, LLM observability, agent orchestration, and inference infra.
In the past 30 days, Anthropic has posted 29 new AI-related roles. That is a +61% change versus the prior 30 days (18 → 29).
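The +61% figure follows from the two counts quoted (18 → 29). A minimal sketch of that arithmetic, assuming the index uses a plain percent change; the helper name `pct_change` is hypothetical and not part of any index tooling:

```python
def pct_change(previous: int, current: int) -> float:
    """Percent change from `previous` to `current` (hypothetical helper)."""
    if previous == 0:
        raise ValueError("percent change is undefined when the prior count is 0")
    return (current - previous) / previous * 100

# Counts quoted in the text: 18 roles in the prior 30 days, 29 in the latest 30.
print(f"{pct_change(18, 29):+.0f}%")  # +61%
```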
| Title | Description | Stage | AI score |
|---|---|---|---|
| Research Engineer, Model Evaluations | Research Engineer focused on building and operating the evaluation infrastructure for large language models, ensuring their capabilities, knowledge, and safety properties are rigorously measured and validated at scale. This role involves designing evaluations, building distributed systems for running them, monitoring model health during training, and partnering with researchers to interpret results. | Eval Gate · Post-train | 9 |
| Research Engineer, RL Infrastructure (Knowledge Work) | Research Engineer focused on the reliability, observability, and infrastructure of training environments and evaluation systems for AI models, ensuring stability and quality as they scale. The role involves proactive hardening, building tooling for early problem detection, and serving as a dedicated owner for environment health and evaluation integrity. | Eval Gate · Data | 9 |
| Research Engineer, Safeguards Labs | Research Engineer focused on AI safety, investigating novel methods for detecting misuse, strengthening model safeguards, and building evaluation methodologies for AI systems, particularly in agentic workflows. The role involves leading research projects, designing offline analyses, developing prototypes, and collaborating with production teams. | Eval Gate · Post-train | 9 |
| Anthropic STEM Fellow | This role is for a STEM Fellow to work alongside Anthropic's research teams for a few months. Fellows will use their domain expertise to evaluate, improve, and apply Claude's capabilities in their field. This involves designing evaluations, identifying data/techniques for capability gaps, and applying Claude to open problems using various strategies and tools. Projects are scoped to ship within the fellowship period. | Eval Gate · Agent | 9 |
| Research Lead, Training Insights | Research Lead focused on developing and executing strategies for measuring and characterizing model capabilities across training and deployment. This role involves driving original research into new evaluation methodologies, leading a team, and spanning the full lifecycle of model development, from pretraining to deployment. The work includes creating long-horizon evaluations, measuring emerging capabilities, and understanding their development during RL training and post-training. The role also involves cross-organizational collaboration to map evaluation landscapes and identify gaps, shaping the evaluation narrative for model releases, and contributing to the broader research community. | Eval Gate · Post-train | 9 |
| Research Engineer, AI Observability | Research Engineer focused on designing and building AI-based monitoring systems to analyze large unstructured datasets, produce structured insights, and develop agentic integrations for investigation and action. The role involves working across the full stack, from core analysis frameworks to user-facing applications, with a direct impact on measuring and mitigating AI misuse and misalignment. This role is critical for scaling human oversight of AI systems. | Eval Gate · Agent | 9 |
| Research Scientist, Frontier Red Team (Emerging Risks) | Research Scientist focused on understanding and defending against societal risks from advanced AI models, particularly self-improving and autonomous systems. The role involves designing research experiments, building evals, and producing artifacts to communicate model capabilities and inform product/safeguards decisions. Emphasis on emerging risks from AI integration into the economy and society. | Eval Gate · Agent | 9 |
| Model Quality Software Engineer, Claude Code | Staff Software Engineer to set technical direction at the intersection of engineering and research on the Claude Code team. Architect systems, tooling, and evaluation infrastructure to measure, understand, and improve Claude's coding capabilities. Drive architecture, mentor engineers, and influence the direction of Claude Code. | Eval Gate · Agent | 9 |
| Applied Safety Research Engineer, Safeguards | Research-oriented engineer to develop methods for representative, robust, and informative AI safety evaluations. This role involves designing experiments to improve model behavior evaluation, shipping these methods into pipelines that inform model training and deployment, and directly shaping how Anthropic understands and improves model safety across misuse, prompt injection, and user well-being. The role also involves building tooling for policy experts and surfacing findings to drive upstream model improvements. | Eval Gate · Post-train | 9 |
| Research Engineer, Model Evaluations | Research Engineer focused on designing and implementing Anthropic's model evaluation platform, shaping how models are understood, measured, and improved. This role involves leading the architecture of scalable evaluation infrastructure, implementing high-throughput pipelines for production training, analyzing results to guide model development, and collaborating with research and training teams. The goal is to ensure models meet high standards for capabilities and safety before deployment, influencing training decisions and the overall model roadmap. | Eval Gate · Post-train | 9 |
| Research Engineer, Model Evaluations | Research Engineer focused on designing and implementing Anthropic's model evaluation platform, influencing training decisions and model development. This role involves leading the architecture of scalable evaluation pipelines, analyzing results, partnering with research teams, and contributing to publications. It sits at the intersection of research and engineering, with a strong emphasis on AI safety and model capabilities. | Eval Gate · Post-train | 9 |
| Research Engineer, Model Performance & Quality | Research Engineer focused on systematically understanding and monitoring model quality in real-time. This role involves training production models, developing monitoring systems, and creating novel evaluation methodologies, bridging research and production across the model training pipeline. | Eval Gate · Post-train | 9 |
| Research Engineer, Model Performance & Quality | Research Engineer focused on systematically understanding and monitoring model quality in real-time. This role involves training production models, developing monitoring systems, and creating novel evaluation methodologies, bridging research and production across the model training pipeline. | Eval Gate · Post-train | 9 |
| ML Infrastructure Engineer, Safeguards | ML Infrastructure Engineer focused on building and scaling critical infrastructure for AI safety systems, including real-time and batch classifier/safety evaluations, monitoring, and optimizing inference for safety-critical applications. | Eval Gate · Serve | 9 |
| Research Scientist, Societal Impacts | Research Scientist focused on empirical studies of AI's societal impacts, developing measurement systems and evaluation frameworks, and translating insights into product/policy recommendations. This role involves both quantitative and qualitative methods, with a focus on areas like economics, well-being, education, and alignment. | Eval Gate · Agent | 9 |
| Research Scientist, Frontier Red Team (CBRN, Biosecurity) | Research Scientist focused on red-teaming AI models for biosecurity risks, involving fine-tuning, threat modeling, and developing novel evaluations. This role bridges AI safety research with domain expertise in biosecurity. | Eval Gate · Post-train | 9 |
| Research Scientist, Frontier Red Team (Autonomy) | Research Scientist role focused on developing and productionizing advanced autonomy evaluations for AI Safety Level (ASL) determination of models. This involves risk and capability modeling, designing, implementing, and running large-scale experiments to evaluate autonomous capabilities and forecast future capabilities, with potential for people management. | Eval Gate · Agent | 9 |
| Research Engineer, Frontier Red Team (RSP Evaluations) | Research Engineer focused on developing and running "gold standard" evaluations for catastrophic risks to ensure safe release of frontier AI models, aligning with the Responsible Scaling Policy (RSP). The role involves creating evaluation systems, collaborating with domain experts, building sandboxed testing environments, and informing critical deployment decisions. | Eval Gate | 9 |
| Research Engineer, Societal Impacts | Research Engineer focused on building infrastructure for foundational research into AI's societal impact. This involves designing and implementing scalable systems for experiments, evaluations, and data processing, with a strong emphasis on reliability and supporting future research directions. The role requires close collaboration with researchers and policy experts to generate insights and inform strategy. | Eval Gate | 9 |
| Product Manager, Safeguards Rare Harms | Product Manager for Anthropic's Safeguards team, focusing on building and deploying systems to ensure AI safety and prevent misuse. This role involves ideation, design, development, and UX for safeguards, working closely with research and product teams to mitigate risks associated with frontier models across various platforms. | Eval Gate · Agent | 8 |
| Engineering Manager, Agent Prompts & Evals | Engineering Manager to lead the Agent Prompts & Evals team, responsible for the infrastructure that enables shipping model and prompt changes with confidence. This includes eval frameworks, system prompt pipelines, and regression-detection systems. The team acts as a platform for model behavior, sitting between product engineering and research, and partners with other evals groups and product teams. The role requires leading and growing a team, owning the product-side eval platform and system prompt infrastructure, managing model launches, fostering collaboration, recruiting engineers, and shaping team investment in areas like frontier eval development and launch automation. | Eval Gate · Agent | 8 |
| Biological Safety Research Scientist | Research Scientist focused on biological safety for AI systems, applying technical skills to design and develop safety systems that detect harmful behaviors and prevent misuse. This role involves designing and executing capability evaluations, collaborating on training data and safety system training, analyzing performance, and stress-testing safeguards. The goal is to ensure biological safety is embedded throughout the model development lifecycle, balancing AI's potential in life sciences with preventing misuse. | Eval Gate · Post-train | 8 |
| Machine Learning Systems Engineer, Model Evaluations | This role focuses on building and maintaining the infrastructure for Model Evaluations and Research Inference, enabling researchers to systematically test and assess model capabilities. It involves designing scalable systems, optimizing APIs, creating data pipelines, and implementing monitoring for research-focused inference systems. The goal is to accelerate the model development lifecycle and support Anthropic's mission of creating safe and beneficial AI. | Eval Gate · Serve | 8 |
| Machine Learning Systems Engineer, Model APIs | Machine Learning Systems Engineer focused on building and maintaining Model Evaluations infrastructure and Research Inference APIs/infrastructure to enable researchers to effectively evaluate models and conduct inference tasks, directly impacting AI advancement. | Eval Gate · Serve | 8 |
| Data Scientist, Safeguards | This role focuses on building and scaling a data-driven culture within an AI company, specifically for safeguards. The Data Scientist will analyze user behavior, define key metrics, identify opportunities for product improvement, design and analyze experiments, and establish data best practices to inform product and commercial strategy for safe, frontier AI deployment. | Eval Gate | 7 |
| Member of Staff, AI & Rule of Law | This role focuses on researching the impact of AI on democratic institutions and the rule of law, developing frameworks for AI safety evaluations with a legal lens, analyzing institutional vulnerabilities, and exploring novel legal issues and applications of AI in governance. The role requires deep expertise in both AI and law/government/policy, and involves using AI systems like Claude extensively. | Eval Gate | 7 |
| Software Engineer, Safeguards Infrastructure | Software Engineer focused on building foundational systems for AI safety, including infrastructure for data management, metric and evaluation systems, and tooling for human and agentic review. The role involves ensuring the day-to-day running of Safeguards systems and building robust, reliable multi-layered defenses for real-time improvement of safety mechanisms at scale. | Eval Gate · Agent | 7 |
| Safeguards Analyst, Human Exploitation & Abuse | This role focuses on building and operating enforcement systems to detect and mitigate the misuse of AI products for human exploitation and abuse. It involves tuning classifiers, curating evaluation datasets, conducting investigations using data analysis tools, and collaborating with product and engineering teams to develop detection signals and mitigations. The role also involves external partnerships and staying ahead of evolving misuse tactics. | Eval Gate · Data | 5 |
| Safeguards Enforcement Analyst, Safety Evaluations | This role focuses on evaluating AI models against safety and policy standards, running and monitoring evaluations, driving mitigations, and coordinating the creation of new evaluation frameworks. It involves cross-functional collaboration with policy experts and engineering teams to ensure model behavior meets required standards and to build scalable processes for evaluation. | Eval Gate | 5 |
| Technical Program Manager, Safeguards (Infrastructure & Evals) | Technical Program Manager for Safeguards Infrastructure and Evals at Anthropic. This role focuses on owning the operational health, reliability, and forward momentum of AI safety infrastructure, including classifiers, detection pipelines, evaluation platforms, and monitoring systems. Responsibilities include driving incident response, post-mortem execution, establishing and maintaining SLOs with partner teams, maintaining runbook quality, managing platform migrations, and coordinating improvements to the evals platform. Requires technical depth in production ML systems and strong program management skills in operational and infrastructure-heavy environments. | Eval Gate · Serve | 5 |
| Technical Policy Manager, Cyber Harms | This role leads a team focused on preventing AI misuse in the cyber domain by applying cybersecurity expertise to design and evaluate safety systems. It involves creating cyber threat models, developing usage policies, collaborating with ML engineers on safety system training, and analyzing performance. The goal is to ensure AI models handle dual-use cybersecurity knowledge responsibly, balancing potential benefits with preventing misuse. | Eval Gate · Post-train | 5 |