Research Engineer, Frontier Safety Mitigations, DeepMind

Google · Big Tech · London, United Kingdom

Research Engineer focused on frontier AI safety mitigations, defending against misuse domains like CBRNE and Harmful Manipulation. Responsibilities include building evaluations, red-teaming, deploying in-model and out-of-model mitigations, and monitoring risks for frontier models, particularly agentic AI systems. The role involves developing classifiers, monitoring systems, and advancing research in automated red-teaming and adversarial robustness.

What you'd actually do

Build advanced classifiers and data pipelines to detect misuse, owning the end-to-end process from automated evaluation to rapid model iteration.
Build cross-context monitoring systems to detect coordinated harms, developing novel signal aggregation methods across disparate user sessions to identify large-scale attack vectors.
Implement data-driven, semi-automated account-level response systems to detect, track, and apply strikes against persistent malicious actors using rich signals from production traffic.
Evaluate and secure agentic AI systems by developing threat models, creating testing environments, and deploying robust mitigations against frontier-level agentic hacking and long-horizon attacks.
Advance research in automated red-teaming and adversarial robustness, leveraging multi-turn/agentic attacks to systematically test for and uncover misuse vulnerabilities.

Skills

Required

Bachelor’s degree or equivalent practical experience.
5 years of experience with software development in one or more programming languages.
3 years of experience testing, maintaining, or launching software products, and 1 year of experience with software design and architecture.

Nice to have

PhD in Computer Science, Machine Learning, or equivalent practical experience, or publications at venues (e.g., NeurIPS, ICLR, ICML, or EMNLP).
Experience with cybersecurity detection and response, building classifiers and anomaly detection systems at scale, taking safety defenses or mitigations from research concepts to scalable production systems.
Experience in adversarial machine learning, automated red-teaming, or model interpretability and probes.
Experience collaborating on or leading applied ML projects, including LLM training, inference, and fine-tuning.
Experience using AI coding agents with strong architectural judgment, and experience with TPUs and JAX.
Knowledge of AI control, chain-of-thought monitoring, monitorability, and related frontier safety research.

What the JD emphasized

defending against misuse domains
critical part of the overall strategy for building safe AI
build safety mitigations for frontier models
building defenses against risks
automated evaluation
rapid model iteration
novel signal aggregation methods
large-scale attack vectors
data-driven, semi-automated account-level response systems
persistent malicious actors
rich signals from production traffic
Evaluate and secure agentic AI systems
threat models
testing environments
robust mitigations
frontier-level agentic hacking
long-horizon attacks
automated red-teaming
adversarial robustness
multi-turn/agentic attacks
misuse vulnerabilities
cybersecurity detection and response
anomaly detection systems at scale
scalable production systems
adversarial machine learning
model interpretability and probes
LLM training, inference, and fine-tuning
AI coding agents
TPUs and JAX
AI control
chain-of-thought monitoring
monitorability
frontier safety research

Other signals

defending against misuse domains
build evaluations
red-teaming
deploy mitigations
monitor emerging risks
build safety mitigations for frontier models
build advanced classifiers and data pipelines to detect misuse
build cross-context monitoring systems to detect coordinated harms
Implement data-driven, semi-automated account-level response systems
Evaluate and secure agentic AI systems
developing threat models
creating testing environments
deploying robust mitigations against frontier-level agentic hacking and long-horizon attacks
advance research in automated red-teaming and adversarial robustness
leveraging multi-turn/agentic attacks to systematically test for and uncover misuse vulnerabilities
Experience with cybersecurity detection and response
building classifiers and anomaly detection systems at scale
taking safety defenses or mitigations from research concepts to scalable production systems
Experience in adversarial machine learning, automated red-teaming, or model interpretability and probes
Knowledge of AI control, chain-of-thought monitoring, monitorability, and related frontier safety research

Full job description

In this role, you will de-risk model launches by defending against misuse domains (e.g., Cybersecurity; Chemical, Biological, Radiological, Nuclear, and Conventional Explosive [CBRNE]; and Harmful Manipulation). You will build evaluations, conduct red-teaming, research and deploy mitigations (both in-model and out-of-model), and monitor emerging risks to enable the beneficial use of technology.

DeepMind is a dedicated scientific community, committed to ‘solving intelligence’ and ensuring technology is used for widespread public benefit. The Frontier Safety Mitigation team operates in a collaborative environment with a culture of support, dedication, and teamwork. The team takes the possibility of dangerous model capabilities seriously as AI advances. Proactively researching and implementing defense-in-depth mitigations is a critical part of the overall strategy for building safe AI.

You will join the Frontier Safety Mitigation team within the Gemini Safety team to build safety mitigations for frontier models. You will focus on building defenses against risks, contributing to DeepMind's Frontier Safety Framework commitments.

Artificial intelligence will be one of humanity’s most transformative inventions. At Google DeepMind, we are a pioneering AI lab with exceptional interdisciplinary teams focused on advancing AI development to solve complex global challenges and accelerate high-quality product innovation for billions of users. We use our technologies for widespread public benefit and scientific discovery, ensuring safety and ethics are always our highest priority.

We are pushing the boundaries across multiple domains. Our global teams offer diverse learning opportunities and varied career pathways for those driven to achieve exceptional results through collective effort.

Responsibilities

Build advanced classifiers and data pipelines to detect misuse, owning the end-to-end process from automated evaluation to rapid model iteration.
Build cross-context monitoring systems to detect coordinated harms, developing novel signal aggregation methods across disparate user sessions to identify large-scale attack vectors.
Implement data-driven, semi-automated account-level response systems to detect, track, and apply strikes against persistent malicious actors using rich signals from production traffic.
Evaluate and secure agentic AI systems by developing threat models, creating testing environments, and deploying robust mitigations against frontier-level agentic hacking and long-horizon attacks.
Advance research in automated red-teaming and adversarial robustness, leveraging multi-turn/agentic attacks to systematically test for and uncover misuse vulnerabilities.

Qualifications

Minimum qualifications:

Bachelor’s degree or equivalent practical experience.
5 years of experience with software development in one or more programming languages.
3 years of experience testing, maintaining, or launching software products, and 1 year of experience with software design and architecture.

Preferred qualifications:

PhD in Computer Science, Machine Learning, or equivalent practical experience, or publications at venues (e.g., NeurIPS, ICLR, ICML, or EMNLP).
Experience with cybersecurity detection and response, building classifiers and anomaly detection systems at scale, taking safety defenses or mitigations from research concepts to scalable production systems.
Experience in adversarial machine learning, automated red-teaming, or model interpretability and probes.
Experience collaborating on or leading applied ML projects, including LLM training, inference, and fine-tuning.
Experience using AI coding agents with strong architectural judgment, and experience with TPUs and JAX.
Knowledge of AI control, chain-of-thought monitoring, monitorability, and related frontier safety research.
