Research Scientist, Agent Post-training, DeepMind

Google · Big Tech · London, United Kingdom

Research Scientist role at DeepMind focused on agent post-training for large language models, specifically within the Gemini model recipe. The role involves driving research, designing experiments, developing research infrastructure, and collaborating with other scientists and engineers. It requires a PhD in ML/CS or equivalent practical experience, 2 years of experience with ML frameworks, and research experience in RL, tool use, or agentic systems.

What you'd actually do

  1. Drive the research process for large-scale agent post-training from hypothesis formulation to delivery in the Gemini model recipe.
  2. Design and execute ablation studies to validate research hypotheses and accelerate experimental feedback loops.
  3. Communicate research findings, progress, and outcomes to the broader team through visualizations and reports.
  4. Develop research infrastructure and utilities for data analysis and model evaluations using standard engineering practices.
  5. Collaborate with other research scientists and engineers to maintain a regular feedback and communication loop.

Skills

Required

  • PhD in Computer Science, Machine Learning, or a related quantitative field, or equivalent practical experience
  • 2 years of experience with machine learning frameworks such as JAX, Flax, or PyTorch
  • Experience conducting research in reinforcement learning, tool use, or agentic systems

Nice to have

  • Experience publishing research in reinforcement learning, reinforcement learning from human feedback, or tool-use algorithms at machine learning venues
  • Experience working with large-scale distributed training infrastructure and scaling experiments
  • Research experience in systems design, code complexity, or working with large-codebase environments
  • Experience developing simple, scalable solutions for complex, open-ended research problems

What the JD emphasized

  • agent post-training
  • reinforcement learning
  • tool use
  • agentic systems
  • large-scale agent post-training
  • reinforcement learning from human feedback
  • tool-use algorithms

Other signals

  • agent post-training
  • Gemini model recipe
  • reinforcement learning
  • tool use
  • agentic systems