Research Engineer / Scientist, Tool Use

Anthropic · AI Frontier · AI Research & Engineering

Research Engineer/Scientist focused on advancing the frontier of tool use for AI agents, aiming to improve accuracy, reliability, safety, and efficiency in complex workflows. The role involves defining research agendas, designing RL methodologies, building evaluations, and shipping research advances into production models, with a strong emphasis on safety and collaboration.

What you'd actually do

  1. Define and pursue research agendas that push the boundaries of what's possible in tool use for AI agents
  2. Design and implement novel reinforcement learning environments and methodologies that push the state of the art of tool use
  3. Build rigorous, realistic evaluations that capture the complexity of real-world tool use
  4. Ship research advances that directly impact millions of users
  5. Collaborate with other frontier research and product teams to drive fundamental breakthroughs in capabilities and safety, and ship them into production

Skills

Required

  • Machine learning research/applied-research experience
  • Strong quantitative background (physics, mathematics, or quant research)
  • Ability to write clean, reliable code
  • Solid software engineering skills
  • Ability to communicate complex ideas clearly

Nice to have

  • Reinforcement learning techniques and environments
  • Language model training, fine-tuning or evaluation
  • Building AI agents or autonomous systems
  • Influential published work in relevant ML areas
  • Deep expertise in a specific area (e.g., exceptional RL research, systems engineering, or mathematical foundations)
  • Shipping features or working closely with product teams
  • Pair programming and collaborative research

What the JD emphasized

  • tool use accuracy
  • long-horizon & complex tool use workflows
  • large-scale & dynamic tools
  • tool hallucination
  • tool use safety
  • tool use efficiency
  • reliably orchestrate vast tool ecosystems
  • maintain safety in autonomous operations
  • scale to handle the increasing complexity of real-world tasks
  • ship in production models
  • rigorous, realistic evaluations
  • ship research advances
  • drive fundamental breakthroughs
  • work with teams to ship these into production
  • design, implement, and debug code across our research and production ML stacks

Other signals

  • tool use
  • agentic applications
  • reinforcement learning
  • production models
  • safety