Research Engineer, Universes

Anthropic · AI Frontier · United States · Remote · AI Research & Engineering

This Research Engineer role focuses on building next-generation agentic environments for training AI models. It involves implementing novel approaches, contributing to research direction, designing training environments and methodologies, and building evaluations for capable and safe agentic AI. The work blends research and engineering, with an emphasis on reinforcement learning and complex, long-horizon agentic tasks.

What you'd actually do

  1. Build the next generation of agentic environments
  2. Build rigorous evaluations that measure real capability
  3. Collaborate across research and infrastructure teams to ship environments into production training
  4. Debug and iterate rapidly across research and production ML stacks
  5. Contribute to research culture through technical discussions and collaborative problem-solving

Skills

Required

  • strong software engineering skills
  • ability to build robust infrastructure
  • ability to balance research exploration with engineering implementation
  • comfortable with uncertainty and able to adapt quickly

Nice to have

  • industry experience with large language model training, fine-tuning, or evaluation
  • industry experience building RL environments, simulation systems, or large-scale ML infrastructure
  • deep expertise in sandboxing, containerization, VM infrastructure, or distributed systems
  • published influential work in relevant ML areas

What the JD emphasized

  • novel training environments
  • long-horizon agentic tasks
  • fundamental research in reinforcement learning
  • building evaluations that measure genuine capability
  • agentic environments
  • rigorous evaluations
  • real capability
  • production training
  • research and production ML stacks

Other signals

  • training AI models to perform complex, difficult, long-horizon agentic tasks
  • design and implement novel training environments
  • fundamental research in reinforcement learning
  • building evaluations that measure genuine capability