Software Engineer, RL Data

Anthropic · AI Frontier · San Francisco, CA · AI Research & Engineering

The Software Engineer on the RL Data team is responsible for building systems that produce high-quality reinforcement learning data for Claude, including data collection pipelines, human feedback tooling, execution environments, and quality assurance. The role involves end-to-end ownership of stack components, iterating on prompts and evals, developing QA frameworks, hardening execution environments, and collaborating with domain experts and operations partners.

What you'd actually do

  1. Own significant parts of our stack end-to-end, from technical architecture through the unglamorous operational work that makes it succeed
  2. Build data collection pipelines, read the transcripts they produce, and iterate on prompts, evals, and graders until the output is good
  3. Develop and improve QA frameworks to catch reward hacking and ensure environment quality
  4. Build interfaces that make collecting human data fast and painless for the people providing it
  5. Harden execution environments — sandboxing, snapshotting, tool coverage — so tasks hold up at training scale

Skills

Required

  • Strong software engineering skills
  • Proficiency in at least one modern programming language (Python, TypeScript)
  • Experience designing, building, and running backend systems or infrastructure
  • Effective use of AI tools in your own day-to-day work
  • Willingness to own problems end-to-end
  • Proactive, open communication
  • Comfort iterating quickly in ambiguous, fast-changing situations
  • Care about the societal impacts of your work

Nice to have

  • Experience building LLM-powered systems: prompt pipelines, evals, or products with models in the loop
  • Experience with reinforcement learning on LLMs: creating environments, rewards, graders, or training data
  • Time as a forward deployed engineer, founder, or early startup engineer
  • Experience shipping user-facing products, or internal platforms people love
  • Experience building data pipelines or integrations that move, transform, and index data from many sources
  • Experience building connectors or integrations with third-party tools and APIs
  • Experience with containers, Kubernetes, or simulation infrastructure
  • Experience handling sensitive data or working under tight security controls
  • Experience working with external data vendors
  • Basic familiarity with AI safety or security research

What the JD emphasized

  • Own significant parts of our stack end-to-end
  • unglamorous operational work
  • iterate on prompts, evals, and graders
  • QA frameworks
  • reward hacking
  • harden execution environments
  • sandboxing
  • snapshotting
  • tool coverage
  • training scale
  • external data vendors

Other signals

  • build data collection pipelines
  • develop and improve QA frameworks
  • work with external data vendors