Full Stack Software Engineer, Reinforcement Learning

Anthropic · AI Frontier · AI Research & Engineering

A Full-Stack Software Engineer role building platforms, tools, and interfaces for Reinforcement Learning environment creation, data collection, and training observability. The role supports researchers, vendors, and data labelers in generating high-quality training data for frontier models. It requires strong full-stack engineering skills and the ability to build reliable products.

What you'd actually do

  1. Build and extend web platforms for RL environment creation, management, and quality review — including environment configuration, versioning, and validation workflows
  2. Develop vendor-facing interfaces and tooling that enable external partners to create, submit, and iterate on training environments with minimal friction
  3. Design and implement platforms for human data collection at scale, including labeling workflows, quality assurance systems, and feedback mechanisms
  4. Build evaluation dashboards and observability UIs that give researchers real-time insight into environment quality, training run health, and reward signal integrity
  5. Create backend services and APIs that connect environment authoring tools, data collection systems, and RL training infrastructure

Skills

Required

  • Full-stack experience
  • Python
  • Modern web frameworks (React, TypeScript, or similar)
  • Building and shipping user-facing products, internal tools, or developer platforms
  • End-to-end product ownership (backend, frontend, API design, database schema)
  • Relational databases
  • API design patterns
  • Authentication/authorization systems
  • UX focus
  • Clear communication
  • Translating ambiguous requirements into well-scoped work
  • Motivation to build excellent platforms
  • High agency
  • Thriving in fast-paced environments
  • Interest in Anthropic's mission

Nice to have

  • Experience building data collection, labeling, or annotation platforms
  • Background building multi-tenant platforms with role-based access and vendor management workflows
  • Experience with cloud infrastructure (GCP or AWS), Docker, and CI/CD pipelines
  • Familiarity with LLM training, fine-tuning, or evaluation workflows
  • Experience with async Python frameworks (Trio, asyncio) or high-throughput API design
  • Background building dashboards, monitoring, or observability tooling
  • Experience working with external vendors or partners on technical integrations

What the JD emphasized

  • strong software engineering fundamentals with full-stack experience
  • proficient in Python and modern web frameworks (React, TypeScript, or similar)
  • experience building and shipping user-facing products, internal tools, or developer platforms
  • own a product surface end-to-end — backend, frontend, API design, database schema
  • experience with relational databases, API design patterns, and authentication/authorization systems
  • Care about UX and can build interfaces that are intuitive for both technical and non-technical users
  • Communicate clearly with researchers, operations teams, and engineers and can translate ambiguous requirements into well-scoped work
  • motivated by building excellent platforms
  • operate with high agency: you identify what needs to be done and drive it forward independently
  • Thrive in a fast-paced environment where priorities shift and new problems emerge regularly
  • Care about Anthropic's mission to build safe, beneficial AI and want your work to contribute to that goal
  • Experience building data collection, labeling, or annotation platforms
  • Background building multi-tenant platforms with role-based access and vendor management workflows
  • Experience with cloud infrastructure (GCP or AWS), Docker, and CI/CD pipelines
  • Familiarity with LLM training, fine-tuning, or evaluation workflows
  • Experience with async Python frameworks (Trio, asyncio) or high-throughput API design
  • Background building dashboards, monitoring, or observability tooling
  • Experience working with external vendors or partners on technical integrations

Other signals

  • building platforms for RL
  • data collection
  • training observability
  • full-stack engineering