Model Quality Software Engineer, Claude Code

Anthropic · AI Frontier · San Francisco, CA · Engineering & Design - Product

We're looking for a Staff Software Engineer to set technical direction at the intersection of engineering and research on the Claude Code team. In this role, you'll architect the systems, tooling, and evaluation infrastructure used to measure, understand, and improve Claude's coding capabilities; you'll also drive architectural decisions, mentor engineers, and shape the direction of Claude Code.

What you'd actually do

  1. Set technical direction for evaluation systems, research infrastructure, and internal tooling across the Claude Code team
  2. Architect eval frameworks that measure model capabilities across diverse coding tasks and scale with our research roadmap
  3. Lead the design of infrastructure that enables researchers to run experiments at scale, and make the foundational tradeoffs that shape how the team operates for years
  4. Identify the highest-leverage engineering investments—often before anyone has asked for them—and drive them to completion
  5. Serve as a senior technical bridge between product and research, using strong product intuition to influence which capabilities we prioritize and how we measure progress against them

Skills

Required

  • Python
  • TypeScript

Nice to have

  • Designing or scaling evaluation frameworks for ML systems
  • Reinforcement learning infrastructure or training systems
  • Leading technical initiatives in high-performance, demanding environments
  • Research computing, scientific infrastructure, or developer platforms at scale
  • Strong quantitative foundation (math, physics, or related fields)

What the JD emphasized

  • 10+ years of software engineering experience
  • Track record of operating as a Staff or Principal engineer
  • Architected and owned complex, high-stakes systems
  • Setting technical direction that others follow
  • Take full ownership of ambiguous, open-ended problems
  • Power user of agentic coding tools
  • Deep intuition about model capabilities and limitations
  • Dive into unfamiliar technical domains
  • Care deeply about correctness and reliability
  • Raised engineering standards
  • Designing or scaling evaluation frameworks for ML systems
  • Reinforcement learning infrastructure or training systems
  • Leading technical initiatives in high-performance, demanding environments

Other signals

  • eval frameworks
  • research infrastructure
  • technical direction
  • model capabilities
  • coding capabilities