Research Engineer, Model Performance & Quality

Anthropic · AI Frontier · AI Research & Engineering

Research Engineer focused on systematically understanding and monitoring model quality in real time. The role spans training production models, developing robust monitoring systems, and creating novel evaluation methodologies, bridging research and production across the model training pipeline.

What you'd actually do

  1. Build comprehensive training observability systems
  2. Develop next-generation evaluation frameworks
  3. Create automated quality assessment pipelines
  4. Bridge research and production
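To make the first responsibility concrete, here is a minimal sketch of one building block of training observability: flagging anomalous metric values (e.g. a loss spike) against a rolling baseline. The window size, threshold, and `make_spike_detector` name are illustrative assumptions, not anything from the posting.

```python
from collections import deque
from statistics import mean, stdev

def make_spike_detector(window=50, threshold=4.0):
    """Return a callable that flags a metric value as anomalous when it
    deviates from the rolling-window mean by more than `threshold` sigmas.

    Parameters are hypothetical defaults for illustration only.
    """
    history = deque(maxlen=window)

    def check(value):
        anomalous = False
        # Wait for enough samples to form a stable baseline before judging.
        if len(history) >= 10:
            mu, sigma = mean(history), stdev(history)
            if sigma > 0 and abs(value - mu) > threshold * sigma:
                anomalous = True
        history.append(value)
        return anomalous

    return check

detector = make_spike_detector()
# Steady loss values establish a baseline; a sudden jump should be flagged.
flags = [detector(2.0 + 0.01 * (i % 3)) for i in range(40)]
flags.append(detector(9.0))  # simulated loss spike → flagged
```

In a real training pipeline this kind of check would run over streamed metrics rather than a Python list, but the core idea (a cheap statistical baseline per metric) is the same.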

Skills

Required

  • Python
  • production ML systems
  • training, evaluating, and monitoring large language models
  • debugging complex, distributed systems
  • strong analytical skills, including interpreting training metrics and model behavior

Nice to have

  • reinforcement learning
  • language model training pipelines
  • designing and implementing evaluation frameworks or benchmarks
  • production monitoring
  • observability
  • incident response
  • statistical analysis
  • experimental design
  • AI safety and alignment research

What the JD emphasized

  • systematically understanding and monitoring model quality in real time
  • train production models
  • develop robust monitoring systems
  • create novel evaluation methodologies
  • monitoring infrastructure and how it scales
  • evaluation techniques
  • model training pipeline
  • measuring and monitoring capabilities
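The emphasis on evaluation techniques and capability measurement can be sketched as a tiny evaluation harness: run a set of graded prompts through a model and report the pass rate. The `EvalCase` structure, the `run_eval` function, and the stub `toy_model` are all hypothetical names for illustration; a production framework would add sampling, retries, and per-case logging.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class EvalCase:
    prompt: str
    check: Callable[[str], bool]  # grader for the model's response

def run_eval(model: Callable[[str], str], cases: list[EvalCase]) -> float:
    """Run every case through the model and return the fraction that pass."""
    passed = sum(case.check(model(case.prompt)) for case in cases)
    return passed / len(cases)

# Stub standing in for a real inference endpoint.
def toy_model(prompt: str) -> str:
    return "4" if "2 + 2" in prompt else "unsure"

cases = [
    EvalCase("What is 2 + 2?", lambda r: r.strip() == "4"),
    EvalCase("What is the capital of France?", lambda r: "paris" in r.lower()),
]
rate = run_eval(toy_model, cases)  # toy_model passes 1 of 2 cases → 0.5
```

Keeping the grader a plain callable per case is one simple way to let the same harness cover exact-match checks, regex checks, or model-graded rubrics.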

Other signals

  • training observability systems
  • next-generation evaluation frameworks
  • automated quality assessment pipelines
  • bridge research and production
  • model quality assessment