Research Engineer, Computer Use

Anthropic · AI Frontier · New York, NY +2 · AI Research & Engineering

Research Engineer focused on teaching AI models (Claude) to perceive, use, and understand computer interfaces, enabling them to reliably and safely operate real software. This involves designing experiments, developing evaluation frameworks, building RL training environments, and collaborating with training and product teams to integrate research advances into production.

What you'd actually do

  1. Design and run experiments to improve Claude's perception and agentic capabilities
  2. Develop robust, reliable evaluation frameworks for measuring our models' ability to complete complex computer tasks
  3. Build and improve computer use and vision reinforcement learning training environments
  4. Create pipelines and tools to test and validate complex RL environments
  5. Collaborate with teams across the model training and infrastructure stack to improve our production training setup

Skills

Required

  • Software engineering experience and proficiency in Python
  • Experience training, fine-tuning, or evaluating machine learning models
  • Strong communication skills and a collaborative working style
  • Care about the societal impacts and safety of your work

Nice to have

  • Experience training models for computer use or other agentic capabilities
  • Experience with reinforcement learning, particularly in long-horizon or sparse-reward settings
  • Familiarity with multimodal model training
  • Experience building evaluations or benchmarks for agentic systems
  • Experience building reinforcement learning environments, simulation systems, or large-scale ML infrastructure
  • Experience working closely with product teams to drive model improvements

What the JD emphasized

  • reliably and safely operate real software
  • robust, reliable evaluation frameworks
  • reinforcement learning training environments
  • test and validate complex RL environments
  • production training setup
  • computer use or other agentic capabilities
  • long-horizon or sparse-reward settings
  • multimodal model training
  • evaluations or benchmarks for agentic systems
  • reinforcement learning environments, simulation systems, or large-scale ML infrastructure

Other signals

  • teaching models to use computer interfaces
  • improving models' ability to operate real software
  • advancing models' perception and agentic capabilities
  • developing evaluation frameworks for complex computer tasks
  • building and improving computer use and vision reinforcement learning training environments