Research Engineer, Tokens (pre-training)

Anthropic · AI Frontier · AI Research & Engineering

Research Engineer focused on pretraining data for large-scale AI models. Responsibilities include understanding pretraining data trends and scaling laws, optimizing data mixes, exploring new data sources, building research tools to analyze experimental results, and processing pretraining data effectively. Strong software engineering and empirical research skills are required.

What you'd actually do

  1. understanding pretraining data trends and scaling laws
  2. optimizing pretraining data mixes
  3. investigating potential new sources of data
  4. building research tools to better understand experimental results
  5. figuring out how to process and use pretraining data most effectively

Skills

Required

  • software engineering
  • empirical research
  • data analysis
  • large-scale data processing

Nice to have

  • high performance, large-scale ML systems
  • language modeling with transformers
  • large-scale ETL
  • designing ML experiments
  • researching ML fundamentals
  • inspecting and iterating on data

What the JD emphasized

  • significant software engineering experience
  • high performance, large-scale ML systems
  • language modeling with transformers
  • large-scale ETL
  • designing ML experiments and researching ML fundamentals
  • inspecting and iterating on data

Other signals

  • pretraining data research
  • scaling laws
  • optimizing pretraining data mixes
  • new sources of data
  • processing and using pretraining data