Research Engineer, Data Ingestion

Anthropic · AI Frontier · AI Research & Engineering

Research Engineer role focused on building and scaling a large-scale web crawler for data ingestion, with an emphasis on evaluating and improving data quality to support the creation of pretrained models.

What you'd actually do

  1. Develop and maintain our large-scale web crawler
  2. Design and run experiments to evaluate data quality, extraction methods, and crawling strategies
  3. Analyze crawled data to identify patterns, gaps, and opportunities for improvement
  4. Build pipelines for data ingestion, analysis, and quality improvement
  5. Build specialized crawlers for high-value data sources

Skills

Required

  • web crawlers
  • large-scale data acquisition systems
  • data research
  • designing experiments
  • analyzing results
  • hybrid research-engineering role

Nice to have

  • Bachelor's degree in a related field or equivalent experience

What the JD emphasized

  • Successfully scaling our data corpus is critical to our continued efforts at producing the best pretrained models

Other signals

  • acquiring all of the available data on the internet through a large-scale web crawler
  • best pretrained models that we can produce, which in turn rely on having the best pretraining data
  • build and scale our crawler infrastructure while also conducting experiments to evaluate and improve data quality