Research Engineer, Search and Knowledge Post-training

Anthropic · AI Frontier · United States · Remote · AI Research & Engineering

Research Engineer focused on advancing search and knowledge capabilities in LLMs through post-training techniques. The role involves defining research hypotheses, designing experiments, building instrumentation for controlled studies, developing evaluations to distinguish reasoning from pattern matching, and driving optimization rigor. It sits at the intersection of RL, retrieval, and evaluation, aiming to make LLMs trustworthy searchers.

What you'd actually do

  1. Own a research direction for a class of search post-training problems end-to-end: form hypotheses about latent capabilities, design experiments that isolate them, run training, and decide what to try next.
  2. Build the instrumentation that turns environment design into a controlled experiment so we can study how each environment factor contributes to the capabilities we care about, rather than overfitting to any one regime.
  3. Design frontier-discriminating evaluations that distinguish genuine reasoning over evidence from plausible pattern matching and that hold up as models improve.
  4. Drive optimization rigor across the stack: efficient experiment design, ablations, training run economics, and the discipline to know when a result is real.
  5. Collaborate deeply with researchers across post-training, RL infrastructure, and product to translate model behavior in the wild into concrete training signals and back again.

Skills

Required

  • unusually rigorous, quantitative mindset
  • outstanding software engineer in Python
  • comfortable across the stack from data pipelines to RL training to evaluation infrastructure
  • shipped real ML research repeatedly
  • taste for which experiments are worth running
  • instinctively reach for ablations, controls, and confidence intervals to understand why a result holds
  • operate well with high autonomy and ambiguity
  • identify the most impactful problem to work on next without being told
  • set research direction
  • advocate for experimental rigor
  • raise the bar for the people around you
  • communicate research clearly in writing and in person
  • defend a design choice
  • update on evidence

Nice to have

  • Hands-on experience with RL on large language models — environments, reward design, training stability, scaling behavior.
  • Background in search, retrieval, RAG, or agents that reason over external information sources.
  • Experience building evaluations for open-ended or knowledge-intensive LLM behavior.
  • Prior work in a research-heavy environment — frontier AI lab, quant research firm, or similarly demanding empirical setting — where rigor is the default.
  • Published research on LLMs, RL, retrieval, calibration, or related topics.
  • Experience with distributed training systems and large-scale experimentation infrastructure.

What the JD emphasized

  • unusually rigorous, quantitative mindset
  • shipped real ML research repeatedly
  • instinctively reach for ablations, controls, and confidence intervals
  • set research direction
  • experimental rigor
  • rigor is the default

Other signals

  • research direction
  • experimental rigor
  • quantitative mindset
  • ML research
  • RL
  • search
  • retrieval
  • evaluation