Staff AI Research Scientist

Writer · AI Frontier · San Francisco, CA · Engineering, product & design

AI research scientist focused on large language models and agentic reasoning for enterprise AI deployments. The role involves leading research agendas, designing post-training experiments (SFT, RLHF, RLAIF, DPO), building evaluation benchmarks, developing data synthesis pipelines, and shaping model architecture. The goal is to advance the field and ship research into production products used by hundreds of thousands of people.

What you'd actually do

  1. Lead an independent, high-impact research agenda on large language models and agentic systems, owning projects from early hypothesis through model training, evaluation, and production deployment
  2. Design and execute large-scale post-training experiments using supervised fine-tuning, reinforcement learning from human feedback (RLHF), RLAIF, DPO, and emerging alignment techniques — with a focus on improving multi-step reasoning, planning, and tool use in enterprise agentic workflows
  3. Build novel evaluation benchmarks and methodologies that push beyond existing limitations, establishing rigorous measures for how well models perform on complex, real-world enterprise tasks
  4. Develop scalable data synthesis and curation pipelines that generate the high-quality training signal driving model improvement — including LLM-as-judge frameworks, synthetic data generation, and adversarial dataset construction
  5. Shape WRITER's model architecture and training roadmap by translating your research insights into concrete improvements to our enterprise-grade LLMs, working hand-in-hand with research engineering and product teams

Skills

Required

  • 7+ years of hands-on ML research experience, with deep expertise in large language model pre-training and post-training: you have trained models at scale, debugged distributed jobs, and shipped improvements that made a measurable difference
  • Expert-level knowledge of post-training methods including SFT, RLHF, RLAIF, DPO, GRPO, and related alignment and reasoning techniques, with a track record of applying them to real, production-grade systems
  • Strong command of Python and PyTorch (or JAX), with the engineering depth to build and scale training pipelines, evaluation infrastructure, and data synthesis workflows
  • Meaningful publication record at competitive ML/AI venues (NeurIPS, ICLR, ICML, ACL, EMNLP, or equivalent), and the ability to originate ideas and execute on a multi-month research agenda independently
  • Hands-on experience designing or evaluating agentic systems, with a nuanced understanding of where they break and how to fix them
  • Ph.D. in Computer Science, Machine Learning, NLP, or a related field — or equivalent demonstrated research experience with a strong portfolio of independent, published work

Nice to have

  • Connect — you collaborate openly across research, engineering, and product and communicate complex ideas with clarity to both technical and non-technical audiences
  • Challenge — you ask the hard questions, push back on conventional wisdom, and pursue the research directions others haven't tried yet
  • Own — you drive your projects end-to-end with urgency, take accountability for results, and care deeply about the impact your work has on real customers

What the JD emphasized

  • production deployment
  • post-training
  • evaluation
  • training
  • agentic systems
  • publication record

Other signals

  • leading enterprise AI deployments
  • shipping directly into products
  • models are the engine making that possible
  • next generation of enterprise AI behaves, performs, and scales