Post-training Applied Researcher

Baseten · Data AI · San Francisco, CA · EPD

Post-training researcher focused on fine-tuning open-source LLMs for specific customer tasks using RL and reward engineering. The role involves building training pipelines, environments, and evals, and working with customer data to improve models that reach millions of users.

What you'd actually do

  1. Design and run post-training pipelines: SFT, GRPO, DPO, RLVR, reward function engineering, and synthetic data generation.
  2. Build task-specific training environments and evals tailored to customer domains such as healthcare, code generation, and legal, spanning multi-turn tool use, sandboxed execution, and agentic workflows.
  3. Work directly with customers to translate production data into training signal, designing reward loops from real usage patterns and handling distribution shift.
  4. Run and analyze training experiments end to end, diagnosing reward hacking, importance-sampling drift, and advantage-estimation instabilities.
  5. Publish findings at top venues and contribute to Baseten's open-source training libraries.

Skills

Required

  • LLM fine-tuning
  • Reinforcement learning for LLMs
  • GRPO or PPO
  • Reward engineering
  • Multi-turn agent environments
  • Tool use
  • Dataset construction
  • Model evaluation
  • Model deployment
  • Production ML systems

Nice to have

  • RL training frameworks
  • Publications at NeurIPS, ICML, ICLR
  • RL for LLMs
  • Reward modeling
  • Alignment

What the JD emphasized

  • Hands-on experience training LLMs with reinforcement learning
  • Strong intuition for reward engineering
  • Experience building multi-turn agent environments with tool use
  • Comfort working across the full pipeline from dataset construction through training, evaluation, and deployment
  • Experience with production ML systems
  • Experience closing a training–inference loop in which production data feeds back into model improvement

Other signals

  • post-training LLMs
  • customer-specific tasks
  • reward functions
  • training pipelines
  • shipping models to production