Member of Technical Staff - Post-training and RL

xAI · AI Frontier · Palo Alto, CA

This role focuses on critical post-training and reinforcement learning challenges for AI models, including reward modeling, preference optimization (RLHF/DPO), and RL for improving reasoning, truthfulness, and real-world capabilities. The goal is to build useful models through these techniques.

What you'd actually do

  1. You will work on the most critical post-training and reinforcement learning challenges at any given time, including reward modeling, preference optimization (RLHF/DPO), and RL for improving reasoning, truthfulness, and real-world capabilities.
  2. You will get clarity on your first project before an offer.
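To make the preference-optimization item above concrete, here is a minimal sketch of the DPO loss for a single preference pair. The function name, the pure-Python formulation, and the beta default are illustrative assumptions, not this team's actual implementation:

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Direct Preference Optimization loss for one preference pair (sketch).

    logp_* are the policy's total log-probabilities of the chosen/rejected
    responses; ref_logp_* come from a frozen reference model. beta scales
    the implicit KL penalty (0.1 is a common choice, not a prescription).
    """
    # Implicit reward margin: how much more the policy favors the chosen
    # response over the rejected one, relative to the reference model.
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    # Negative log-sigmoid of the margin: loss shrinks as the margin grows.
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

When the policy matches the reference model the margin is zero and the loss is log 2 ≈ 0.693; the loss falls as the policy widens the chosen-vs-rejected gap beyond what the reference assigns.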

Skills

Required

  • post-training
  • reinforcement learning
  • reward modeling
  • preference optimization
  • RLHF
  • DPO
  • truthfulness
  • real-world capabilities
  • AI models
  • alignment methods

Nice to have

  • experience training models used by millions of people

What the JD emphasized

  • building incredibly useful models through post-training and RL techniques
  • push the boundaries of what’s possible with reinforcement learning and alignment methods
