Reinforcement Learning from Human Feedback: training a reward model on human preferences, then optimizing the LLM against it. The original recipe behind ChatGPT-style helpfulness tuning. Primary AI lifecycle stage: post-training.
43 active AI roles across 28 companies in our index reference RLHF as of today.
The companies with the most active RLHF listings are: Capital One (5 roles), Amazon (4 roles), Apple (3 roles), Cohere (3 roles), Pinterest (3 roles).
RLHF primarily belongs to the post-training stage of the AI lifecycle. In current hiring, RLHF roles concentrate in the post-training (37%) and agents (26%) stages.
The sectors with the most active RLHF hiring are: Big Tech, Consumer, AI Frontier.
4 AI roles tagged rlhf.
| Company | Title | Sector | AI score | Other tags |
|---|---|---|---|---|
| Snowflake | AI Research Scientist, New Grad – Agents & Reinforcement Learning | Data AI | 9 | Agent orchestration · Agent research · Multi-agent · Fine-tuning · Synthetic data · Code gen |
| Weights & Biases | VP of Product, Research and Training Infrastructure | Data AI | 9 | Frontier research · Pretraining · RL post-training · Inference infra · Model serving |
| Scale AI | Forward Deployed Engineer, GenAI | Data AI | 7 | |
| Scale AI | Senior Software Engineer, GenAI | Data AI | 7 | |