Reinforcement Learning from Human Feedback (RLHF): training a reward model on human preferences, then optimizing the LLM against that reward model. The original recipe behind ChatGPT-style helpfulness tuning. Primary AI lifecycle stage: post-training.
43 active AI roles across 28 companies in our index reference RLHF as of today.
The companies with the most active RLHF listings are: Capital One (5 roles), Amazon (4 roles), Apple (3 roles), Cohere (3 roles), Pinterest (3 roles).
In current hiring, RLHF roles concentrate at two lifecycle stages: post-training (37%) and agents (26%).
The sectors with the most active RLHF hiring are: Big Tech, Consumer, AI Frontier.
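The first half of the recipe described above, fitting a reward model to human preferences, can be sketched in a few lines. This is a minimal illustration, not any company's production method: it assumes a pairwise Bradley-Terry preference loss (a common choice for reward models), a linear reward over hand-made feature vectors, and toy data. The function names, the two features, and the example pairs are all hypothetical.

```python
import math

def preference_loss(r_chosen: float, r_rejected: float) -> float:
    """Bradley-Terry negative log-likelihood: -log(sigmoid(r_chosen - r_rejected))."""
    margin = r_chosen - r_rejected
    # Numerically stable form of log(1 + exp(-margin)).
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))

def reward(w, x):
    """Linear reward r(x) = w . x (a stand-in for a neural reward model)."""
    return sum(wi * xi for wi, xi in zip(w, x))

def train_reward_model(pairs, dim, lr=0.1, epochs=200):
    """Fit w by gradient descent on (chosen_features, rejected_features) pairs."""
    w = [0.0] * dim
    for _ in range(epochs):
        for chosen, rejected in pairs:
            diff = [c - r for c, r in zip(chosen, rejected)]
            margin = reward(w, diff)
            # Gradient of -log(sigmoid(margin)) w.r.t. w is -(1 - sigmoid(margin)) * diff.
            grad_scale = -(1.0 - 1.0 / (1.0 + math.exp(-margin)))
            w = [wi - lr * grad_scale * di for wi, di in zip(w, diff)]
    return w

# Hypothetical features per response: [helpfulness signal, rudeness signal].
# In each pair, human labelers preferred the first response over the second.
pairs = [
    ([1.0, 0.0], [0.0, 1.0]),
    ([0.8, 0.1], [0.2, 0.9]),
    ([0.9, 0.2], [0.1, 0.7]),
]
w = train_reward_model(pairs, dim=2)

for chosen, rejected in pairs:
    print(reward(w, chosen) > reward(w, rejected))
```

The second half of the recipe, optimizing the LLM against the learned reward, is not shown here; in practice it is typically done with a policy-gradient method such as PPO, usually with a penalty that keeps the tuned model close to its starting point.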
7 AI roles tagged RLHF.
| Company | Title | Sector | AI score | Other tags |
|---|---|---|---|---|
| Capital One | Applied Researcher II, AI Foundations | Banking | 9 | Fine-tuning · Frontier research · Interpretability · Vector DB |
| Capital One | Applied Researcher I (AI Foundations) | Banking | 9 | Pretraining · Fine-tuning · Frontier research · Vector DB |
| Capital One | Applied Researcher II | Banking | 9 | Fine-tuning · Frontier research · Vector DB · Pretraining |
| Capital One | Applied Researcher I | Banking | 8 | Fine-tuning · Frontier research · Vector DB |
| Capital One | Applied Researcher I | Banking | 8 | Fine-tuning · Frontier research · Interpretability · Vector DB · Recommender systems · Model serving |
| Capital One | Applied Researcher II (AI Foundations) | Banking | 8 | Pretraining · Fine-tuning · Vector DB |
| Capital One | Applied Researcher I (AI Foundations) | Banking | 8 | Pretraining · Fine-tuning · Vector DB · Frontier research · Interpretability |