Reinforcement Learning from Human Feedback (RLHF): training a reward model on human preference comparisons, then optimizing the LLM against that reward model. The original recipe behind ChatGPT-style helpfulness tuning.
Primary AI lifecycle stage: post-training.
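The reward-model step in the definition above can be illustrated with a pairwise preference loss. This is a minimal sketch, assuming the commonly used Bradley-Terry formulation, in which the model is penalized when it scores the human-rejected response above the human-chosen one. The function name `preference_loss` and the scalar reward inputs are hypothetical and chosen only for illustration. The second step, optimizing the LLM against the trained reward model, is not shown.

```python
import math


def preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Pairwise loss for one human comparison (Bradley-Terry assumption).

    Computes -log(sigmoid(reward_chosen - reward_rejected)), which is small
    when the reward model agrees with the human label and large otherwise.
    """
    margin = reward_chosen - reward_rejected
    # -log(sigmoid(m)) equals softplus(-m); this form avoids overflow
    # for large positive or negative margins.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


# Hypothetical scores: the reward model agrees with the human label...
agree = preference_loss(reward_chosen=2.0, reward_rejected=0.0)
# ...versus disagreeing with it by the same margin.
disagree = preference_loss(reward_chosen=0.0, reward_rejected=2.0)
print(agree < disagree)
```

Averaging this loss over a dataset of labeled comparisons and minimizing it is one common way to fit a reward model. Real systems compute the two rewards with a neural network rather than passing in fixed numbers.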
As of today, 43 active AI roles across 28 companies in our index reference RLHF. Hiring concentrates at the post-training (37%) and agents (26%) stages. Most common sectors: Big Tech, Consumer, AI Frontier.
The companies with the most active RLHF listings are Capital One (5 roles), Amazon (4 roles), Apple (3 roles), Cohere (3 roles), and Pinterest (3 roles).
12 AI roles tagged rlhf.
| Company | Title | Sector | AI score | Other tags |
|---|---|---|---|---|
| CrowdStrike | Director, Model Post-Training and Agentic Research (Remote) | Enterprise | 9 | RL post-training · Agent orchestration · Tool use · Evals · Reward modeling · Agent research |
| Canva | Senior Research Scientist - Reinforcement Learning, MoEs | Enterprise | 9 | RL post-training · Reward modeling · Agent orchestration · Tool use · Multimodal · Model serving · Frontier research · Evals |
| Datadog | AI Research Engineer - Datadog AI Research (DAIR) | Enterprise | 9 | Multimodal · Frontier research · RL post-training · Agent orchestration · Model serving · Inference infra · Evals · Synthetic data |
| Moveworks | Senior Machine Learning Engineer II, NLU & Agentic AI | Enterprise | 9 | Agent orchestration · Agent research · Fine-tuning · Evals · Multimodal · Model serving · LLM observability |
| Moveworks | Senior Machine Learning Engineer II, NLU & Agentic AI | Enterprise | 9 | Agent orchestration · Agent research · Fine-tuning · Evals · Multimodal · Model serving · LLM observability |
| ServiceNow | Staff Machine Learning Engineer, Agentic AI Systems - Moveworks | Enterprise | 8 | Agent orchestration · Tool use · Evals · Fine-tuning · Model serving · Agent research · LLM observability · Multimodal |
| Canva | Senior Machine Learning Engineer - Multimodal Data | Enterprise | 8 | Multimodal · Agent orchestration · Fine-tuning · Synthetic data · LLM observability |
| Handshake | AI Tutor, Electrochemistry & Functional Materials Specialist (contract), Handshake AI | Enterprise | 7 | Evals · Guardrails |
| Adobe | Senior Data Science Engineer | Enterprise | 5 | |
| Handshake | Mathematics PhDs - AI Trainer | Enterprise | 5 | Evals |
| Replit | Product Lead, Growth Marketing | Enterprise | 5 | Agent orchestration · RAG · Vector DB · Fine-tuning · Inference infra · Model serving · Recommender systems · Search & ranking · Multimodal · Evals · Guardrails · LLM observability · Frontier research · Interpretability · Synthetic data · Agent research · RL post-training · Reward modeling · RL robotics · Embodied AI |
| Handshake | Music Producer - AI Trainer | Enterprise | 5 | Evals |