Reinforcement Learning from Human Feedback: training a reward model on human preferences, then optimizing the LLM against it. The original recipe behind ChatGPT-style helpfulness tuning. Primary AI lifecycle stage: post-training.
43 active AI roles across 28 companies in our index reference RLHF as of today.
The companies with the most active RLHF listings are: Capital One (5 roles), Amazon (4 roles), Apple (3 roles), Cohere (3 roles), Pinterest (3 roles).
In current hiring, RLHF roles concentrate at the post-training (37%) and agents (26%) stages of the AI lifecycle.
The sectors with the most active RLHF hiring are: Big Tech, Consumer, AI Frontier.
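The first step of the recipe in the definition above, training a reward model on human preferences, is often set up as a pairwise comparison: for each prompt, the response the annotator preferred should receive a higher score than the rejected one. The snippet below is a minimal sketch of one common formulation (a Bradley-Terry style loss). It is not taken from any listing in this index, and the names `pairwise_preference_loss` and `mean_loss` are hypothetical, chosen for illustration only.

```python
import math


def pairwise_preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Bradley-Terry style loss: -log(sigmoid(reward_chosen - reward_rejected)).

    Assumption: each reward is a scalar score that a reward model has
    already produced for one response to the same prompt.
    """
    margin = reward_chosen - reward_rejected
    # -log(sigmoid(margin)) equals log(1 + exp(-margin)); branch on the sign
    # of the margin so exp() never receives a large positive argument.
    if margin > 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def mean_loss(pairs):
    """Average the pairwise loss over (chosen, rejected) reward pairs."""
    return sum(pairwise_preference_loss(c, r) for c, r in pairs) / len(pairs)


# Hypothetical reward scores for three preference pairs.
example_pairs = [(1.2, 0.3), (0.1, 0.4), (2.0, -1.0)]
print(mean_loss(example_pairs))
```

The loss shrinks as the preferred response's score pulls ahead of the rejected one, which is what lets the second step (optimizing the LLM against the reward model) treat the score as a training signal. Production setups typically compute this over batches of model outputs in a deep learning framework rather than over plain floats.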
8 AI roles tagged rlhf.
| Company | Title | Sector | AI score | Other tags |
|---|---|---|---|---|
| Airbnb | Senior Machine Learning Engineer, Customer Support Engineering | Consumer | 9 | Agent orchestration · Tool use · Evals · Guardrails · RAG · Fine-tuning · Model serving · Agent research |
| Uber | 2026 PhD Research Intern, India | Consumer | 9 | Fine-tuning · Evals · Agent research · Frontier research |
| Zillow | Principal Applied Scientist, Agentic AI | Consumer | 9 | RL post-training · Reward modeling · Fine-tuning · Guardrails · Agent orchestration · Evals · Multimodal · Vector DB |
| | Machine Learning Engineer II, Computer Vision Applied Science | Consumer | 9 | Vision · Multimodal · Fine-tuning · Model serving · Evals |
| | Sr. Machine Learning Engineer, Applied Science | Consumer | 9 | Vision · Fine-tuning · Multimodal |
| | Staff Product Manager, AI Safety | Consumer | 8 | Evals · Guardrails · LLM observability · Multimodal · Agent research |
| Discord | Manager, Scaled Abuse Countermeasures and Research | Consumer | 7 | Agent orchestration · Evals · Guardrails · LLM observability · RAG · Vector DB · Fine-tuning · Model serving · Recommender systems · Search & ranking · Vision · Audio & speech · Frontier research · Interpretability · Synthetic data · Agent research · RL post-training · Reward modeling · RL robotics · Embodied AI |
| Whatnot | Software Engineer, Trust & Risk | Consumer | 7 | Agent orchestration · Evals · Guardrails · LLM observability · RAG · Vector DB · Fine-tuning · Inference infra · Model serving · Recommender systems · Search & ranking · Vision · Audio & speech · Frontier research · Interpretability · Synthetic data · Agent research · RL post-training · Reward modeling · RL robotics · Embodied AI |