Reinforcement Learning from Human Feedback (RLHF): a reward model is trained on human preference data, and the LLM is then optimized against that reward model. It is the original recipe behind ChatGPT-style helpfulness tuning. Primary AI lifecycle stage: post-training.
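As a rough illustration of the reward-model step, the sketch below computes a pairwise preference loss of the Bradley-Terry form, which is one common way to train on "chosen vs. rejected" comparisons. The function name `preference_loss` and the scalar reward inputs are assumptions made for this example; real systems score full responses with a neural network and batch the computation.

```python
import math


def preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Loss for one human comparison: -log(sigmoid(r_chosen - r_rejected)).

    Hypothetical helper for illustration. The loss is small when the reward
    model scores the human-preferred response higher than the rejected one.
    """
    margin = reward_chosen - reward_rejected
    # -log(sigmoid(m)) == log(1 + exp(-m)); branch to avoid overflow in exp().
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Reward model agrees with the human label: low loss.
agree = preference_loss(reward_chosen=2.0, reward_rejected=0.0)
# Reward model disagrees with the human label: high loss.
disagree = preference_loss(reward_chosen=0.0, reward_rejected=2.0)
print(agree < disagree)
```

Minimizing this loss over many labeled comparisons pushes the reward model to rank preferred responses higher; the LLM is then tuned to produce responses that the reward model scores well.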
43 active AI roles across 28 companies in our index reference RLHF as of today.
The companies with the most active RLHF listings are Capital One (5 roles), Amazon (4 roles), Apple (3 roles), Cohere (3 roles), and Pinterest (3 roles).
In current hiring, RLHF roles concentrate at the post-training (37%) and agents (26%) stages.
The sectors with the most active RLHF hiring are Big Tech, Consumer, and AI Frontier.
5 AI roles tagged rlhf.
| Company | Title | Sector | AI score | Other tags |
|---|---|---|---|---|
| Verizon | Princ Engr-Technology Strategy | Telecom | 8 | Agent orchestration · RAG · LLM observability |
| Verizon | Sr Engr Cslt-Tech Strategy | Telecom | 8 | Agent orchestration · RAG · LLM observability |
| Verizon | Director of Digital Customer Experience & AI Innovation | Telecom | 7 | LLM observability · Agent orchestration · Guardrails · RAG · Vector DB · Fine-tuning · Model serving · Recommender systems · Search & ranking · Interpretability · Synthetic data · Agent research · RL post-training · Reward modeling · RL robotics · Embodied AI |
| AT&T | Senior Full Stack/AI Engineer | Telecom | 5 | LLM observability · Agent orchestration · RAG · Vector DB · Fine-tuning · Model serving · Recommender systems · Search & ranking · Vision · Multimodal · Audio & speech · Frontier research · Interpretability · Synthetic data · Agent research · RL post-training · Reward modeling · RL robotics · Embodied AI · Code gen |
| AT&T | Full-Stack Software Engineer | Telecom | 5 | Agent orchestration · Tool use · LLM observability · RAG · Vector DB · Fine-tuning · Model serving · Recommender systems · Search & ranking · Vision · Multimodal · Audio & speech · Frontier research · Interpretability · Synthetic data · Agent research · RL post-training · Reward modeling · RL robotics · Embodied AI · Code gen |