13 AI roles tagged RLHF.
| Company | Title | Sector | AI score | Other tags |
|---|---|---|---|---|
| Anthropic | Anthropic AI Safety Fellow, UK | AI Frontier | 10 | Frontier research · Interpretability · Evals · Guardrails |
| xAI | Member of Technical Staff - Post-Training and RL | AI Frontier | 9 | RL post-training · Reward modeling · Fine-tuning |
| Anthropic | Anthropic Fellows Program — Reinforcement Learning | AI Frontier | 9 | RL post-training |
| Character AI | Research Engineer, Multimodal | AI Frontier | 9 | Fine-tuning · Multimodal · Vision · Audio & speech · Model serving · Inference infra · Synthetic data |
| Cohere | Research Engineer | AI Frontier | 9 | Frontier research · Fine-tuning · Evals · Model serving · Agent orchestration |
| Anthropic | Research Manager, Production Model Training | AI Frontier | 9 | Fine-tuning · Evals |
| Anthropic | Data Operations Manager, Human Data | AI Frontier | 8 | Synthetic data · Evals |
| xAI | Model Behavior Tutor - Style, Taste & Aesthetics | AI Frontier | 7 | Fine-tuning |
| xAI | Model Behavior Tutor - Wit & Conversation | AI Frontier | 7 | Evals · Fine-tuning |
| Anthropic | Data Operations Manager | AI Frontier | 7 | Agent orchestration · Tool use · Synthetic data |
| Cohere | Data Annotation Specialist, Modern Standard Arabic (MSA) | AI Frontier | 5 | Reward modeling |
| Mistral AI | Data Annotation Quality Specialist | AI Frontier | 5 | Synthetic data |
| Cohere | Data Annotation Specialist, Simplified Chinese / Mandarin | AI Frontier | 5 | Synthetic data · Reward modeling · Evals |
Reinforcement Learning from Human Feedback (RLHF): training a reward model on human preference comparisons, then optimizing the LLM against that reward with reinforcement learning. It is the original recipe behind ChatGPT-style helpfulness tuning.
Primary AI lifecycle stage: post-training.
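The two-step recipe in the definition above can be sketched in plain Python. This is a toy, hedged illustration, not a production pipeline. It assumes each response is summarized by a hypothetical 2-d feature vector and fits a linear reward model with the Bradley-Terry pairwise loss, a common choice for preference data. Then, in place of PPO-style updates over token sequences, it runs exact gradient ascent on a softmax policy over three fixed candidate responses. The objective is expected reward minus a KL penalty (weight `beta`) toward a reference policy. All function names (`train_reward_model`, `optimize_policy`, `closed_form_policy`) and all data are illustrative assumptions.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train_reward_model(pairs, dim, lr=0.5, epochs=200):
    """Step 1 (toy): fit a linear reward r(x) = w . x on (chosen, rejected)
    feature pairs by minimizing the Bradley-Terry loss
    -log sigmoid(r(chosen) - r(rejected)) with full-batch gradient descent."""
    w = [0.0] * dim
    for _ in range(epochs):
        grad = [0.0] * dim
        for chosen, rejected in pairs:
            margin = dot(w, chosen) - dot(w, rejected)
            # d/dw of -log sigmoid(margin) = -(1 - sigmoid(margin)) * (chosen - rejected)
            coef = -(1.0 - sigmoid(margin))
            for i in range(dim):
                grad[i] += coef * (chosen[i] - rejected[i])
        w = [wi - lr * gi / len(pairs) for wi, gi in zip(w, grad)]
    return w


def optimize_policy(rewards, ref_probs, beta=2.0, lr=0.5, steps=2000):
    """Step 2 (toy stand-in for PPO): exact gradient ascent on
    E_pi[r] - beta * KL(pi || pi_ref), where pi is a softmax over a fixed
    list of candidate responses."""
    logits = [math.log(q) for q in ref_probs]  # start at the reference policy
    for _ in range(steps):
        pi = softmax(logits)
        # KL-penalized per-response value: r_i - beta * log(pi_i / ref_i)
        adv = [r - beta * (math.log(p) - math.log(q))
               for r, p, q in zip(rewards, pi, ref_probs)]
        baseline = sum(p * a for p, a in zip(pi, adv))
        # Exact gradient w.r.t. logit i is pi_i * (adv_i - baseline)
        logits = [z + lr * p * (a - baseline) for z, p, a in zip(logits, pi, adv)]
    return softmax(logits)


def closed_form_policy(rewards, ref_probs, beta=2.0):
    # Optimum of the same objective: pi*(y) proportional to ref(y) * exp(r(y) / beta)
    return softmax([math.log(q) + r / beta for r, q in zip(rewards, ref_probs)])


# Hypothetical features per response: [helpfulness, verbosity].
# Each pair is (response the annotator preferred, response they rejected).
pairs = [
    ([1.0, 0.0], [0.0, 0.0]),
    ([1.0, 1.0], [0.0, 1.0]),
    ([0.5, 0.0], [0.0, 0.5]),
]
w = train_reward_model(pairs, dim=2)

candidates = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
rewards = [dot(w, c) for c in candidates]
ref_probs = [1 / 3, 1 / 3, 1 / 3]
policy = optimize_policy(rewards, ref_probs, beta=2.0)
print("learned weights:", [round(x, 3) for x in w])
print("tuned policy:   ", [round(p, 3) for p in policy])
print("closed form:    ", [round(p, 3) for p in closed_form_policy(rewards, ref_probs, 2.0)])
```

For this KL-regularized objective the optimum has the closed form π*(y) ∝ π_ref(y)·exp(r(y)/β), so `closed_form_policy` serves as a sanity check on the gradient-ascent result. A larger `beta` keeps the tuned policy closer to the reference.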
As of today, 43 active AI roles across 28 companies in our index reference RLHF. Hiring concentrates at the post-training (37%) and agents (26%) stages. Most common sectors: Big Tech, Consumer, AI Frontier.
The companies with the most active RLHF listings are: Capital One (5 roles), Amazon (4 roles), Apple (3 roles), Cohere (3 roles), Pinterest (3 roles).