Learning a scalar reward function — often from human or AI preference data — that scores LLM outputs during reinforcement-learning fine-tuning.
Primary AI lifecycle stage: post-training.
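For readers new to the term, one common way to learn such a scalar reward from preference data is a pairwise (Bradley-Terry style) objective. This is a minimal sketch, not a description of any listed employer's method. The function name `pairwise_preference_loss` and the example scores are hypothetical and chosen only for illustration. It shows the loss computed when a reward model assigns one scalar score to a preferred response and another to a rejected one.

```python
import math


def pairwise_preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Bradley-Terry negative log-likelihood for one preference pair.

    Computes -log(sigmoid(reward_chosen - reward_rejected)), i.e. the loss is
    small when the preferred response is scored well above the rejected one.
    """
    margin = reward_chosen - reward_rejected
    # -log(sigmoid(m)) == softplus(-m); this form avoids overflow for large |m|.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


# Hypothetical scalar scores for two responses to the same prompt.
loss_when_ranking_is_right = pairwise_preference_loss(2.0, 0.0)
loss_when_ranking_is_wrong = pairwise_preference_loss(0.0, 2.0)
print(loss_when_ranking_is_right < loss_when_ranking_is_wrong)
```

In practice the two scores would come from a trained model scoring full LLM outputs, and the loss would be averaged over many preference pairs. Here they are plain floats to keep the sketch self-contained.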
As of today, 31 active AI roles across 19 companies in our index reference Reward modeling. Hiring concentrates at the post-training (55%) and agents (26%) stages. Most common sectors: Big Tech, AI Frontier, Enterprise. New postings rose 36% in the last 30 days versus the prior 30 (11 → 15).
The companies with the most active Reward modeling listings are Amazon (9 roles), Adobe (2 roles), Cohere (2 roles), Deloitte (2 roles), and OpenAI (2 roles).
7 AI roles tagged reward_modeling.
| Company | Title | Sector | AI score | Other tags |
|---|---|---|---|---|
| Adobe | Staff Agentic ML Engineer - Photoshop | Enterprise | 9 | Agent orchestration · Tool use · Evals · Fine-tuning · Model serving · Multimodal · Vision · RL post-training |
| CrowdStrike | Director, Model Post-Training and Agentic Research (Remote) | Enterprise | 9 | RL post-training · Agent orchestration · Tool use · Evals · RLHF · Agent research |
| Canva | Senior Research Scientist - Reinforcement Learning, MoEs | Enterprise | 9 | RL post-training · RLHF · Agent orchestration · Tool use · Multimodal · Model serving · Frontier research · Evals |
| Canva | Senior Research Scientist - Reinforcement Learning, MoEs | Enterprise | 9 | RL post-training · Frontier research · Agent orchestration · Multimodal · Model serving · Fine-tuning · Evals · Agent research |
| Adobe | Senior Applied Scientist | Enterprise | 8 | Fine-tuning · RL post-training · Model serving · Multimodal · Vision |
| Adobe | Principal Product Manager, Research and AI - Data | Enterprise | 7 | Synthetic data · Fine-tuning · Evals · Multimodal |
| Replit | Product Lead, Growth Marketing | Enterprise | 5 | Agent orchestration · RAG · Vector DB · Fine-tuning · Inference infra · Model serving · Recommender systems · Search & ranking · Multimodal · Evals · Guardrails · LLM observability · Frontier research · Interpretability · Synthetic data · Agent research · RL post-training · RLHF · RL robotics · Embodied AI |