270 AI roles tagged rl_post_training.
| Company | Title | Sector | AI score | Other tags |
|---|---|---|---|---|
| NVIDIA | Applied Deep Learning PhD Research Intern, Reinforcement Learning for LLMs - Fall 2026 | Semiconductors | 10 | Frontier research · Agent research · LLM observability · Fine-tuning |
| Cohere | Research Internship Reinforcement Learning (Summer) | AI Frontier | 10 | Fine-tuning · Agent research · Code gen · Agent orchestration · Frontier research |
| Figure AI | Helix AI Engineer, Pretraining | Robotics | 10 | Pretraining · Frontier research · Multimodal · Embodied AI · Model serving · Fine-tuning |
| MongoDB | Senior Research Scientist | Enterprise | 10 | Frontier research · Agent research · Code gen · Agent orchestration |
| Together AI | Research Engineer, Core ML | Data AI | 10 | Inference infra · Model serving · Frontier research |
| Anthropic | Research Engineer, Machine Learning (Reinforcement Learning) | AI Frontier | 10 | Agent orchestration · Tool use · Evals · Frontier research · Code gen |
| Cursor | Research Scientist | Coding AI | 10 | Agent research · Frontier research · Code gen · Evals |
| OpenAI | Researcher, Synthetic RL | AI Frontier | 10 | Synthetic data · Frontier research · Agent research |
| Anthropic | Anthropic AI Safety Fellow, US | AI Frontier | 10 | Frontier research · Interpretability · Evals · Guardrails |
| OpenAI | Research Engineer/Research Scientist, RL/Reasoning | AI Frontier | 10 | Agent research · Frontier research · Model serving · Agent orchestration |
| OpenAI | Research Engineer, Frontier Evals & Environments | AI Frontier | 10 | Evals · RL robotics · Agent research · Frontier research · LLM observability |
| Anthropic | Research Engineer, Machine Learning (Reinforcement Learning) | AI Frontier | 10 | Agent orchestration · Tool use · Agent research · Frontier research · Code gen · Inference infra · Model serving |
| Scale AI | Machine Learning Engineer, Global Public Sector | Data AI | 10 | Agent orchestration · Agent research · Frontier research · Guardrails |
| Autodesk | Research Lead / Principal Scientist & Manager, Post-Training, Alignment, Reinforcement Learning (Autodesk AI Lab: Toronto / Remote, CA) | Enterprise | 9 | Agent research · Agent orchestration · Evals · LLM observability |
| NVIDIA | Senior Research Scientist, Post-Training LLM and DLM | Semiconductors | 9 | Fine-tuning · Model serving · Inference infra · Evals |
| Anthropic | Technical Program Manager, Discovery | AI Frontier | 9 | |
| Together AI | Forward Deployed Engineer (Inference & Post-Training) | Data AI | 9 | Inference infra · Model serving · Fine-tuning · Quantization |
| Snorkel AI | AI Advocate, Open-Source & Research | Data AI | 9 | Evals · Fine-tuning · Agent research · Agent orchestration · Tool use · Frontier research · LLM observability |
| Visa | Senior AI Engineer | Fintech | 9 | Agent orchestration · Tool use · Guardrails · LLM observability · RAG · Fine-tuning · Inference infra · Model serving · Frontier research · Interpretability · Agent research · Multimodal |
| Mistral AI | Model Behavior Architect | AI Frontier | 9 | Evals · Guardrails · LLM observability · Agent orchestration · Tool use · Fine-tuning |
| Cognition | Research, Post-Training | Coding AI | 9 | RLHF · Reward modeling · Evals · Agent research · Agent orchestration |
| Anthropic | Research Engineer, Search and Knowledge Post-Training | AI Frontier | 9 | Evals · Search & ranking · RAG · Agent research · Frontier research · LLM observability |
| OpenAI | Researcher, Alignment Training | AI Frontier | 9 | Synthetic data · Evals · Frontier research · Interpretability |
| Cursor | Product Manager, Agent Harness | Coding AI | 9 | Agent orchestration · Agent research · Evals · Guardrails · LLM observability · Tool use · Multi-agent |
| Expedia | Machine Learning Engineer III (Gen AI & Multi-Agentic Systems) | Hospitality | 9 | Agent orchestration · Fine-tuning · RAG · Vector DB · Multimodal · Inference infra · Model serving · LLM observability · Evals · Guardrails · Code gen |
| Expedia | Senior Machine Learning Engineer (Gen AI & Multi-Agentic Systems) | Hospitality | 9 | Agent orchestration · RAG · Vector DB · Fine-tuning · Inference infra · Model serving · Multimodal · Vision · Audio & speech · Code gen · Evals · Guardrails · LLM observability |
| Anthropic | Technical Program Manager, Research | AI Frontier | 9 | Evals · Model serving |
| Intel | AI Software Engineer Intern | Semiconductors | 9 | Multimodal · Embodied AI · Fine-tuning · Model serving · Quantization · Inference infra |
| xAI | Member of Technical Staff - Post-Training and RL | AI Frontier | 9 | RLHF · Reward modeling · Fine-tuning |
| OpenAI | Researcher, Alignment Science | AI Frontier | 9 | Evals · LLM observability · Guardrails · Interpretability |