Data AI · Data labeling
Currently tracking 82 active AI roles, up 61% versus the prior 4 weeks. Primary focus: Agent · Engineering. Salary range $139k–$393k (avg $256k).
| Title | Stage | AI score |
|---|---|---|
| Machine Learning Engineer, Global Public Sector Scale AI is hiring ML Research Engineers to bridge the gap between frontier research and real-world impact for global governments. The role involves leading research into Agent design, Deep Research, and AI Safety/reliability, developing novel methodologies for public sector applications and setting new standards across the organization. Responsibilities include pioneering novel architectures, leading AI safety initiatives, driving deep research capabilities, publishing, consulting, and building evaluation frontiers. | Agent · Post-train | 10 |
| Technical Lead Manager, Physical AI Scale AI is seeking a Technical Lead Manager for their Physical AI team to lead research engineers in developing and evaluating Large-Scale Foundation Models for robots and AVs. The role involves hands-on contributions to model scaling, VLA/world model development, and data strategy, alongside team mentorship and translating research into production-ready features. | Pretrain · Agent | 9 |
| Director, Enterprise Machine Learning & Research Director of Enterprise ML at Scale AI, leading research scientists and engineers in GenAI initiatives. The role involves defining and driving a multi-year research roadmap, collaborating cross-functionally, and communicating research outcomes. Focus is on turning research into production-ready systems, with experience in evaluation, post-training, agents, and RL environments. Requires strong research background, publications, and team leadership experience. | Post-train · Agent | 9 |
| Research Scientist, Frontier Risk Evaluations Research Scientist role focused on designing and building evaluation measures, harnesses, and datasets for frontier AI systems, with a focus on identifying and mitigating risks. The role involves collaboration with external agencies and publishing findings, bridging AI research and policy. | Eval Gate · Agent | 9 |
| Research Scientist, Agent Robustness Research Scientist focused on agent robustness, AI safety, and risk evaluations. The role involves researching AI agent capabilities, designing tests for harmful actions, creating exploits and mitigations for failure modes, and characterizing risks in multi-agent systems. Experience with post-training techniques like RLHF and published research in generative AI is required. | Agent · Eval Gate | 9 |
| Research Scientist, AI Controls and Monitoring Research Scientist role focused on designing methods, systems, and experiments for AI controls and monitoring, ensuring advanced AI models and agents remain aligned with intended goals, even in high-stakes or adversarial environments. This includes developing monitoring techniques, researching layered control mechanisms, designing red-team simulations, and collaborating with policymakers. | Eval Gate · Post-train | 9 |
| Machine Learning Fellow - Human Frontier Collective (Canada) This role is for a Machine Learning Fellow focused on evaluating, interpreting, and optimizing advanced generative AI systems. The fellow will engage in ML projects, contribute to research publications, and collaborate with AI labs and platforms. The role requires a PhD or postdoctoral degree in a related field and experience with Python and ML frameworks. | Post-train | 9 |
| Manager, Machine Learning Research Scientist, GenAI Manager for a GenAI research team focused on evaluation, post-training, agents, and RL environments. The role involves leading a team, defining research roadmaps, driving execution, and collaborating cross-functionally. Requires a strong research background with publications and experience in fast-paced environments. | Post-train · Agent | 9 |
| Staff Machine Learning Research Scientist, LLM Evals Scale AI is seeking a Staff Machine Learning Research Scientist to lead the development of novel evaluation methodologies, metrics, and benchmarks for large language models (LLMs). This role focuses on defining and measuring the capabilities and limitations of frontier LLMs, driving research that informs internal roadmaps and the broader community. Responsibilities include researching existing evaluation techniques, designing new benchmarks, implementing scalable evaluation pipelines, publishing findings, and mentoring junior researchers. The ideal candidate has 5+ years of experience in LLMs/NLP, a strong publication record, and experience leading research teams. | Eval Gate · Post-train | 9 |
| Staff Machine Learning Research Engineer, Agent Post-training - Enterprise GenAI Scale AI is seeking a Staff Machine Learning Research Engineer focused on post-training algorithms for complex agents in enterprise GenAI applications. The role involves building a next-generation Agent RL training platform, integrating cutting-edge research, and training state-of-the-art models for enterprise customers, including cybersecurity and healthtech use cases. Experience with LLM training, post-training methods like RLHF/RLVR, and publications in top conferences are required. | Post-train · Agent | 9 |
| Machine Learning Systems Research Engineer, Agent Post-training - Enterprise GenAI Scale AI is seeking an ML Systems Research Engineer to work on building algorithms for their next-gen Agent RL training platform, supporting large-scale training, and researching/integrating state-of-the-art technologies to optimize ML systems. The role involves post-training state-of-the-art models for enterprise engagements and creating next-gen agent training algorithms for multi-agent/multi-tool rollouts. | Post-train · Agent | 9 |
| Machine Learning Research Engineer, Agents - Enterprise GenAI Research Engineer focused on building and training advanced AI agents for enterprise GenAI applications, utilizing post-training and agent-building algorithms on real-world datasets to achieve state-of-the-art results. | Agent · Post-train | 9 |
| Machine Learning Research Engineer, Agent Data Foundation - Enterprise GenAI This role focuses on researching and building synthetic data pipelines and agents to improve enterprise GenAI models. It involves creating agents for trace analysis, contributing to an agent-building framework, and training state-of-the-art models using post-training and agent-building algorithms. | Post-train · Agent | 9 |
| Machine Learning Research Scientist, Reasoning Machine Learning Research Scientist focused on reasoning in LLMs, specifically for agentic systems like browser and software engineering agents. The role involves studying critical data types, identifying effective data sources and methodologies to improve LLM reasoning, and contributing to research while collaborating with engineering teams to implement solutions. | Agent · Post-train | 9 |
| Machine Learning Research Scientist, Post-Training Research Scientist focused on LLM post-training techniques (SFT, RLHF, reward modeling) to enhance text and multimodal capabilities. Involves optimizing data curation, analyzing model behavior, and publishing findings. | Post-train | 9 |
| Senior / Staff Machine Learning Research Scientist, Agents Research Scientist role focused on building state-of-the-art AI agents, studying essential data types for agents like browser and SWE agents, and guiding data strategy to advance intelligent, adaptable AI agents. The role involves contributing to research publications, collaborating with customer researchers, and translating advancements into scalable solutions. | Agent | 9 |
| Tech Lead/Manager, Machine Learning Research Scientist- LLM Evals Scale AI is seeking a Tech Lead/Manager for their LLM Evals Research team. This role involves leading a team to develop and implement novel evaluation methodologies, metrics, and benchmarks for large language models, focusing on areas like instruction following, factuality, robustness, and fairness. The position requires research into LLM evaluation techniques, communication with clients and internal teams, implementation of scalable evaluation pipelines, and publishing research findings. The ideal candidate has extensive experience in LLMs, NLP, and Transformer modeling, with a proven track record of research impact and team leadership. | Eval Gate · Post-train | 9 |
| SWE Fellow - Human Frontier Collective (Canada) This role focuses on evaluating and interpreting advanced generative AI systems, designing datasets for rigorous evaluation, and contributing to research publications. It involves collaborating with AI labs and platforms to enhance model accuracy and reasoning, which places it in the evaluation-gate stage. | Eval Gate | 8 |
| SWE Fellow - Human Frontier Collective (UK) This role is for a Software Engineer Fellow focused on evaluating and interpreting advanced generative AI systems within the Human Frontier Collective program. The fellow will design datasets, evaluate AI models, and contribute to research publications, aiming to enhance AI accuracy and reasoning. | Eval Gate | 8 |
| SWE Fellow - Human Frontier Collective (US) This role focuses on evaluating advanced generative AI systems by designing domain-specific problems and datasets, providing expert insights to enhance model performance, and contributing to research publications. It's a research-oriented role within a fellowship program aimed at shaping the future of AI. | Eval Gate | 8 |
| Machine Learning Fellow - Human Frontier Collective (UK) This role is for a Machine Learning Fellow focused on designing, evaluating, and interpreting advanced generative AI systems. The fellow will work on ML projects, contribute to research publications, and engage with a community of AI researchers. The role involves optimizing PyTorch models, evaluating ML code, advising on GPU optimization, and collaborating on research papers. | Post-train | 8 |
| Machine Learning Fellow - Human Frontier Collective (US) This role involves applying academic and professional expertise to design, evaluate, and interpret advanced generative AI systems. The fellow will work on ML projects, optimize PyTorch models, evaluate ML code, advise on GPU optimization, and contribute to research publications and technical reports. | Post-train | 8 |
| STEM Fellow - Human Frontier Collective (UK) This role focuses on evaluating and interpreting advanced generative AI systems by designing domain-specific problems and datasets, providing expert insights to enhance model performance, and contributing to research publications. It's a research-oriented fellowship with a focus on AI evaluation. | Eval Gate | 8 |
| STEM Fellow - Human Frontier Collective (US) This role focuses on evaluating and interpreting advanced generative AI systems by designing domain-specific problems and datasets, providing expert insights to enhance model performance, and contributing to research publications. It involves collaboration with AI labs and Scale's research team. | Eval Gate · Post-train | 8 |
| AI Strategy Consultant, Frontier Tech This role focuses on designing and executing research experiments, building and evaluating frontier LLM datasets, and developing training/testing material to improve the quality of AI products. It involves close collaboration with ML research scientists and SPM teams, with a strong emphasis on analytical and problem-solving skills in a fast-paced environment. | Data · Eval Gate | 8 |
| Medical Fellow - Human Frontier Collective (UK) This role involves a medical professional (MD, DO) collaborating with AI labs to evaluate generative AI systems, focusing on safety, accuracy, and reasoning frameworks within healthcare. The goal is to apply clinical expertise to shape AI decision-making and contribute to research publications. The role is a 6-month independent contractor position with potential for extension. | Eval Gate · Post-train | 7 |
| Technical Advisor Specialist (Part-Time Internship) Internship role for university students to contribute to generative AI projects, focusing on training models for complex reasoning and identifying failure modes. Involves participation in focus groups and team-based projects, with flexible, remote work. | Data | 7 |
| Medical Fellow - Human Frontier Collective (US) This role is for a Medical Fellow to collaborate on high-impact projects evaluating and interpreting advanced generative AI systems in healthcare. The fellow will design clinical scenarios, evaluate model safety and accuracy, shape reasoning frameworks, and contribute to research publications. Prior AI experience is not required but is a plus. | Eval Gate | 7 |
| Legal Fellow - Human Frontier Collective (US) This role is for a Legal Fellow at Scale AI's Human Frontier Collective, focusing on evaluating and interpreting advanced generative AI systems within legal domains. The fellow will collaborate on high-impact projects, design real-world scenarios, assess model outputs for accuracy and regulatory alignment, and provide feedback for improvement. The role also involves contributing to research publications and engaging with a network of AI researchers and domain experts. Prior AI experience is not required but is a plus. | — | 5 |
| Finance Fellow - Human Frontier Collective (US) This role is for a Finance Fellow within Scale AI's Human Frontier Collective. The fellowship focuses on applying financial expertise to evaluate, interpret, and improve advanced generative AI systems in financial contexts like valuation, forecasting, and risk assessment. The fellow will collaborate with AI labs, contribute to research publications, and work within a network of AI researchers and domain experts. While the role involves working with AI systems and contributing to AI research, the core craft is finance, not building AI models. | — | 5 |
| Legal Fellow - Human Frontier Collective (UK) This role is for a Legal Fellow at Scale AI's Human Frontier Collective, focusing on evaluating and interpreting advanced generative AI systems from a legal and regulatory perspective. The fellow will collaborate on high-impact projects with AI labs, design real-world scenarios, evaluate model outputs for accuracy and regulatory alignment, and provide feedback to improve performance in areas like compliance, litigation, and contract analysis. The role also involves contributing to research publications and engaging with a network of AI researchers and domain experts. Prior experience with AI is not required but is a plus. | — | 0 |