Currently tracking 194 active AI roles, up 94% versus the prior 4 weeks. Primary focus: Agent · Engineering. Salary range $120k–$487k (avg $234k).
| Title | Stage | AI score |
|---|---|---|
| **Evaluation & Insights Machine Learning Engineer** This role focuses on evaluating and improving AI systems by analyzing AI outputs, developing evaluation frameworks, and translating findings into actionable improvements. It involves assessing model behavior, identifying edge cases, and ensuring AI systems are reliable, safe, and aligned with human expectations. The role also involves building MLOps and automation for evaluation pipelines and collaborating with various teams to refine model performance. | Eval Gate · Post-train | 9 |
| **Staff Applied Scientist, AI Quality & Meta Evaluation** Responsible for designing and building the Data Quality Validation framework for LLM Judges. This role involves developing statistical and ML approaches to ensure the trustworthiness of evaluation signals, auditing LLM outputs, and establishing standards for data quality. | Eval Gate · Post-train | 8 |
| **ML Engineer - Automated Evaluation and Adversarial Design** ML Engineer focused on building and scaling automated evaluation systems and designing adversarial/stress-testing methodologies for AI-powered features in productivity and creative applications. The role involves assessing AI quality, particularly for multi-turn agentic experiences, and influencing model development decisions through rigorous evaluation. | Eval Gate · Agent | 8 |
| **Senior Applied Scientist - AI Evaluation & Quality Systems** Senior Applied Scientist focused on building and scaling AI evaluation and quality systems. The role involves developing methodologies, tooling, and autonomous QA agents to ensure the trustworthiness and quality of AI/ML systems, with a strong emphasis on human-in-the-loop evaluation and anomaly detection. Requires a blend of research and engineering skills to prototype, validate, and ship solutions. | Eval Gate · Agent | 8 |
| **AIML - Sr Machine Learning Engineer, Responsible AI** This role focuses on developing, carrying out, interpreting, and communicating pre- and post-ship evaluations of the safety of Apple Intelligence features, leveraging both human and model-based auto-grading. It also involves researching and developing auto-grading methodology & infrastructure. The role requires creating safety evaluations that uphold Responsible AI values through data sampling, curation, annotation, auto-grading, and analysis. It draws on applied data science, scientific investigation, cross-functional communication, and metrics reporting. | Eval Gate · Post-train | 8 |
| **ML Engineer - Evaluation Analysis, Metric and Data Strategy** ML Engineer focused on defining and analyzing quality metrics for AI-powered features in consumer productivity and creative applications. This role is critical for informing model development, feature launches, and product strategy by translating evaluation data and user behavior into actionable insights. It involves designing metrics frameworks, auditing data representativeness, and developing evaluation methods for complex, agentic AI experiences. | Eval Gate · Agent | 7 |
| **Siri, Eval Architect Engineer** The role focuses on defining the architecture for systems that measure Siri's quality across platforms and model updates. It involves building evaluation infrastructure for large-scale automation, simulation, AI-powered auto-evaluators, and agentic fix pipelines. The Eval Systems Architect will own the technical vision and system architecture for Siri's evaluation stack, ensuring coherence, scalability, and trustworthiness, and will influence the technical roadmap for the evaluation platform. | Eval Gate · Agent | 7 |
| **Test Triage & Automation Engineer, Siri** This role focuses on designing, driving, and triaging automation pipelines and evaluation frameworks for Siri's AI features. The engineer will analyze large-scale test data, identify trends, and develop strategies to improve the efficiency and effectiveness of quality engineering processes. The goal is to ensure the qualitative experience of Siri's AI features meets high standards and to influence product decisions and model improvements. | Eval Gate · Agent | 7 |
| **AIML - Machine Learning Engineer - Computer Vision & Audio, MIND** Machine Learning Engineer focused on the data and evaluation lifecycle for production models in computer vision and audio. Responsibilities include scaling data pipelines, ensuring data quality, performing failure analysis, implementing data augmentation, and designing evaluation metrics for models. The role bridges hardware, software, and modeling for efficient inference. | Eval Gate · Data | 7 |
| **AIML - Software Engineer - AI, Evaluation** Software Engineer role focused on building tools and systems for the automatic evaluation of Apple's AI products, specifically using LLM-as-judge and related technologies to improve the quality and efficiency of these evaluations. The role involves designing and developing frameworks, pipelines, and tools for AI model development, deployment, and measurement, directly impacting product launch decisions. | Eval Gate · Agent | 7 |
| **Applications of ML Engineering Manager** Manager for Responsible Development & Safety in Apple Services Engineering, focusing on shaping policies, evaluating AI models and applications, and ensuring safe deployment of user-facing features. The role involves leading a team, collaborating with various cross-functional teams, and developing evaluation processes for AI/ML models. | Eval Gate · Post-train | 7 |
| **AIML - Data Scientist, Evaluation** This role focuses on designing and implementing evaluation frameworks for AI/ML systems, specifically for Apple's consumer-facing products. The Data Scientist will work with large datasets, develop methodologies for assessing product quality, and partner with engineering teams to improve user experience and guide feature development. The role involves building evaluation datasets, human-in-the-loop systems, and translating insights into actionable recommendations. | Eval Gate | 7 |