Currently tracking 194 active AI roles, up 94% over the prior four weeks. Primary focus: Agent · Engineering. Salary range: $120k–$487k (average $234k).
| Title | Stage | AI score |
|---|---|---|
| **Machine Learning Research Engineer, Siri Speech.** This role focuses on evaluating, analyzing, and improving state-of-the-art end-to-end speech models for Siri. The engineer will design and implement novel evaluation frameworks, develop tools to measure model performance, analyze model behavior, and explore innovative approaches to advance speech capabilities. The role also involves building automated processes for large-scale model evaluation and analysis, and collaborating with cross-functional teams. | Eval Gate · Post-train | 9 |
| **Machine Learning Engineer.** Focused on Evaluation & Insights for the Human-Centered AI team. This role involves architecting evaluation frameworks, designing MLOps pipelines for model assessment, and translating qualitative failure modes into programmatic guardrails and training signals for Foundation Models and generative AI systems. The role also involves collaborating with various teams to ensure AI experiences are reliable, safe, and aligned with human expectations. | Eval Gate · Post-train | 9 |
| **Evaluation & Insights Machine Learning Engineer.** This role focuses on evaluating and improving AI systems by analyzing AI outputs, developing evaluation frameworks, and translating findings into actionable improvements. It involves assessing model behavior, identifying edge cases, and ensuring AI systems are reliable, safe, and aligned with human expectations. The role also involves building MLOps and automation for evaluation pipelines and collaborating with various teams to refine model performance. | Eval Gate · Post-train | 9 |
| **Applied Machine Learning Engineer - Developer Publications.** Focused on building and maintaining LLM evaluation pipelines for developer tools at Apple. The role emphasizes MLOps/LLMOps, assessing model quality, tracking regressions, and supporting continuous improvement cycles, requiring strong engineering fundamentals and LLM evaluation experience. | Eval Gate · Post-train | 8 |
| **Staff Applied Scientist, AI Quality & Meta Evaluation.** Responsible for designing and building the Data Quality Validation framework for LLM Judges. This role involves developing statistical and ML approaches to ensure the trustworthiness of evaluation signals, auditing LLM outputs, and establishing standards for data quality. | Eval Gate · Post-train | 8 |
| **ML Engineer - Automated Evaluation and Adversarial Design.** Focused on building and scaling automated evaluation systems and designing adversarial/stress-testing methodologies for AI-powered features in productivity and creative applications. The role involves assessing AI quality, particularly for multi-turn agentic experiences, and influencing model development decisions through rigorous evaluation. | Eval Gate · Agent | 8 |
| **Senior Applied Scientist - AI Evaluation & Quality Systems.** Focused on building and scaling AI evaluation and quality systems. The role involves developing methodologies, tooling, and autonomous QA agents to ensure the trustworthiness and quality of AI/ML systems, with a strong emphasis on human-in-the-loop evaluation and anomaly detection. Requires a blend of research and engineering skills to prototype, validate, and ship solutions. | Eval Gate · Agent | 8 |
| **AIML - Sr Machine Learning Engineer, Responsible AI.** This role focuses on developing, carrying out, interpreting, and communicating pre- and post-ship evaluations of the safety of Apple Intelligence features, leveraging both human and model-based auto-grading. It also involves researching and developing auto-grading methodology & infrastructure. The role requires creating safety evaluations that uphold Responsible AI values through data sampling, curation, annotation, auto-grading, and analysis. It draws on applied data science, scientific investigation, cross-functional communication, and metrics reporting. | Eval Gate · Post-train | 8 |
| **AI Data Scientist.** This role focuses on evaluating, optimizing, and analyzing the performance of ML and multi-modal LLMs. The Data Scientist will develop metrics, conduct failure analysis, process data for evaluation, and implement optimization techniques. They will collaborate with cross-functional teams to integrate models and communicate results. The role requires experience with model evaluation, RAG, and LLM prompt evaluation, with preferred experience in multi-modal foundation models and GenAI frameworks. | Eval Gate · Post-train | 8 |
| **ML Engineer - Evaluation Analysis, Metric and Data Strategy.** Focused on defining and analyzing quality metrics for AI-powered features in consumer productivity and creative applications. This role is critical for informing model development, feature launches, and product strategy by translating evaluation data and user behavior into actionable insights. It involves designing metrics frameworks, auditing data representativeness, and developing evaluation methods for complex, agentic AI experiences. | Eval Gate · Agent | 7 |
| **Siri, Eval Architect Engineer.** The role focuses on defining the architecture for systems that measure Siri's quality across platforms and model updates. It involves building evaluation infrastructure for large-scale automation, simulation, AI-powered auto-evaluators, and agentic fix pipelines. The Eval Systems Architect will own the technical vision and system architecture for Siri's evaluation stack, ensuring coherence, scalability, and trustworthiness, and will influence the technical roadmap for the evaluation platform. | Eval Gate · Agent | 7 |
| **Test Triage & Automation Engineer, Siri.** This role focuses on designing, driving, and triaging automation pipelines and evaluation frameworks for Siri's AI features. The engineer will analyze large-scale test data, identify trends, and develop strategies to improve the efficiency and effectiveness of quality engineering processes. The goal is to ensure the qualitative experience of Siri's AI features meets high standards and to influence product decisions and model improvements. | Eval Gate · Agent | 7 |
| **Quality Engineer - Machine Learning.** Sits in Apple's Creative Music Apps team, focusing on testing ML models and DSP algorithms for audio features on macOS, iOS & iPadOS. Responsibilities include stress-testing for regressions, designing test strategies, developing automated tests, and collaborating with ML engineers on quality metrics. | Eval Gate · Post-train | 7 |
| **AIML - Machine Learning Engineer - Computer Vision & Audio, MIND.** Focused on the data and evaluation lifecycle for production models in computer vision and audio. Responsibilities include scaling data pipelines, ensuring data quality, performing failure analysis, implementing data augmentation, and designing evaluation metrics for models. The role bridges hardware, software, and modeling for efficient inference. | Eval Gate · Data | 7 |
| **AIML - Software Engineer - AI, Evaluation.** Focused on building tools and systems for the automatic evaluation of Apple's AI products, specifically using LLM-as-judge and related technologies to improve the quality and efficiency of these evaluations. The role involves designing and developing frameworks, pipelines, and tools for AI model development, deployment, and measurement, directly impacting product launch decisions. | Eval Gate · Agent | 7 |
| **Applications of ML Engineering Manager.** Manager for Responsible Development & Safety in Apple Services Engineering, focusing on shaping policies, evaluating AI models and applications, and ensuring safe deployment of user-facing features. The role involves leading a team, collaborating with various cross-functional teams, and developing evaluation processes for AI/ML models. | Eval Gate · Post-train | 7 |
| **AIML - Data Scientist, Evaluation.** This role focuses on designing and implementing evaluation frameworks for AI/ML systems, specifically for Apple's consumer-facing products. The Data Scientist will work with large datasets, develop methodologies for assessing product quality, and partner with engineering teams to improve user experience and guide feature development. The role involves building evaluation datasets, human-in-the-loop systems, and translating insights into actionable recommendations. | Eval Gate | 7 |
| **Software Development Engineer - Test, Graphics, Games & ML.** Focused on ensuring the quality of on-device machine learning technologies at Apple. The role involves developing infrastructure, automation, and services for validation and qualification, maintaining CI/CD pipelines, and collaborating with various teams across hardware, software, and product development. Experience with ML frameworks is preferred. | Eval Gate | 5 |
| **AIML - Sr Data Scientist, Evaluation.** This role focuses on developing and implementing evaluation methods for Siri's user-facing products, using data science and machine learning to guide product development and improve search quality. The primary focus is on evaluation and measurement, with collaboration on core ML algorithms. | Eval Gate | 5 |
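Many of the "Eval Gate" roles above describe the same core pattern: score model outputs with an LLM-as-judge, aggregate, and gate a launch decision on the result. A minimal sketch of that loop, assuming a hypothetical rubric and a stubbed-out judge call (a real system would call a judge model and use far more robust scoring; none of these names come from the listings):

```python
# Sketch of an LLM-as-judge "eval gate": score answers, average, gate on a threshold.

RUBRIC = (
    "Score the ANSWER to the QUESTION from 1 (wrong) to 5 (correct and "
    "complete). Reply with a single digit."
)

def call_judge(prompt: str) -> str:
    # Placeholder for a real judge-model API call; deterministic stub here
    # so the sketch runs end to end.
    return "5" if "Paris" in prompt else "1"

def judge_score(question: str, answer: str) -> int:
    prompt = f"{RUBRIC}\n\nQUESTION: {question}\nANSWER: {answer}"
    reply = call_judge(prompt).strip()
    # Guard against malformed judge replies by defaulting to the lowest score.
    return int(reply[0]) if reply[:1].isdigit() else 1

def eval_gate(samples, threshold=4.0):
    # samples: list of (question, answer) pairs; gate passes only if the
    # mean judge score clears the threshold.
    scores = [judge_score(q, a) for q, a in samples]
    mean = sum(scores) / len(scores)
    return mean, mean >= threshold

samples = [("Capital of France?", "Paris"), ("Capital of France?", "Lyon")]
mean, passed = eval_gate(samples)
print(mean, passed)  # 3.0 False
```

Production versions of this loop (the evaluation frameworks and auto-graders these roles build) add calibrated rubrics, multiple judges, human-in-the-loop audits of the judge itself, and regression tracking across model updates.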