Currently tracking 171 active AI roles, down 37% versus the prior 4 weeks. Primary focus: Agent · Engineering. Salary range $120k–$487k (avg $235k).
Apple has 261 active AI-related job listings. The largest share of these roles focuses on agents (24% of the total), followed by application (22%) and serving infrastructure (21%). Engineering is the primary function for these positions, and the United States is the dominant hiring country. Frequent tech tags include model serving, inference infrastructure, and LLM observability. Over the last 30 days, Apple has posted 111 new AI roles, a 61% increase over the previous 30-day period.
Apple currently has 233 active AI-related roles in our index. The most common open titles are: Machine Learning Engineer (4); AIML - Sr Data Scientist, Evaluation (2); Advanced Manufacturing Engineer (iPhone) - Smart Manufacturing (2); Machine Learning Engineer, Apple Services Engineering (2); Machine Learning Software Engineer (2). Most positions are in Engineering and Research.
Apple's active AI hiring is concentrated in agents (30%), application (21%), and serving infrastructure (14%). These categories are stages of a seven-stage AI lifecycle: data, pre-training, post-training, serving infrastructure, agents, evaluation, and application.
Apple is hiring AI talent in the United States (182 roles), China (17 roles), India (10 roles), and the United Kingdom (7 roles).
Job postings at Apple most frequently mention Machine Learning, Python, Data Science, Large Language Models (LLMs), and Statistics.
In the past 30 days, Apple has posted 80 new AI-related roles.
| Title | Stage | AI score |
|---|---|---|
| **Evaluation & Insights Machine Learning Engineer**: This role focuses on evaluating and improving AI systems by analyzing AI outputs, developing evaluation frameworks, and translating findings into actionable improvements. It involves assessing model behavior, identifying edge cases, and ensuring AI systems are reliable, safe, and aligned with human expectations. The role also involves building MLOps and automation for evaluation pipelines and collaborating with various teams to refine model performance. | Eval Gate, Post-train | 9 |
| **Machine Learning Engineer, ML/GenAI Evaluation**: Machine Learning Engineer focused on evaluating ML and GenAI models for Wallet, Payments, and Commerce features. This role defines evaluation criteria, metrics frameworks, and quality standards, designs adversarial test strategies, and owns the model quality sign-off process to ensure models meet high standards for accuracy, robustness, fairness, and reliability before shipping to hundreds of millions of users. Responsibilities include building test sets, developing robustness testing methodologies, owning fairness evaluation end-to-end, evaluating generative model outputs, and synthesizing results for product decisions. | Eval Gate | 8 |
| **Staff Applied Scientist, AI Quality & Meta Evaluation**: Staff Applied Scientist focused on AI Quality & Meta Evaluation, responsible for designing and building the Data Quality Validation framework for LLM Judges. This role involves developing statistical and ML approaches to ensure the trustworthiness of evaluation signals, auditing LLM outputs, and establishing standards for data quality. | Eval Gate, Post-train | 8 |
| **ML Engineer - Automated Evaluation and Adversarial Design**: ML Engineer focused on building and scaling automated evaluation systems and designing adversarial/stress-testing methodologies for AI-powered features in productivity and creative applications. The role involves assessing AI quality, particularly for multi-turn agentic experiences, and influencing model development decisions through rigorous evaluation. | Eval Gate, Agent | 8 |
| **Senior Applied Scientist - AI Evaluation & Quality Systems**: Senior Applied Scientist focused on building and scaling AI evaluation and quality systems. The role involves developing methodologies, tooling, and autonomous QA agents to ensure the trustworthiness and quality of AI/ML systems, with a strong emphasis on human-in-the-loop evaluation and anomaly detection. Requires a blend of research and engineering skills to prototype, validate, and ship solutions. | Eval Gate, Agent | 8 |
| **AIML - Sr Machine Learning Engineer, Responsible AI**: This role focuses on developing, carrying out, interpreting, and communicating pre- and post-ship evaluations of the safety of Apple Intelligence features, leveraging both human and model-based auto-grading. It also involves researching and developing auto-grading methodology and infrastructure. The role requires creating safety evaluations that uphold Responsible AI values through data sampling, curation, annotation, auto-grading, and analysis. It draws on applied data science, scientific investigation, cross-functional communication, and metrics reporting. | Eval Gate, Post-train | 8 |
| **AIML - Sr Data Scientist, Evaluation**: This role focuses on developing and researching evaluation methods to improve the quality of user-facing AI products like Siri and Apple Intelligence. It involves working with large datasets, applying advanced analytical methods including prompt engineering and using LLMs as judges, and partnering with engineering teams to translate methodological developments into production technologies. The goal is to guide product development and decisions through rigorous evaluation and data analysis, ultimately impacting products used by hundreds of millions globally. | Eval Gate, Post-train | 7 |
| **AIML - Sr Data Scientist, Evaluation**: This role focuses on developing and implementing evaluation methods for AI/ML products, particularly for search quality and user-facing features like Siri and Apple Intelligence. It involves working with large datasets, applying advanced analytical methods including prompt engineering and using LLMs as judges, and partnering with engineering teams to translate methodological developments into production technologies. The role requires strong data science, ML, and analytics skills, with a focus on experimentation and evaluation. | Eval Gate, Post-train | 7 |
| **Data Scientist, AI/ML Model Quality**: This role focuses on ensuring the quality of data used for training and evaluating AI/ML models, particularly in Generative AI systems within the Wallet, Payments, and Commerce domains. The Data Scientist will build and maintain intelligent systems, validation frameworks, and monitoring pipelines to ensure data integrity and model trustworthiness. Responsibilities include curating ground-truth datasets, auditing training data for bias, defining data quality metrics, integrating automated checks, and analyzing telemetry for GenAI workflows to identify failure modes and provide recommendations. | Eval Gate, Data | 7 |
| **Systems Engineer - Evaluation Engineering**: Systems Engineer focused on building and scaling the infrastructure for an AI Agentic Evaluation Platform. This involves designing distributed execution engines, internal developer platforms, backend APIs, stream processing, and deployment topologies for large-scale agent simulations and LLM-as-a-judge pipelines. The role emphasizes reliability, observability, and guardrails for complex AI systems. | Eval Gate, Agent | 7 |
| **Sr. Software Engineer: Agentic Evaluation**: This role focuses on building and maintaining the infrastructure, tooling, and pipelines for evaluating Siri, Apple's AI assistant, at scale. The engineer will extend evaluation capabilities to new platforms, support new features, diagnose failures, and contribute to architecture decisions for evaluation systems. Experience with evaluating ML, LLM, or agent-based systems is preferred. | Eval Gate, Agent | 7 |
| **Automation and Triage Engineer, Siri**: This role focuses on building and maintaining automated test suites and evaluation frameworks for Siri, ensuring its AI quality and performance across various Apple platforms. It involves investigating complex failures in Siri's AI pipeline, distinguishing regressions, and partnering with engineering and ML teams to define and track quality metrics. The role requires strong software engineering skills, experience with agentic systems and LLM evaluation, and familiarity with on-device AI and conversational systems. | Eval Gate, Agent | 7 |
| **Annotation Data Scientist, Evaluation Integrity (Siri)**: This role focuses on designing and managing human-in-the-loop (HITL) annotation tasks to evaluate agentic systems, specifically for Siri. The primary goal is to create a trusted quality signal by turning human judgment into a rigorous, reproducible metric. Responsibilities include designing annotation tasks, authoring guidelines, managing annotation programs, developing custom tooling, applying data science to analyze human-labeled data, and contributing to overall evaluation health reporting. The role sits at the intersection of data science, human annotation engineering, and evaluation methodology. | Eval Gate, Agent | 7 |
| **ML Engineer - Evaluation Analysis, Metric and Data Strategy**: ML Engineer focused on defining and analyzing quality metrics for AI-powered features in consumer productivity and creative applications. This role is critical for informing model development, feature launches, and product strategy by translating evaluation data and user behavior into actionable insights. It involves designing metrics frameworks, auditing data representativeness, and developing evaluation methods for complex, agentic AI experiences. | Eval Gate, Agent | 7 |
| **Siri, Eval Architect Engineer**: The role focuses on defining the architecture for systems that measure Siri's quality across platforms and model updates. It involves building evaluation infrastructure for large-scale automation, simulation, AI-powered auto-evaluators, and agentic fix pipelines. The Eval Systems Architect will own the technical vision and system architecture for Siri's evaluation stack, ensuring coherence, scalability, and trustworthiness, and will influence the technical roadmap for the evaluation platform. | Eval Gate, Agent | 7 |
| **AIML - Machine Learning Engineer - Computer Vision & Audio, MIND**: Machine Learning Engineer focused on the data and evaluation lifecycle for production models in computer vision and audio. Responsibilities include scaling data pipelines, ensuring data quality, performing failure analysis, implementing data augmentation, and designing evaluation metrics for models. The role bridges hardware, software, and modeling for efficient inference. | Eval Gate, Data | 7 |
| **AIML - Software Engineer - AI, Evaluation**: Software Engineer role focused on building tools and systems for the automatic evaluation of Apple's AI products, specifically using LLM-as-judge and related technologies to improve the quality and efficiency of these evaluations. The role involves designing and developing frameworks, pipelines, and tools for AI model development, deployment, and measurement, directly impacting product launch decisions. | Eval Gate, Agent | 7 |
| **Data Scientist, Maps Evaluation**: Data Scientist focused on the deep evaluation of Apple Maps search services, features, monetization initiatives, and Apple Business. This role involves defining success metrics, evaluating product performance, understanding user behavior, and driving data-informed decisions through experiment design, A/B testing, funnel analysis, exploratory data analysis, AI/ML modeling, and data mining. | Eval Gate | 5 |
| **Evaluation Reliability SRE**: This role focuses on the reliability and operational excellence of ML evaluation infrastructure, specifically the production backbone for Siri's quality signal. It involves managing resources, orchestration, on-call response, and observability systems to ensure the trustworthiness of evaluation infrastructure. The role requires hands-on experience in site reliability, infrastructure engineering, and operating production systems, with a focus on proactive reliability work and incident response. | Eval Gate | 5 |