Siri, Eval Architect Engineer

Apple · Big Tech · Cupertino, CA · Machine Learning and AI

The role focuses on defining the architecture for systems that measure Siri's quality across platforms and model updates. It involves building evaluation infrastructure for large-scale automation, simulation, AI-powered auto-evaluators, and agentic fix pipelines. The Eval Systems Architect will own the technical vision and system architecture for Siri's evaluation stack, ensuring coherence, scalability, and trustworthiness, and will influence the technical roadmap for the evaluation platform.
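
For a concrete flavor of what "agentic fix pipelines" can mean, here is a minimal sketch of a closed loop: run an eval case, have a model diagnose the failure from the transcript, apply a candidate fix, and re-evaluate. Every name in it (EvalOutcome, run_case, propose_fix, apply_fix) is a hypothetical placeholder for illustration, not Apple's actual stack.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class EvalOutcome:
    passed: bool
    score: float     # quality is continuous, not just pass/fail
    transcript: str  # what the assistant actually did on this case

# Hypothetical stand-ins for real infrastructure components.
RunCase = Callable[[str], EvalOutcome]  # run one eval case end to end
ProposeFix = Callable[[str], str]       # agent: failure transcript -> candidate patch
ApplyFix = Callable[[str], None]        # apply the candidate patch to the system under test

def fix_loop(case_id: str, run_case: RunCase, propose_fix: ProposeFix,
             apply_fix: ApplyFix, max_rounds: int = 3) -> EvalOutcome:
    """Closed loop: evaluate, diagnose, patch, re-evaluate, up to max_rounds."""
    outcome = run_case(case_id)
    for _ in range(max_rounds):
        if outcome.passed:
            break
        apply_fix(propose_fix(outcome.transcript))  # agentic repair step
        outcome = run_case(case_id)                 # verify the fix actually helped
    return outcome
```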

What you'd actually do

  1. Own the end-to-end technical vision and system architecture of Siri's evaluation infrastructure: a system spanning real-device automation, simulated product evaluation, AI-powered auto-evaluators, developer workflows, and observability tooling. Ensure we build toward a coherent, scalable, and trustworthy platform.
  2. Work across Agentic Eval Engineering and the broader Siri organization to ensure architectural coherence, define interfaces and contracts between systems (a hypothetical contract is sketched after this list), and drive the technical roadmap for the evaluation platform as a whole.
  3. Lead a first-principles review of existing evaluation tooling and infrastructure, identifying gaps, redundancies, and opportunities to simplify or unify.
  4. Represent the technical perspective in leadership discussions, influence build-vs-integrate decisions, and set the standards that enable teams to move fast without creating fragmentation.
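
One reading of "interfaces and contracts between systems" is a shared, versioned result schema that every evaluator emits and every dashboard or downstream pipeline consumes. A minimal sketch, with all field names assumed for illustration:

```python
from dataclasses import dataclass, asdict

SCHEMA_VERSION = 2  # bump whenever the contract changes shape

@dataclass(frozen=True)
class EvalResult:
    """Hypothetical contract between eval runners and downstream consumers."""
    case_id: str
    score: float                # normalized 0.0 to 1.0
    evaluator: str              # which auto-evaluator produced the score
    schema_version: int = SCHEMA_VERSION
    tags: tuple[str, ...] = ()  # optional field added in v2, with a safe default

def to_record(result: EvalResult) -> dict:
    # Serialization is part of the contract: consumers parse this dict, so any
    # new field must carry a default (schema evolution without breaking readers).
    return asdict(result)
```

Keeping defaults on every newly added field is what lets old producers and new consumers coexist during a migration; this is the "schema evolution" discipline the requirements below call out.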

Skills

Required

  • BS/MS/PhD in Computer Science, Software Engineering, or a related field.
  • 10+ years of software engineering experience, with at least 5 years in a systems architecture, staff/principal engineer, or technical leadership role.
  • Proven track record of designing and shipping large-scale distributed systems serving multiple teams or organizations.
  • Deep expertise in system design: API design, service architecture, data flow modeling, interface contracts, and schema evolution.
  • Solid software engineering fundamentals with production experience, including CI/CD, testing strategies, system monitoring, debugging complex multi-service systems, and code maintainability (a toy CI quality gate is sketched after this list).
  • Demonstrated expertise in using AI-assisted software development workflows to accelerate engineering while maintaining code quality.
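
As a toy illustration of a CI-style quality gate over eval scores (the helper name and thresholds are assumptions, not a known pipeline):

```python
import statistics

def quality_gate(baseline: list[float], candidate: list[float],
                 max_regression: float = 0.01) -> bool:
    """Block a rollout if the candidate's mean eval score drops more than
    max_regression below the baseline's mean (illustrative only)."""
    return statistics.mean(candidate) >= statistics.mean(baseline) - max_regression

# A small drop within tolerance passes; a large regression trips the gate.
assert quality_gate([0.92, 0.88, 0.90], [0.91, 0.89, 0.90])
assert not quality_gate([0.92, 0.88, 0.90], [0.60, 0.55, 0.58])
```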

Nice to have

  • Experience architecting evaluation, testing, or quality infrastructure at scale, particularly for AI/ML products where quality is continuous rather than binary.
  • Experience building LLM applications, LLM-as-judge evaluation frameworks (a minimal judge sketch follows this list), and offline evaluation pipelines.
  • Familiarity with MLOps principles for model lifecycle management and training data pipelines.
  • Experience with VM orchestration, fleet management, or large-scale job scheduling systems.
  • Knowledge of simulation and service virtualization techniques for complex software stacks.
  • Experience with observability platforms (metrics, logging, tracing, dashboarding) and defining SLOs for platform reliability.
  • Experience with agentic AI systems, including tool-use, multi-step reasoning, and human-in-the-loop workflows.
  • Track record of leading cross-team architectural initiatives (e.g., platform migrations, API unification, system consolidation) in organizations with 50+ engineers.
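
Since LLM-as-judge appears above, here is a minimal sketch of the pattern: send the response plus a rubric to a judge model and parse back a continuous score. The call_llm function, prompt wording, and JSON shape are all assumptions for illustration, not a real API.

```python
import json

def call_llm(prompt: str) -> str:
    """Placeholder for whatever judge model the stack actually uses."""
    raise NotImplementedError

JUDGE_PROMPT = """You are grading an assistant's response.
Rubric: correctness, helpfulness, safety.
Return JSON: {{"score": <float between 0 and 1>, "rationale": "<one sentence>"}}

User request: {request}
Assistant response: {response}"""

def judge(request: str, response: str) -> float:
    """LLM-as-judge: quality comes back as a continuous score, not pass/fail."""
    raw = call_llm(JUDGE_PROMPT.format(request=request, response=response))
    verdict = json.loads(raw)
    return float(verdict["score"])
```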

What the JD emphasized

  • architectural coherence
  • system-wide consistency
  • architectural decisions

Other signals

  • evaluation infrastructure
  • quality measurement
  • large-scale automation
  • model-in-the-loop simulation
  • AI-powered auto-evaluators
  • closed-loop agentic fix pipelines
  • end-to-end technical vision
  • system architecture
  • evaluation stack
  • scalable and trustworthy system