Imagine what you could do here. At Apple, great ideas have a way of becoming phenomenal products, services, and customer experiences very quickly. Bring passion and dedication to your job and there's no telling what you could accomplish. Do you love solving complex distributed systems challenges at massive scale? Are you passionate about Kubernetes scheduling, resource management, and building platforms that power the next generation of Machine Learning and Data workloads? Do you thrive in designing and operating highly reliable, large-scale job scheduling and orchestration systems that serve as the backbone of AI and Data infrastructure? If so, join the Apple Data Platform team to design and build a scalable batch and ML infrastructure platform used across Apple. As part of Apple Data Platform, you will play a meaningful role in designing, developing, and deploying high-performance systems that power batch and ML workloads across Apple's global infrastructure spanning public clouds and Apple data centers. This enormous scale brings unique and complex challenges in resource scheduling, workload orchestration, and operational excellence that require extraordinarily creative problem-solving.

Description

Apple Batch is a fully managed platform within the Apple Data Platform that supports large-scale batch and ML workloads across Apple data centers and AWS/GCP. It orchestrates containerized workloads such as Spark, Ray, and LLM batch inference using YuniKorn/Kueue for advanced multi-cluster scheduling. The platform delivers org/team quota management, automatic node repair, end-to-end observability, strong security, and granular cost reporting. As part of the Apple Batch team, you will have a meaningful role in designing, developing, and deploying high-performance systems that power large-scale batch processing and ML workloads daily. We are building critical infrastructure that provides scalable batch execution, intelligent Kubernetes-native job scheduling, multi-tenant resource management, and efficient workload orchestration for ML training, inference, and data processing workloads across multi-cloud and on-premises environments. We are looking for a strong, enthusiastic engineer with deep expertise in Kubernetes scheduling and distributed systems. You will have significant individual responsibility and influence over critical platform services. You are someone with ideas and a real passion for building infrastructure that improves reliability, efficiency, and simplicity at Apple scale.

Responsibilities

Design, build, and deploy highly reliable, large-scale distributed systems for batch processing and ML infrastructure across public clouds and Apple data centers using Go, Java, or Python

Architect and operate Kubernetes-native scheduling systems such as Kueue and YuniKorn, building custom operators and CRDs to manage complex ML and data workloads

Implement advanced scheduling strategies including gang scheduling, topology-aware routing, bin-packing, and fair-share queuing to maximize GPU efficiency and hardware utilization

Build and manage secure, multi-tenant Kubernetes environments with strict resource isolation, quota governance, and priority-based preemption

Drive end-to-end observability, monitoring, and incident response practices to ensure high availability and fault tolerance of production systems

Collaborate with ML researchers, data engineers, SRE, and product teams to integrate scheduling solutions into Apple's broader AI and data platform ecosystem

Contribute to platform adoption by guiding internal customers, gathering requirements, and delivering impactful platform capabilities

Minimum Qualifications

5+ years of experience designing, developing, and operating highly available, large-scale distributed systems and data or ML infrastructure

Strong software engineering skills with deep programming expertise in Go, Java, or Python

Advanced knowledge of Kubernetes internals including custom controllers, scheduler architecture, resource quotas, and workload lifecycle management

Hands-on experience with Kubernetes-native batch scheduling frameworks such as Kueue or YuniKorn and advanced scheduling concepts like gang scheduling, bin-packing, and priority preemption

Experience with cloud-native infrastructure across multi-cloud environments including AWS, GCP, and on-premises systems

Strong commitment to operational excellence, system observability, and continuous improvement for mission-critical services

Preferred Qualifications

Experience with GPU scheduling, accelerator-aware placement, and optimization for large-scale AI/ML workloads

Experience with distributed data and ML frameworks such as Apache Spark, Ray, PyTorch, JAX, or Flink at scale

Experience contributing to open-source projects in Kubernetes scheduling, container technologies, or ML infrastructure ecosystems such as Apache YuniKorn, Kueue, or similar systems

Experience using GenAI technologies to improve developer productivity, streamline engineering processes, and accelerate team execution

At Apple, base pay is one part of our total compensation package and is determined within a range. This provides the opportunity to progress as you grow and develop within a role. The base pay range for this role is between $171,600 and $302,200, and your base pay will depend on your skills, qualifications, experience, and location.

Apple employees also have the opportunity to become an Apple shareholder through participation in Apple’s discretionary employee stock programs. Apple employees are eligible for discretionary restricted stock unit awards, and can purchase Apple stock at a discount if voluntarily participating in Apple’s Employee Stock Purchase Plan. You’ll also receive benefits including comprehensive medical and dental coverage, retirement benefits, a range of discounted products and free services, and, for formal education related to advancing your career at Apple, reimbursement for certain educational expenses, including tuition. Additionally, this role might be eligible for discretionary bonuses or commission payments as well as relocation. Learn more about Apple Benefits.

Note: Apple benefit, compensation and employee stock programs are subject to eligibility requirements and other terms of the applicable plan or program.

Apple is an equal opportunity employer that is committed to inclusion and diversity. We seek to promote equal opportunity for all applicants without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, Veteran status, or other legally protected characteristics. Learn more about your EEO rights as an applicant.

At Apple, we believe accessibility is a fundamental human right. You’ll find that idea reflected in everything here — in our culture, our benefits and our digital tools. By welcoming as many perspectives as possible, we help you build a career where you feel like you belong.

Learn about accessibility in Apple’s workplace

Learn about reasonable accommodations for job applicants

Apple accepts applications to this posting on an ongoing basis.


Senior ML Engineer
