On-device ML Infrastructure Engineer, ML User Experience, APIs & Integration, Graphics, Games & ML

Apple · Big Tech · Cupertino, CA · Machine Learning and AI

This role focuses on building the ML infrastructure and developer experience for running ML models on Apple devices. It involves developing APIs for ML model conversion and authoring, optimizing models for efficiency and performance, and integrating Apple's ML tools into internal and external model repositories. The goal is to let ML engineers efficiently ingest and implement models within Apple's ML stack, which affects core experiences such as Camera, Siri, and Health.

What you'd actually do

  1. Develop APIs in Apple’s ML stack for ML engineers to efficiently import and implement their models.
  2. Integrate Apple’s ML tools into internal and external model repositories to demonstrate and stress-test efficient, high-performance model ingestion.
  3. Develop optimizations across the pipeline, including source-level transformations and custom operations, to improve inference efficiency.
  4. Onboard the latest ML models at peak performance, and use them as examples to highlight and validate the authoring and runtime capabilities of Apple’s inference stack.

Skills

Required

  • Python programming
  • C++ (familiarity)
  • ML authoring frameworks (PyTorch, MLX, JAX)
  • ML fundamentals
  • common architectures (e.g., Transformers)
  • ML inference optimizations (quantization, pruning, KV caching)
  • communication skills

Nice to have

  • C++
  • Swift
  • GPU programming
  • quantization-aware training (QAT)
  • compression and quantization techniques
  • deploying production-grade Python packages
  • MLIR/LLVM
  • compiler toolchains
  • Hugging Face
  • model repositories

What the JD emphasized

  • highly proficient in Python programming; familiarity with C++ is required
  • strong understanding of ML fundamentals
  • hands-on experience with ML inference optimizations
  • strong experience designing Python APIs

Other signals

  • building the first end-to-end developer experience for ML development
  • onboarding modern architectures to embedded systems
  • developing optimization toolkits for model compression and acceleration
  • building ML compilers and runtimes for efficient execution
  • creating comprehensive benchmarking and debugging toolchains
  • ML user experience, APIs, and integration
  • developing new ML model conversion and authoring APIs
  • integrating Apple’s ML tools/APIs into internal and external model repositories
  • ideating, designing, and stress-testing a variety of optimizations
  • shaping how ML developers experience Apple’s end-to-end inference stack