AIML - Software Engineer - AI, Evaluation

Apple · Big Tech · Cupertino, CA +1 · Machine Learning and AI

This Software Engineer role focuses on building tools and systems for the automatic evaluation of Apple's AI products, specifically using LLM-as-judge and related techniques to improve the quality and efficiency of those evaluations. The role involves designing and developing frameworks, pipelines, and tools for AI model development, deployment, and measurement, directly influencing product launch decisions.

What you'd actually do

  1. Design and build tools and systems that sit at the intersection of AI modeling, software engineering, and product quality
  2. Design and develop extensible frameworks, pipelines, and tools that enable efficient development, deployment, and qualitative measurement of AI models
  3. Provide principled assessments across a diverse range of Apple features, from Search and Siri to the latest Apple Intelligence capabilities
  4. Build LLM-as-judge and related tools to improve both the quality and efficiency of these evaluations
  5. Expand these tools and systems
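The LLM-as-judge workflow mentioned above can be sketched as follows. This is a minimal illustration, not Apple's actual system: `call_judge_model` is a stub standing in for a real model call, and all names, rubrics, and thresholds are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class JudgeVerdict:
    score: int    # 1-5 rubric score assigned by the judge
    passed: bool  # whether the score meets the launch threshold

def call_judge_model(prompt: str, response: str, rubric: str) -> int:
    """Stub for a real LLM call: a production judge would send the
    prompt, candidate response, and rubric to a model and parse the
    returned score. A trivial placeholder heuristic is used here."""
    return 5 if response.strip() else 1

def evaluate(examples: list[tuple[str, str]], rubric: str,
             threshold: int = 4) -> list[JudgeVerdict]:
    """Score each (prompt, response) pair against the rubric."""
    verdicts = []
    for prompt, response in examples:
        score = call_judge_model(prompt, response, rubric)
        verdicts.append(JudgeVerdict(score=score, passed=score >= threshold))
    return verdicts

def pass_rate(verdicts: list[JudgeVerdict]) -> float:
    """Aggregate pass rate, the kind of number a launch decision
    might be gated on."""
    return sum(v.passed for v in verdicts) / len(verdicts)
```

In a real pipeline the aggregate pass rate (and per-feature breakdowns) would feed a dashboard or launch review rather than being computed ad hoc.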

Skills

Required

  • Exceptional Python skills
  • Solid software engineering fundamentals with production experience, including system design, API design, CI/CD, testing strategies, code maintainability, system monitoring, debugging complex systems
  • Demonstrated expertise in using AI-assisted software development workflows to accelerate software development while maintaining code quality
  • Strong communication skills
  • Proven ability to work collaboratively with cross-functional teams

Nice to have

  • BS/MS/PhD degree in Computer Science, Machine Learning, AI, or a related field
  • Experience with building LLM applications, frameworks, and offline evaluations
  • Familiarity with MLOps principles for model lifecycle management
  • Experience in building scalable tools for product quality evaluation
  • Ability to understand and interpret evaluation reports, including metrics such as precision, recall, run-to-run consistency, and common pitfalls like data leakage
  • Product-minded, with a strong ability to translate ambiguous product requirements into solutions
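The evaluation metrics named above (precision, recall, run-to-run consistency) can be sketched in a few lines. This is an illustrative sketch, assuming boolean judge verdicts compared against human gold labels; the function names are hypothetical.

```python
def precision_recall(judge: list[bool], gold: list[bool]) -> tuple[float, float]:
    """Precision and recall of the judge's flags against human labels,
    treating a human-labeled positive as the positive class."""
    tp = sum(j and g for j, g in zip(judge, gold))
    fp = sum(j and not g for j, g in zip(judge, gold))
    fn = sum(not j and g for j, g in zip(judge, gold))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def run_to_run_consistency(run_a: list[bool], run_b: list[bool]) -> float:
    """Fraction of examples on which two judge runs agree; a noisy
    judge can show low consistency even when accurate on average."""
    return sum(a == b for a, b in zip(run_a, run_b)) / len(run_a)
```

Data leakage, the other pitfall the listing names, is not something a formula catches: it usually requires checking that evaluation examples never appeared in the judge's (or the evaluated model's) training or prompt data.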

What the JD emphasized

  • building LLM applications, frameworks, and offline evaluations
  • building scalable tools for product quality evaluation

Other signals

  • building LLM-as-judge and related tools
  • design and build tools and systems that sit at the intersection of AI modeling, software engineering, and product quality
  • enable efficient development, deployment, and qualitative measurement of AI models
  • influence product launch decisions
  • enable teams across Apple to iterate faster and with greater confidence