AIML - Sr Data Scientist, Evaluation

Apple · Big Tech · Seattle, WA +2 · Machine Learning and AI

This role focuses on developing and implementing evaluation methods for AI/ML products, with an emphasis on search quality and user-facing features such as Siri and Apple Intelligence. It involves working with large datasets, applying advanced analytical methods (including prompt engineering and using LLMs as judges), and partnering with engineering teams to translate methodological developments into production technologies. The role requires strong data science, ML, and analytics skills, with a focus on experimentation and evaluation.

What you'd actually do

  1. Research and develop evaluation methods to improve the quality of Apple's user-facing products, such as Siri and Apple Intelligence.
  2. Work with evaluation/experimentation engineering teams to translate your methodological developments into technologies that product engineering teams use every day.
  3. Work with large, complex data sets.
  4. Solve difficult, non-routine analysis problems, applying advanced analytical methods as needed, including prompt engineering and building LLMs as judges.
  5. Conduct analyses that span data collection and quality control, requirements specification, processing, and presentation of results.

Skills

Required

  • 5 years of relevant work experience
  • Advanced degree in a quantitative field such as Statistics, Operational Research, Bioinformatics, Economics, Psychology, Computer Science, Sociology, Mathematics, Physics, or a similar field.
  • Proficiency in data science, machine learning, and analytics, including statistical data analysis.
  • Experience designing, conducting, analyzing, and interpreting experiments and investigations, especially on data quality, evaluation, and risk assessment.
  • Strong programming skills, including data querying (e.g., SQL and/or Spark).
  • Experience with a scripting language for data processing and development (e.g., Python, R, or Scala).
  • Experience articulating and translating business questions and applying statistical techniques to answer them with available data.
  • Strong communication skills and the ability to clearly explain difficult technical topics (especially causal topics) to audiences ranging from data scientists and engineers to business partners.

Nice to have

  • Proven ability to collaborate effectively across functions and work well within a team.
  • Capable of driving projects of varying sizes and scopes.
  • Experience with methods that address accuracy and variability in human annotation data.
  • Ability to learn new technologies and skills as work requirements change.

What the JD emphasized

  • rigorous use of data
  • advanced analytical methods
  • data collection and quality control
  • difficult, non-routine analysis problems

Other signals

  • driving product impact via measurement and evaluation
  • guide product development, decisions, and direction through principled evaluation and rigorous use of data
  • improve search quality and guide feature development with data
  • building LLMs as judges