AIML - Sr. Machine Learning Engineer, Responsible AI

Apple · Big Tech · Cupertino, CA +1 · Machine Learning and AI

This role focuses on developing, carrying out, interpreting, and communicating pre- and post-ship evaluations of the safety of Apple Intelligence features, leveraging both human grading and model-based auto-grading. It also involves researching and developing auto-grading methodology and infrastructure. The role requires creating safety evaluations that uphold Responsible AI values through data sampling, curation, annotation, auto-grading, and analysis. It draws on applied data science, scientific investigation, cross-functional communication, and metrics reporting.

What you'd actually do

  1. Develop metrics for evaluation of safety and fairness risks inherent to generative AI features.
  2. Design datasets, identify data needs, and work on creative solutions, scaling and expanding data coverage through human and synthetic generation methods.
  3. Develop auto-grading technologies and approaches for application in safety evaluations of generative AI features.
  4. Provide technical direction and expertise to team-wide initiatives in safety auto-grading.
  5. Use and implement data pipelines, and collaborate cross-functionally to execute end-to-end safety evaluations.
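The auto-grading workflow in items 1–3 can be sketched in a few lines. This is an illustrative sketch, not Apple's implementation: the `keyword_judge` stands in for a real foundation-model judge, and all names (`auto_grade`, `safety_rate`, `GradedSample`) are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass
class GradedSample:
    prompt: str
    response: str
    safe: bool

def auto_grade(samples: List[Tuple[str, str]],
               judge: Callable[[str, str], bool]) -> List[GradedSample]:
    """Apply a model-based judge to each (prompt, response) pair."""
    return [GradedSample(p, r, judge(p, r)) for p, r in samples]

def safety_rate(graded: List[GradedSample]) -> float:
    """Fraction of responses the judge marked safe: one headline metric."""
    return sum(g.safe for g in graded) / len(graded)

# Toy judge standing in for a foundation-model call (hypothetical).
def keyword_judge(prompt: str, response: str) -> bool:
    blocked = {"harmful", "unsafe"}
    return not any(word in response.lower() for word in blocked)

samples = [
    ("How do I reset my password?", "Open Settings and choose Reset."),
    ("Tell me something risky.", "That would be unsafe to describe."),
]
graded = auto_grade(samples, keyword_judge)
print(safety_rate(graded))  # 0.5
```

In practice the judge would itself be a prompted or fine-tuned model, and the metric would be broken out by risk category rather than reported as a single rate.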

Skills

Required

  • MS or PhD in Computer Science, Machine Learning, Statistics, or a related field; or an equivalent qualification acquired through other avenues.
  • Experience working with generative models for evaluation and/or product development, and up-to-date knowledge of common challenges and failures.
  • Strong engineering skills and experience in writing production-quality code in Python.
  • Deep experience in foundation-model-based AI programming (e.g., using DSPy to optimize foundation model prompts) and a drive to innovate in this space.
  • Experience working with noisy, crowd-based data labels and human evaluations.
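Working with noisy, crowd-based labels typically starts with aggregation and an agreement check. A minimal sketch, assuming three annotators per item (the function names `majority_vote` and `agreement_rate` are hypothetical, and real pipelines use stronger estimators such as Dawid-Skene):

```python
from collections import Counter
from typing import List

def majority_vote(labels: List[str]) -> str:
    # Most common label; ties break by first occurrence (Python 3.7+ Counter
    # preserves insertion order).
    return Counter(labels).most_common(1)[0][0]

def agreement_rate(annotations: List[List[str]]) -> float:
    # For each item, the fraction of annotators matching the majority label,
    # averaged over all items -- a rough proxy for label quality.
    rates = []
    for labels in annotations:
        majority = majority_vote(labels)
        rates.append(sum(l == majority for l in labels) / len(labels))
    return sum(rates) / len(rates)

annotations = [
    ["safe", "safe", "unsafe"],  # 2 of 3 annotators agree
    ["safe", "safe", "safe"],    # unanimous
]
print([majority_vote(a) for a in annotations])  # ['safe', 'safe']
print(round(agreement_rate(annotations), 3))    # 0.833
```

Low agreement on a slice of items is often a signal to revise annotation guidelines or route those items to expert review rather than trust the aggregate label.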

Nice to have

  • Experience working in the Responsible AI space.
  • Prior scientific research and publication experience.
  • Strong organizational and operational skills working with large, multi-functional, and diverse teams.
  • Curiosity about fairness and bias in generative AI systems, and a strong desire to help make the technology more equitable.

What the JD emphasized

  • Responsible AI
  • safety
  • fairness
  • robustness
  • explainability
  • uncertainty
  • evaluations
  • auto-grading
  • human grading
  • synthetic generation
  • production-quality code
