Sr. Software Engineer - Data, Siri Speech

Apple · Big Tech · Cupertino, CA · Machine Learning and AI

This role focuses on building and automating backend tools for Siri's Speech data warehouses, including cataloging, annotation, and making the data queryable via LLM-based chatbots. It involves distributed data engineering at the intersection of speech recognition, NLP, and dialogue management, with the goal of improving training and evaluation of Siri's models.

What you'd actually do

  1. Implement backend tools for Speech data warehouses, including cataloging the entire collection of speech data
  2. Automate speech data annotation that runs on a self-serve platform
  3. Deploy and implement LLM-based chatbots to make the unified speech warehouse queryable and actionable (such as derived dataset creation) via natural language
  4. Automate onboarding of new speech datasets from various sources onto a unified speech warehouse for easier discoverability and inclusion in training and evaluation of Siri
  5. Collaborate with other Data and infrastructure teams across Apple to implement querying and speech dataset creation improvements

Skills

Required

  • Python software development
  • CI/CD
  • Unit and integration testing
  • Distributed data processing tools and frameworks (Beam, Spark, Dask, Ray)
  • Strong software engineering abilities in Python

Nice to have

  • M.S. or Ph.D. degree in Computer Science, or equivalent experience
  • Speech and/or Machine Learning experience
  • Real passion for building research-demo prototypes of data solutions and turning them into production-quality designs and implementations
  • Strong interpersonal skills to work well with engineering teams
  • Excellent problem solving and critical thinking
  • Ability to work in a fast-paced environment with rapidly changing priorities
  • Passion for building extraordinary products and experiences for our users

What the JD emphasized

  • Deep expertise in Python software development
  • Distributed data processing tools and frameworks (Beam, Spark, Dask, Ray)
  • Strong software engineering abilities in Python
  • Strong data engineering background in speech and/or language/text/dialogue processing field

Other signals

  • training data
  • distributed training
  • speech recognition
  • natural language processing
  • dialogue management
  • LLM-based chatbots