Machine Learning Architect - Conversational Speech

Apple · Cupertino, CA · Machine Learning and AI

Machine Learning Architect for Conversational Speech at Apple, responsible for defining modeling strategy and technical direction for speech recognition, synthesis, dialog systems, multimodal foundation models, and speech-to-speech technologies. The role involves hands-on technical leadership, translating research into production-quality systems at scale, and ensuring architectural decisions align with on-device constraints, latency, and scalability.

What you'd actually do

  1. Set the future modeling direction for all of conversational speech—charting the architectural and algorithmic course for how Apple's speech technologies evolve.
  2. Operate as a hands-on expert who not only defines strategy but also digs into the hardest technical problems, working shoulder-to-shoulder with teams to overcome critical obstacles.
  3. Define modeling strategy and technical direction across the Speech organization, establishing a unified architectural vision for speech recognition, speech synthesis, dialog systems, multimodal foundation models, and speech-to-speech technologies.
  4. Serve as the organization's foremost modeling expert, providing deep technical guidance to multiple teams working on interconnected speech capabilities.
  5. Evaluate emerging research and industry trends—including advances in large language models, multimodal architectures, and full-duplex natural conversational systems—and translate them into actionable roadmaps.

Skills

Required

  • 10+ years of experience in machine learning applied to speech or multimodal systems
  • Demonstrated expertise as a technical leader or architect who has defined modeling direction across multiple teams or product areas
  • Deep, hands-on proficiency in modern deep learning, including large language models and end-to-end speech systems
  • Significant experience with multimodal LLMs, including architecture design, training, adaptation, and deployment of models that integrate speech, audio, and text modalities
  • Direct experience building speech-to-speech conversational systems, with a strong understanding of full-duplex natural conversational interaction and end-to-end speech pipelines
  • A track record of translating research into production-quality systems at scale
  • Expert programming skills in Python and deep learning frameworks such as PyTorch, JAX, or TensorFlow

Nice to have

  • Ph.D. in Computer Science, Electrical Engineering, Machine Learning, or similar technical field
  • Experience architecting or leading development of full-duplex natural conversational systems, speech-to-speech models, or multimodal foundation models that have shipped to large-scale user populations
  • Deep familiarity with the full stack of speech technologies—ASR, TTS, spoken dialog, speaker modeling, audio understanding—and an ability to reason about their interactions and dependencies
  • Experience with large-scale distributed training and the infrastructure considerations that shape model design at scale
  • A data-centric perspective on foundation model development, including experience guiding data collection, curation, annotation, and quality strategies
  • Experience with on-device ML deployment, including model compression, quantization, and latency-aware architecture design

What the JD emphasized

  • production-readiness
  • on-device constraints
  • latency
  • scalability
  • robustness
  • full-duplex natural conversational systems
  • multimodal foundation models
  • speech-to-speech conversational systems

Other signals

  • Define modeling strategy and technical direction across the Speech organization
  • Establish a unified architectural vision for speech recognition, speech synthesis, dialog systems, multimodal foundation models, and speech-to-speech technologies
  • Champion production-readiness, ensuring architectural decisions account for on-device constraints, latency, scalability, and robustness
  • Collaborate broadly with partner teams across Siri, Apple Intelligence, hardware, and platform engineering