Member of Technical Staff - Voice Model

xAI · AI Frontier · Palo Alto, CA · Model

The role focuses on building and improving voice AI models for natural, low-latency spoken interactions. This involves large-scale data curation, speech-language model pre-training and post-training, and developing a comprehensive evaluation framework. The goal is to integrate these models into real-time applications deployed at global scale.

What you'd actually do

  1. Design and execute large-scale speech data curation and processing pipelines, including collection of diverse real-world audio, synthetic data generation, and automated annotation workflows to enable high-quality model training and evaluation.
  2. Work on pre-training and post-training of speech-language models, with targeted enhancements through supervised fine-tuning, reinforcement learning, and other techniques to ensure Grok Voice responses are accurate, factually grounded, natural and idiomatic in spoken style, conversational in tone, and fluent across multiple languages.
  3. Build and iterate on a comprehensive evaluation framework covering objective metrics (accuracy, quality, latency, expressiveness), human preference studies, content factuality assessments, real-time interaction quality, and experimentation infrastructure to measure and improve performance.
  4. Work closely with product teams to integrate voice models into applications and real-time environments, define spoken interaction specifications, and own the full lifecycle from prototype to global-scale deployment, delivering stable, low-latency, delightful voice experiences.

Skills

Required

  • Python
  • Spark
  • Ray
  • JAX
  • PyTorch
  • Kubernetes
  • distributed training
  • inference systems
  • speech data curation
  • synthetic data generation
  • supervised fine-tuning
  • reinforcement learning
  • evaluation frameworks
  • human preference studies
  • content factuality checks
  • A/B testing

Nice to have

  • multilingual speech models
  • low-latency spoken interactions
  • natural spoken style
  • conversational tone

What the JD emphasized

  • Python expert with deep proficiency in writing clean, efficient code for AI/ML systems.
  • Hands-on experience processing large-scale datasets using tools like Spark and Ray for cleaning, augmentation, and feature extraction.
  • Proficiency in pre-training and post-training speech-language models using JAX/PyTorch, including supervised fine-tuning, reinforcement learning, and optimizations for accuracy, factuality, natural spoken style, detail, and multilingual fluency.
  • Ability to set up and run rigorous evaluation pipelines: objective metrics, human preference studies, content factuality checks, and iterative A/B testing to drive model improvements.
  • Experience building or working with large-scale distributed training and inference systems on Kubernetes.
  • Proactive, self-driven attitude — ready to grind in a fast-paced, high-caliber team to deliver outstanding voice AI experiences.

Other signals

  • large-scale speech data curation
  • frontier speech-language pre-training
  • intensive post-training
  • low-latency spoken interactions
  • multilingual