What you'd actually do

Design and execute large-scale speech data curation and processing pipelines, including collection of diverse real-world audio, synthetic data generation, and automated annotation workflows to enable high-quality model training and evaluation.

Work on pre-training and post-training of speech-language models, with targeted enhancements through supervised fine-tuning, reinforcement learning, and other techniques to ensure Grok Voice responses are accurate, factually grounded, natural and idiomatic in spoken style, conversational in tone, and fluent across multiple languages.

Build and iterate on a comprehensive evaluation framework covering objective metrics (accuracy, quality, latency, expressiveness), human preference studies, content factuality assessments, real-time interaction quality, and experimentation infrastructure to measure and improve performance.

Work closely with product teams to integrate voice models into applications and real-time environments, define spoken interaction specifications, and handle the full lifecycle from prototype to global-scale deployment for stable, low-latency, delightful voice experiences.

Skills

Required

Python
Spark
Ray
JAX
PyTorch
Kubernetes
distributed training
inference systems
speech data curation
synthetic data generation
supervised fine-tuning
reinforcement learning
evaluation frameworks
human preference studies
content factuality checks
A/B testing

Nice to have

multilingual speech models
low-latency spoken interactions
natural spoken style
conversational tone

What the JD emphasized

Python expert with deep proficiency in writing clean, efficient code for AI/ML systems.

Hands-on experience processing large-scale datasets using tools like Spark and Ray for cleaning, augmentation, and feature extraction.

Proficiency in pre-training and post-training speech-language models using JAX/PyTorch, including supervised fine-tuning, reinforcement learning, and optimizations for accuracy, factuality, natural spoken style, detail, and multilingual fluency.

Ability to set up and run rigorous evaluation pipelines: objective metrics, human preference studies, content factuality checks, and iterative A/B testing to drive model improvements.

Experience building or working with large-scale distributed training and inference systems on Kubernetes.

Proactive, self-driven attitude — ready to grind in a fast-paced, high-caliber team to deliver outstanding voice AI experiences.

ABOUT xAI

xAI’s mission is to create AI systems that can accurately understand the universe and aid humanity in its pursuit of knowledge. Our team is small, highly motivated, and focused on engineering excellence. This organization is for individuals who appreciate challenging themselves and thrive on curiosity. We operate with a flat organizational structure. All employees are expected to be hands-on and to contribute directly to the company’s mission. Leadership is given to those who show initiative and consistently deliver excellence. Work ethic and strong prioritization skills are important. All employees are expected to have strong communication skills. They should be able to concisely and accurately share knowledge with their teammates.

ABOUT THE ROLE:

You will join the Grok Voice Model team to help build the world’s best voice AI. We deliver smooth, natural, low-latency spoken interactions — expressive, multilingual, and reliable across devices and real-time scenarios. We own the full training pipeline: massive data curation, premium audio processing, frontier speech-language pre-training, and intensive post-training to push quality, speed, and stability to the limit.

Our goal: make talking to AI feel like conversing with the most charming, kind, and knowledgeable person imaginable. We’re seeking exceptionally smart, execution-oriented engineers to help us get there.

RESPONSIBILITIES:

Design and execute large-scale speech data curation and processing pipelines, including collection of diverse real-world audio, synthetic data generation, and automated annotation workflows to enable high-quality model training and evaluation.
Work on pre-training and post-training of speech-language models, with targeted enhancements through supervised fine-tuning, reinforcement learning, and other techniques to ensure Grok Voice responses are accurate, factually grounded, natural and idiomatic in spoken style, conversational in tone, and fluent across multiple languages.
Build and iterate on a comprehensive evaluation framework covering objective metrics (accuracy, quality, latency, expressiveness), human preference studies, content factuality assessments, real-time interaction quality, and experimentation infrastructure to measure and improve performance.
Work closely with product teams to integrate voice models into applications and real-time environments, define spoken interaction specifications, and handle the full lifecycle from prototype to global-scale deployment for stable, low-latency, delightful voice experiences.

BASIC QUALIFICATIONS:

Python expert with deep proficiency in writing clean, efficient code for AI/ML systems.
Hands-on experience processing large-scale datasets using tools like Spark and Ray for cleaning, augmentation, and feature extraction.
Proficiency in pre-training and post-training speech-language models using JAX/PyTorch, including supervised fine-tuning, reinforcement learning, and optimizations for accuracy, factuality, natural spoken style, detail, and multilingual fluency.
Ability to set up and run rigorous evaluation pipelines: objective metrics, human preference studies, content factuality checks, and iterative A/B testing to drive model improvements.
Experience building or working with large-scale distributed training and inference systems on Kubernetes.
Proactive, self-driven attitude — ready to grind in a fast-paced, high-caliber team to deliver outstanding voice AI experiences.

COMPENSATION AND BENEFITS:

$150,000 - $450,000 USD

Base salary is just one part of our total rewards package at xAI, which also includes equity, comprehensive medical, vision, and dental coverage, access to a 401(k) retirement plan, short- and long-term disability insurance, life insurance, and various other discounts and perks.

xAI is an equal opportunity employer. For details on data processing, view our Recruitment Privacy Notice.

Member of Technical Staff - Voice Model