What you'd actually do

Design, train, evaluate, and deploy machine learning systems that power real-time voice experiences, including ASR, speech understanding, turn detection, text to speech, speech to speech, classification, entity extraction, summarization, and structured insight generation.

Improve the quality of voice AI systems through error analysis, data curation, metric design, benchmarking, and iterative model improvement, with a strong focus on real-world performance.

Build evaluation frameworks for complex voice and agentic systems, measuring metrics such as accuracy, robustness, latency, faithfulness, naturalness, professionalism, task completion, and cost.

Diagnose and mitigate failure modes across the voice stack, including transcription errors, hallucinations, retrieval failures, tool misuse, prompt brittleness, context drift, and multi-step reasoning breakdowns.

Design and optimize low-latency ML workflows for live conversations, balancing model quality with system responsiveness, scalability, and reliability.

Skills

Required

PyTorch
TensorFlow
Hugging Face
transformer-based models
embeddings
retrieval systems
large-scale training
inference workflows
real-time ML systems
latency
scalability
reliability
data pipelines
experimentation
measurement
quality analysis
speech recognition
speech processing
NLP
generative AI
conversational AI
model evaluation
benchmarking
error analysis
quality improvement
production ML systems

Nice to have

ASR quality metrics
WER
task-level evaluation methodologies
RAG systems
agentic workflows
multi-step reasoning systems
LLM-as-a-judge evaluation methods
streaming inference
real-time voice pipelines
media systems
infrastructure teams
platform teams
production ML deployment
observability
reliability
contact center AI
conversational intelligence
enterprise voice products

What the JD emphasized

real-time production systems

model quality

production reality

rigorous evaluation frameworks

failure modes

latency and robustness

reliably at scale in real-time voice environments

error analysis

metric design

benchmarking

iterative model improvement

real-world performance

complex voice and agentic systems

accuracy, robustness, latency, faithfulness, naturalness, professionalism, task completion, and cost

transcription errors, hallucinations, retrieval failures, tool misuse, prompt brittleness, context drift, and multi-step reasoning breakdowns

low-latency ML workflows

system responsiveness, scalability, and reliability

productionize real-time inference, streaming pipelines, quality monitoring, and continuous model iteration

offline evaluation, online experimentation, model validation, observability, and ongoing quality monitoring in production

5+ years of experience building, evaluating, and deploying machine learning systems in production

Strong background in one or more of the following: speech recognition, speech processing, NLP, generative AI, or conversational AI

Deep experience with model evaluation, benchmarking, error analysis, and quality improvement for production ML systems

Solid understanding of transformer-based models, embeddings, retrieval systems, and large-scale training or inference workflows

Experience designing and deploying real-time ML systems with strong requirements around latency, scalability, and reliability

Experience building data pipelines and tooling for experimentation, measurement, and large-scale quality analysis

Ability to work across research and engineering boundaries and translate promising ideas into production-grade systems

Other signals

Develop and improve machine learning systems that power voice experiences end to end

Improve the quality of voice AI systems through error analysis, data curation, metric design, benchmarking, and iterative model improvement

Build evaluation frameworks for complex voice and agentic systems

Diagnose and mitigate failure modes across the voice stack

Design and optimize low-latency ML workflows for live conversations

Cresta is on a mission to turn every customer conversation into a competitive advantage by unlocking the true potential of the contact center. Our platform combines the best of AI and human intelligence to help contact centers discover customer insights and behavioral best practices, automate conversations and inefficient processes, and empower every team member to work smarter and faster. Cresta was born from the prestigious Stanford AI lab, and its co-founder and chairman is Sebastian Thrun, the genius behind Google X, Waymo, Udacity, and more. Our leadership also includes CEO Ping Wu, the co-founder of Google Contact Center AI and Vertex AI platform, and co-founder Tim Shi, an early member of OpenAI.

Join us on this thrilling journey to revolutionize the workforce with AI. The future of work is here, and it's at Cresta.

About the role

We are looking for a Senior Machine Learning Engineer, Voice Experience to help build the next generation of AI-powered voice systems for the contact center. In this role, you will work at the intersection of speech, language, and real-time production systems, improving how AI listens, understands, reasons, empathizes, and responds in live customer conversations.

You will develop and improve machine learning systems that power voice experiences end to end, including automatic speech recognition, turn detection, downstream language understanding, retrieval-augmented and agentic workflows, quality measurement, text to speech, and production optimization. You will partner closely with applied researchers, product managers, designers, forward deployed engineers, and platform engineers to ensure model and system improvements translate into measurable customer and business impact.

This role is ideal for someone who is excited by both model quality and production reality: designing rigorous evaluation frameworks, analyzing failure modes, improving latency and robustness, and shipping systems that perform reliably at scale in real-time voice environments.

Responsibilities

Design, train, evaluate, and deploy machine learning systems that power real-time voice experiences, including ASR, speech understanding, turn detection, text to speech, speech to speech, classification, entity extraction, summarization, and structured insight generation.
Improve the quality of voice AI systems through error analysis, data curation, metric design, benchmarking, and iterative model improvement, with a strong focus on real-world performance.
Build evaluation frameworks for complex voice and agentic systems, measuring metrics such as accuracy, robustness, latency, faithfulness, naturalness, professionalism, task completion, and cost.
Diagnose and mitigate failure modes across the voice stack, including transcription errors, hallucinations, retrieval failures, tool misuse, prompt brittleness, context drift, and multi-step reasoning breakdowns.
Design and optimize low-latency ML workflows for live conversations, balancing model quality with system responsiveness, scalability, and reliability.
Partner with platform and backend engineers to productionize real-time inference, streaming pipelines, quality monitoring, and continuous model iteration.
Collaborate cross-functionally with product, design, frontend, and backend teams to integrate voice intelligence seamlessly into Cresta’s platform.
Establish best practices for offline evaluation, online experimentation, model validation, observability, and ongoing quality monitoring in production.
Mentor engineers, contribute to technical strategy, and help shape the roadmap for Cresta’s voice AI systems.

Qualifications We Value

Bachelor’s degree in Computer Science, Mathematics, Machine Learning, AI, or a related field; Master’s or Ph.D. preferred.
5+ years of experience building, evaluating, and deploying machine learning systems in production.
Strong background in one or more of the following: speech recognition, speech processing, NLP, generative AI, or conversational AI.
Deep experience with model evaluation, benchmarking, error analysis, and quality improvement for production ML systems.
Strong expertise with modern ML frameworks and tooling such as PyTorch, TensorFlow, and Hugging Face.
Solid understanding of transformer-based models, embeddings, retrieval systems, and large-scale training or inference workflows.
Experience designing and deploying real-time ML systems with strong requirements around latency, scalability, and reliability.
Experience building data pipelines and tooling for experimentation, measurement, and large-scale quality analysis.
Ability to work across research and engineering boundaries and translate promising ideas into production-grade systems.
Strong communication and technical leadership skills, with the ability to influence cross-functional decisions and raise the engineering bar.

Nice to Have

Hands-on experience with ASR quality metrics such as WER and task-level evaluation methodologies.
Experience with RAG systems, agentic workflows, multi-step reasoning systems, or LLM-as-a-judge evaluation methods.
Familiarity with streaming inference, real-time voice pipelines, or media systems.
Experience working closely with infrastructure or platform teams on production ML deployment, observability, and reliability.
Experience in contact center AI, conversational intelligence, or enterprise voice products.

Perks & Benefits

We offer a comprehensive and people-first benefits package to support you at work and in life:

Comprehensive medical, dental, and vision coverage with plans to fit you and your family
Flexible PTO to take the time you need, when you need it
Paid parental leave for all new parents welcoming a new child
Retirement savings plan to help you plan for the future
Remote work setup budget to help you create a productive home office
Monthly wellness and communication stipend to keep you connected and balanced
In-office meal program and commuter benefits provided for onsite employees

Compensation at Cresta

Cresta’s approach to compensation is simple: recognize impact, reward excellence, and invest in our people. We offer competitive, location-based pay that reflects the market and what each individual brings to the table.

The posted base salary range represents what we expect to pay for this role in a given location. Final offers are shaped by factors like experience, skills, education, and geography. In addition to base pay, total compensation includes equity and a comprehensive benefits package for you and your family.

Salary Range: $205,000–$270,000 + Offers Equity

This posting will be used to fill a newly created role.

We have noticed a rise in recruiting impersonations across the industry, where scammers attempt to access candidates' personal and financial information through fake interviews and offers. All Cresta recruiting emails come from the @cresta.ai domain. Any outreach claiming to be from Cresta via other sources should be ignored. If you are uncertain whether you have been contacted by an official Cresta employee, reach out to recruiting@cresta.ai.