What you'd actually do

Define and lead the technical vision for Cresta’s next-generation Agentic AI systems, including Agentic Assist and enterprise AI Agents.

Architect scalable, production-grade LLM systems that integrate reasoning, retrieval, planning, tool use, and real-time decision-making into cohesive, intelligent workflows.

Design and evolve multi-agent orchestration frameworks that combine RAG, structured knowledge, domain-adapted models, and automated actions.

Establish best practices for building robust, reliable, and cost-efficient LLM-powered systems in high-scale production environments.

Own evaluation strategy for complex, non-deterministic AI systems, including offline benchmarking, online experimentation, LLM-as-a-judge methodologies, and systematic failure analysis.

Skills

Required

Bachelor’s degree in Computer Science, Mathematics, or a related field
7+ years of experience building and deploying machine learning systems in production
Deep hands-on experience with LLMs at scale
Demonstrated leadership in architecting complex AI systems, particularly agentic or multi-step LLM workflows
Deep expertise in transformer-based models, embeddings, retrieval systems, and Retrieval-Augmented Generation (RAG) pipelines
Experience designing evaluation frameworks for LLM systems beyond single-turn prompts, including robustness testing and production monitoring
Strong systems thinking: ability to design for scalability, latency constraints, cost efficiency, security, and long-term maintainability
Extensive experience with modern ML frameworks (e.g., PyTorch, TensorFlow, Hugging Face) and distributed/cloud-based infrastructure
Proven ability to influence technical direction across teams as a senior individual contributor
A strong bias toward action — able to prototype rapidly while maintaining production rigor

Nice to have

Master’s or Ph.D. strongly preferred

What the JD emphasized

deep expertise in LLMs and modern prompting techniques

proven ability to translate cutting-edge research into scalable, production-grade systems

diagnosing and mitigating failure modes such as hallucinations, retrieval errors, tool misuse, context drift, prompt brittleness, and multi-step reasoning breakdowns

defining measurable quality metrics (e.g., accuracy, faithfulness, task completion, latency, and cost) for complex, non-deterministic systems

architect scalable, production-grade LLM systems

design and evolve multi-agent orchestration frameworks

establish best practices for building robust, reliable, and cost-efficient LLM-powered systems

own evaluation strategy for complex, non-deterministic AI systems

proactively identify and mitigate agent failure modes

define measurable quality standards

deep hands-on experience with LLMs at scale

demonstrated leadership in architecting complex AI systems, particularly agentic or multi-step LLM workflows

experience designing evaluation frameworks for LLM systems beyond single-turn prompts, including robustness testing and production monitoring

strong systems thinking: ability to design for scalability, latency constraints, cost efficiency, security, and long-term maintainability

Cresta is on a mission to turn every customer conversation into a competitive advantage by unlocking the true potential of the contact center. Our platform combines the best of AI and human intelligence to help contact centers discover customer insights and behavioral best practices, automate conversations and inefficient processes, and empower every team member to work smarter and faster. Cresta was born from the prestigious Stanford AI lab; its co-founder and chairman is Sebastian Thrun, the genius behind Google X, Waymo, Udacity, and more. Our leadership also includes CEO Ping Wu, the co-founder of Google Contact Center AI and Vertex AI platform, and co-founder Tim Shi, an early member of OpenAI.

Join us on this thrilling journey to revolutionize the workforce with AI. The future of work is here, and it's at Cresta.

About the role:

Machine Learning Engineers at Cresta work across several high-impact AI initiatives. Final team placement is determined based on experience, strengths, and business needs.

Current focus areas include:

Agentic Assist: Lead and build next-generation agentic AI systems that augment contact center agents in real time. This track requires strong pre-LLM ML foundations, deep expertise in LLMs and modern prompting techniques, a rapid prototyping mindset, and a proven ability to translate cutting-edge research into scalable, production-grade systems.
Agent & System Quality: Design evaluation frameworks and improve the reliability, robustness, and performance of LLM-powered agents. This includes diagnosing and mitigating failure modes such as hallucinations, retrieval errors, tool misuse, context drift, prompt brittleness, and multi-step reasoning breakdowns, while defining measurable quality metrics (e.g., accuracy, faithfulness, task completion, latency, and cost) for complex, non-deterministic systems.
Insights: Architect and scale LLM and retrieval-augmented generation pipelines that ground models in enterprise data. This track focuses on building high-performance ML systems that process complex data, extract structured insights, and deliver real-time, actionable intelligence at scale.

**Responsibilities:**

Define and lead the technical vision for Cresta’s next-generation Agentic AI systems, including Agentic Assist and enterprise AI Agents.
Architect scalable, production-grade LLM systems that integrate reasoning, retrieval, planning, tool use, and real-time decision-making into cohesive, intelligent workflows.
Design and evolve multi-agent orchestration frameworks that combine RAG, structured knowledge, domain-adapted models, and automated actions.
Establish best practices for building robust, reliable, and cost-efficient LLM-powered systems in high-scale production environments.
Own evaluation strategy for complex, non-deterministic AI systems, including offline benchmarking, online experimentation, LLM-as-a-judge methodologies, and systematic failure analysis.
Proactively identify and mitigate agent failure modes such as hallucinations, tool misuse, retrieval errors, prompt brittleness, context drift, and multi-step reasoning breakdowns.
Define measurable quality standards (accuracy, faithfulness, task completion, latency, cost efficiency, robustness) and drive continuous system improvement.
Influence cross-team architecture decisions across ML, backend, and product engineering to ensure seamless integration of AI capabilities.
Mentor senior engineers, raise the technical bar, and contribute to long-term AI strategy and roadmap planning.
Translate cutting-edge research advances into practical, high-impact production systems.

**Qualifications We Value:**

Bachelor’s degree in Computer Science, Mathematics, or a related field; Master’s or Ph.D. strongly preferred.
7+ years of experience building and deploying machine learning systems in production, including deep hands-on experience with LLMs at scale.
Demonstrated leadership in architecting complex AI systems, particularly agentic or multi-step LLM workflows.
Deep expertise in transformer-based models, embeddings, retrieval systems, and Retrieval-Augmented Generation (RAG) pipelines.
Experience designing evaluation frameworks for LLM systems beyond single-turn prompts, including robustness testing and production monitoring.
Strong systems thinking: ability to design for scalability, latency constraints, cost efficiency, security, and long-term maintainability.
Extensive experience with modern ML frameworks (e.g., PyTorch, TensorFlow, Hugging Face) and distributed/cloud-based infrastructure.
Proven ability to influence technical direction across teams as a senior individual contributor.
A strong bias toward action — able to prototype rapidly while maintaining production rigor.

Perks & Benefits:

We offer a comprehensive and people-first benefits package to support you at work and in life:

Comprehensive medical, dental, and vision coverage with plans to fit you and your family
Flexible PTO to take the time you need, when you need it
Paid parental leave for all new parents welcoming a new child
Retirement savings plan to help you plan for the future
Remote work setup budget to help you create a productive home office
Monthly wellness and communication stipend to keep you connected and balanced
In-office meal program and commuter benefits provided for onsite employees

**Compensation at Cresta:**

Cresta’s approach to compensation is simple: recognize impact, reward excellence, and invest in our people. We offer competitive, location-based pay that reflects the market and what each individual brings to the table.

The posted base salary range represents what we expect to pay for this role in a given location. Final offers are shaped by factors like experience, skills, education, and geography. In addition to base pay, total compensation includes equity and a comprehensive benefits package for you and your family.

OTE Range: $230,000–$300,000, plus equity

Staff Machine Learning Engineer
