Staff Machine Learning Engineer

Zendesk · Enterprise · Pune, India

Zendesk is hiring a Staff ML Engineer to shape the architecture and strategy of its GenAI platform, leading cross-functional efforts to standardize evaluation, access, observability, and orchestration for LLMs across product lines, and ensuring AI experiences are safe, performant, and trustworthy.

What you'd actually do

  1. Architect and lead delivery of cross‑product GenAI platform capabilities: LLM Proxy, model registry integrations, vendor abstraction, and cost/usage attribution.
  2. Own the design and scaling of evaluation and benchmarking frameworks (A/B, offline, continuous regression tests) used to gate model releases.
  3. Define company‑wide standards for safety, tone, and reasoning evaluation; drive adoption of evaluation rubrics and automated checks.
  4. Identify systemic failure modes across products and model families; prioritize mitigations, monitoring, and retraining strategies in partnership with ML teams.
  5. Drive platform reliability, observability, and capacity planning for LLM services; implement rate limiting, throttling, and SLA practices.
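The last responsibility mentions rate limiting and throttling for LLM services. As a purely illustrative sketch (not Zendesk's actual implementation; the `TokenBucket` class, per-tenant `limits` map, and `handle_request` helper are all hypothetical), a per-tenant token-bucket limiter sitting in front of an LLM proxy might look like:

```python
import time
import threading

class TokenBucket:
    """Token-bucket rate limiter (illustrative sketch, not a production design)."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate            # tokens replenished per second
        self.capacity = capacity    # maximum burst size
        self.tokens = capacity      # start with a full bucket
        self.last = time.monotonic()
        self.lock = threading.Lock()

    def allow(self, cost: float = 1.0) -> bool:
        """Return True if the request may proceed, consuming `cost` tokens."""
        with self.lock:
            now = time.monotonic()
            # Refill tokens for the time elapsed since the last check.
            self.tokens = min(self.capacity,
                              self.tokens + (now - self.last) * self.rate)
            self.last = now
            if self.tokens >= cost:
                self.tokens -= cost
                return True
            return False

# Hypothetical per-tenant buckets an LLM proxy might keep in memory.
limits = {"tenant-a": TokenBucket(rate=5, capacity=10)}

def handle_request(tenant: str) -> str:
    """Admit or throttle a single request for the given tenant."""
    bucket = limits[tenant]
    return "forward to model" if bucket.allow() else "429 Too Many Requests"
```

In practice a platform would track limits per tenant and per model vendor (and often by token count rather than request count), but the admit/throttle decision follows the same shape.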

Skills

Required

  • Python
  • Kubernetes
  • cloud infrastructure
  • observability tooling
  • distributed systems
  • ML infrastructure
  • LLMs
  • inference serving patterns
  • system design
  • scalable architectures
  • service reliability engineering
  • capacity planning
  • cost optimization
  • evaluation frameworks
  • gold-standard datasets
  • regression suites for language models
  • stakeholder management
  • technical strategy leadership
  • mentoring senior engineers

Nice to have

  • model registries
  • feature stores
  • inference platforms at scale
  • agentic AI frameworks
  • workflow orchestration
  • tool-using models
  • ML safety
  • trust frameworks
  • quality frameworks
  • MS/PhD in ML/NLP
  • published research

What the JD emphasized

  • Track record of delivering large, cross‑team projects to production.
  • Deep understanding of LLMs, inference serving patterns, vendor routing strategies, and platform design for ML workloads.
  • Strong system design skills: scalable architectures, service reliability engineering, capacity planning, and cost optimization.
  • Experience creating evaluation frameworks, gold‑standard datasets, and regression suites for language models.
  • Proven ability to lead technical strategy and mentor senior engineers to achieve broad adoption.

Other signals

  • LLM Proxy
  • evaluation and benchmarking frameworks
  • safety, tone, and reasoning evaluation
  • agentic workflows and safe tool use
  • LLM services