Staff Machine Learning Engineer

Zendesk · Enterprise · Pune, India

Zendesk is hiring a Staff ML Engineer to shape the architecture and strategy of its GenAI platform, leading cross-functional efforts to standardize evaluation, access, observability, and orchestration for LLMs across product lines, and ensuring AI experiences are safe, performant, and trustworthy.

What you'd actually do

  1. Architect and lead delivery of cross‑product GenAI platform capabilities: LLM Proxy, model registry integrations, vendor abstraction, and cost/usage attribution.
  2. Own the design and scaling of evaluation and benchmarking frameworks (A/B, offline, continuous regression tests) used to gate model releases.
  3. Define company‑wide standards for safety, tone, and reasoning evaluation; drive adoption of evaluation rubrics and automated checks.
  4. Identify systemic failure modes across products and model families; prioritize mitigations, monitoring, and retraining strategies in partnership with ML teams.
  5. Drive platform reliability, observability, and capacity planning for LLM services; implement rate limiting, throttling, and SLA practices.
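The last responsibility mentions rate limiting and throttling for LLM services. As a purely illustrative sketch (not Zendesk's actual implementation; the `TokenBucket` class, per-tenant `limits` map, and `handle_request` helper are all hypothetical), a per-tenant token-bucket limiter sitting in front of an LLM proxy might look like:

```python
import time
import threading

class TokenBucket:
    """Token-bucket rate limiter (illustrative sketch, not a production design)."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate            # tokens replenished per second
        self.capacity = capacity    # maximum burst size
        self.tokens = capacity      # start with a full bucket
        self.last = time.monotonic()
        self.lock = threading.Lock()

    def allow(self, cost: float = 1.0) -> bool:
        """Return True if the request may proceed, consuming `cost` tokens."""
        with self.lock:
            now = time.monotonic()
            # Refill tokens for the time elapsed since the last check.
            self.tokens = min(self.capacity,
                              self.tokens + (now - self.last) * self.rate)
            self.last = now
            if self.tokens >= cost:
                self.tokens -= cost
                return True
            return False

# Hypothetical per-tenant buckets an LLM proxy might keep in memory.
limits = {"tenant-a": TokenBucket(rate=5, capacity=10)}

def handle_request(tenant: str) -> str:
    """Admit or throttle a single request for the given tenant."""
    bucket = limits[tenant]
    return "forward to model" if bucket.allow() else "429 Too Many Requests"
```

In practice a platform would track limits per tenant and per model vendor (and often by token count rather than request count), but the admit/throttle decision follows the same shape.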

Skills

Required

  • Python
  • Kubernetes
  • cloud infrastructure
  • observability tooling
  • distributed systems
  • ML infrastructure
  • LLMs
  • inference serving patterns
  • system design
  • scalable architectures
  • service reliability engineering
  • capacity planning
  • cost optimization
  • evaluation frameworks
  • gold-standard datasets
  • regression suites for language models
  • stakeholder management
  • technical strategy leadership
  • mentoring senior engineers

Nice to have

  • model registries
  • feature stores
  • inference platforms at scale
  • agentic AI frameworks
  • workflow orchestration
  • tool-using models
  • ML safety
  • trust frameworks
  • quality frameworks
  • MS/PhD in ML/NLP
  • published research

What the JD emphasized

  • Track record of delivering large, cross‑team projects to production.
  • Deep understanding of LLMs, inference serving patterns, vendor routing strategies, and platform design for ML workloads.
  • Strong system design skills: scalable architectures, service reliability engineering, capacity planning, and cost optimization.
  • Experience creating evaluation frameworks, gold‑standard datasets, and regression suites for language models.
  • Proven ability to lead technical strategy and mentor senior engineers to achieve broad adoption.

Other signals

  • LLM Proxy
  • evaluation and benchmarking frameworks
  • safety, tone, and reasoning evaluation
  • agentic workflows and safe tool use
  • LLM services