Staff Machine Learning Engineer, Core Services Eng (genai)

Uber · Consumer · San Francisco, CA +1 · Engineering

Staff ML Engineer to architect, productionize, and scale an autonomous support agent that resolves customer issues end-to-end, focusing on LLM orchestration, evaluation, safety guardrails, and cost efficiency.

What you'd actually do

  1. Own the end‑to‑end agent architecture: agentic planning and execution loops, long-term memory, persona/voice, knowledge routing, and policy enforcement for compliant, on‑brand conversations.
  2. Advance retrieval & reasoning: build next-generation retrieval and reasoning pipelines in which the agent searches across different knowledge sources, applies policy-driven tools, and calls structured workflows, while ensuring that responses stay consistently grounded.
  3. Establish evals that matter: offline rubrics, simulated scenarios, safety tests, cost/latency tradeoff suites, and LLM‑as‑judge (with calibrated human review) wired into CI/CD and experiment platforms.
  4. Drive automation at scale: partner with Product/Design/Operations on coverage, policy alignment, localization, and rollout strategy to improve the customer experience and reduce cost per contact.
  5. Mentor and act as principal lead for multiple pods; set technical strategy and quality bars; coach senior engineers on agentic patterns, reliability, and experiment velocity.

Skills

Required

  • 7+ years building production ML/AI systems
  • 2+ years leading complex ML initiatives end‑to‑end
  • Deep expertise in LLM‑driven systems (inference optimization, prompt/program design, fine-tuning, distillation/LoRA, safety/guardrails, evals)
  • Track record of shipping customer‑facing intelligent experiences with measurable impact (A/B testing, metrics literacy)
  • Bachelor's degree or higher in Computer Science or a related field

Nice to have

  • Agentic architectures in production (planner/executor, memory, multi‑step reasoning)
  • RAG over complex, policy‑heavy knowledge bases
  • Experience building support automation for large consumer platforms (routing, policy codification, internal tooling, co‑pilot/auto‑resolve)
  • Multilingual NLU/NLG (code‑switching, low‑resource languages), hallucination mitigation, safety red‑teaming, and privacy‑by‑design
  • Practical expertise balancing speed and reliability at scale: experiment frameworks, feature flags, canary/guarded rollouts, and clear kill‑switches

What the JD emphasized

  • architect, productionize, and scale an autonomous support agent
  • agentic architectures
  • LLM orchestration
  • evaluation
  • safety guardrails
  • reliability
  • cost efficiency
  • early stage
  • bias for action
  • creative with GenAI tools
  • end‑to‑end agent architecture
  • agentic planning and execution loops
  • long-term memory
  • knowledge routing
  • policy enforcement
  • retrieval & reasoning
  • search across different knowledge sources
  • apply policy-driven tools
  • call structured workflows
  • grounded responses
  • evals that matter
  • LLM‑as‑judge
  • CI/CD
  • experiment platforms
  • automation at scale
  • reduce cost per contact
  • Mentor/principal‑lead
  • technical strategy
  • quality bars
  • agentic patterns
  • experiment velocity
  • 7+ years building production ML/AI systems
  • 2+ years leading complex ML initiatives end‑to‑end
  • Deep expertise in LLM‑driven systems
  • inference optimization
  • prompt/program design
  • fine-tuning
  • distillation/LoRA
  • safety/guardrails
  • evals
  • shipping customer‑facing intelligent experiences
  • measurable impact
  • Agentic architectures in production
  • planner/executor
  • memory
  • multi‑step reasoning
  • RAG
  • support automation
  • large consumer platforms
  • routing
  • policy codification
  • internal tooling
  • co‑pilot/auto‑resolve
  • Multilingual NLU/NLG
  • hallucination mitigation
  • safety red‑teaming
  • privacy‑by‑design
  • speed
  • reliability at scale
  • experiment frameworks
  • feature flags
  • canary/guarded rollouts
  • clear kill‑switches

Other signals

  • autonomous support agent
  • LLM orchestration
  • evaluation
  • safety guardrails
  • reliability
  • cost efficiency