Staff Machine Learning Engineer, Core S… at Uber

What you'd actually do

Own the end‑to‑end agent architecture: agentic planning and execution loops, long-term memory, persona/voice, knowledge routing, and policy enforcement for compliant, on‑brand conversations.

Advance retrieval & reasoning: build next-generation retrieval and reasoning pipelines in which the agent can search across different knowledge sources, apply policy-driven tools, and call structured workflows, and ensure that responses are consistently grounded.

Establish evals that matter: offline rubrics, simulated scenarios, safety tests, cost/latency tradeoff suites, and LLM‑as‑judge (with calibrated human review) wired into CI/CD and experiment platforms.

Drive automation at scale: partner with Product/Design/Operations on coverage, policy alignment, localization, and rollout strategy to improve the customer experience and reduce cost per contact.

Mentor/principal‑lead multiple pods; set technical strategy and quality bars; coach senior engineers on agentic patterns, reliability, and experiment velocity.

Skills

Required

7+ years building production ML/AI systems
2+ years leading complex ML initiatives end‑to‑end
Deep expertise in LLM‑driven systems (inference optimization, prompt/program design, fine-tuning, distillation/LoRA, safety/guardrails, evals)
Track record of shipping customer‑facing intelligent experiences with measurable impact (A/B testing, metrics literacy)
Bachelor's degree or above in Computer Science or a related field

Nice to have

Agentic architectures in production (planner/executor, memory, multi‑step reasoning)
RAG over complex, policy‑heavy knowledge bases
Experience building support automation for large consumer platforms (routing, policy codification, internal tooling, co‑pilot/auto‑resolve)
Multilingual NLU/NLG (code‑switching, low‑resource languages), hallucination mitigation, safety red‑teaming, and privacy‑by‑design
Practical expertise balancing speed and reliability at scale: experiment frameworks, feature flags, canary/guarded rollouts, and clear kill‑switches

About the Role

Uber’s Customer Obsession team builds the platform and AI that power world‑class support across mobile, web, and voice at global scale. We are now hiring a Staff ML Engineer to architect, productionize, and scale an autonomous support agent that resolves customer issues end‑to‑end. Experience with agentic architectures is a major plus. You’ll push the state of the art in GenAI for customer service (LLM orchestration, evaluation, safety guardrails, multilingual support) while holding a very high bar for reliability and cost efficiency. We are still at an early stage and value candidates with a bias for action who get creative with GenAI tools to accelerate execution and experimentation.

For San Francisco, CA-based roles: The base salary range for this role is USD$232,000 per year - USD$258,000 per year.

For Sunnyvale, CA-based roles: The base salary range for this role is USD$232,000 per year - USD$258,000 per year.

For all US locations, you will be eligible to participate in Uber's bonus program, and may be offered an equity award and other types of compensation. All full-time employees are eligible to participate in a 401(k) plan. You will also be eligible for various benefits. More details can be found at https://jobs.uber.com/en/benefits.

Uber's mission is to reimagine the way the world moves for the better. Here, bold ideas create real-world impact, challenges drive growth, and speed fuels progress. What moves us, moves the world - let's move it forward, together.

Uber is proud to be an Equal Opportunity employer. All qualified applicants will receive consideration for employment without regard to sex, gender identity, sexual orientation, race, color, religion, national origin, disability, protected Veteran status, age, or any other characteristic protected by law. We also consider qualified applicants regardless of criminal histories, consistent with legal requirements. If you have a disability or special need that requires accommodation, please let us know by completing this form.

Offices continue to be central to collaboration and Uber’s cultural identity. Unless formally approved to work fully remotely, Uber expects employees to spend at least half of their work time in their assigned office. For certain roles, such as those based at green-light hubs, employees are expected to be in-office for 100% of their time. Please speak with your recruiter to better understand in-office expectations for this role.