Staff Engineer - Core Infrastructure

Uber · Consumer · New York, NY +2 · Engineering

Staff Engineer on the Core Infrastructure Platform team at Uber, focused on the architecture and evolution of the systems that power Uber's global business. The role leads backend development for a cloud-native, secure, reliable, and hyperscale-efficient ecosystem that supports the massive compute demands of Generative AI and Autonomous Vehicle data. Responsibilities span architectural evolution, efficiency at scale, multi-cloud modernization, AIOps and automation, security by design, and cross-functional influence in support of large-scale AI/ML workloads.

What you'd actually do

  1. Lead Architectural Evolution: Identify architectural gaps in our compute and networking stacks; lead projects from ideation to global execution to fill those gaps (e.g., migrating to standardized CNI/Envoy or implementing shared GPU pools).
  2. Drive Efficiency at Scale: Design and implement solutions to increase fleet-wide CPU utilization (targeting 40%+) and accelerate ARM adoption to optimize Uber’s unit costs.
  3. Modernize for Multi-Cloud: Lead the technical strategy for "Thrive in Cloud," ensuring our foundations are resilient, active-active across providers, and support rapid regional failover.
  4. Integrate AIOps & Automation: Drive the development of "Agentic" infrastructure tools (e.g., Project Eve for network quality) to automate 80% of alert triaging and incident response.
  5. Champion "Security by Design": Ensure 100% service-to-service authorization and zero-trust networking are baked into the core fabric of our container and networking platforms.

Skills

Required

  • 8+ years of full-time Software Engineering work experience.
  • Expertise in Systems Languages: Deep proficiency in Go, Java, or C++.
  • Distributed Systems Mastery: Demonstrated experience designing and productionizing large-scale, high-availability infrastructure services.
  • Execution Excellence: A track record of leading complex, multi-quarter technical initiatives from design to fleet-wide rollout.

Nice to have

  • Cloud-Native Depth: Deep knowledge of Kubernetes internals, container runtimes (CRI), and networking (CNI/Envoy).
  • Infrastructure Strategy: Prior experience with hybrid-cloud or multi-cloud migrations and cost-optimization at scale.
  • Linux & Kernel Knowledge: Understanding of operating systems, Linux kernel performance tuning, or eBPF.
  • AI/ML Infrastructure: Experience building or managing compute platforms tailored for GPU scheduling and large-scale model training.
  • Contributions to open-source infrastructure projects or a history of presenting at major technical conferences.

What the JD emphasized

  • massive compute demands of Generative AI and Autonomous Vehicle data
  • support 300x larger ranking models and L4 AV data ingestion
  • GPU scheduling and large-scale model training

Other signals

  • powering Uber's global business
  • cloud-native by default
  • hyperscale-efficient
  • 1M+ concurrent trips
  • orchestration, foundation, and networking layers
  • diverse variety of workloads (stateless, batch, streaming, and ML)