Senior Staff Software Engineer: Data & Storage Platform

Uber · Consumer · Seattle, WA +2 · Engineering

Senior Staff Software Engineer role focused on architecting and building Uber's Next-Generation Data Intelligence Platform, which includes unifying batch, streaming, and AI compute, revolutionizing storage and catalog management, and operationalizing agentic data intelligence. The role involves evolving large-scale persistence layers and integrating compute fabrics with vector databases and model-serving platforms.

What you'd actually do

  1. Architect the Multi-Modal Fabric: Unify batch, streaming, and AI compute into one intelligent fabric, enabling real-time insights and trustworthy AI agents at a global scale.
  2. Revolutionize Storage & Catalog: Drive the architecture for a unified catalog and metadata management service for unstructured data, leveraging native cloud object store capabilities.
  3. Operationalize AI Intelligence: Partner with teams like QueryCopilot and DataIQ to bridge human validation with autonomous reasoning through agentic workflows.
  4. Lead Storage Modernization: Evolve our massive-scale persistence layers—including Docstore (Transactional Distributed Storage) and Distributed MySQL—to increase resiliency and reduce operational overhead.
  5. Open Source: Act as a force multiplier by contributing to the open-source community (Apache Hudi, Apache Iceberg, Presto).

Skills

Required

  • Experience designing and operating world-class distributed data and storage systems
  • Deep expertise in batch and object storage: HDFS, cloud object stores (S3/GCS/OCI), and blobstore metadata management
  • Practical experience with Apache Hudi or Apache Iceberg for lakehouse architectures
  • Experience with distributed transactional storage (e.g., Docstore, Google Spanner, TiDB)
  • Experience with NoSQL & Cache (Cassandra, Redis, and high-throughput Key-Value stores)
  • Deep understanding of how compute fabrics (Spark, Flink, Ray) integrate with vector databases and model-serving platforms
  • Architect-level knowledge of Presto, Trino, or Hive for large-scale analytical processing
  • Expert-level command of Java, Go, Scala, or C++ with a focus on performance tuning and distributed consensus

Nice to have

  • Experience designing AI infrastructure, including Retrieval-Augmented Generation (RAG) systems and high-bandwidth data loading for GPUs
  • Hands-on experience with Sharded/Distributed MySQL (Vitess) and managing large-scale tabular data
  • Ability to build portable data solutions across OCI and GCP, optimizing for resource efficiency and intelligent scheduling
  • Expertise in building observability, data freshness, and quality frameworks for Tier-0 mission-critical services
  • Proven ability to lead platform modernization, mentor Staff-level engineers, and influence long-term technical strategy across multiple organizations

What the JD emphasized

  • 14+ Years of Engineering Excellence
  • Mastery of Storage Internals
  • Deep understanding of how compute fabrics (Spark, Flink, Ray) integrate with vector databases and model-serving platforms
  • Architect-level knowledge of Presto, Trino, or Hive for large-scale analytical processing
  • Expert-level command of Java, Go, Scala, or C++ with a focus on performance tuning and distributed consensus

Other signals

  • architecting and building next-generation data intelligence platform
  • handling exabytes of data
  • operationalizing agentic data intelligence
  • evolving massive-scale persistence layers
  • integrating compute fabrics with vector databases and model-serving platforms