Senior Software Engineer (backend) - AI Platform Team

Uber · Consumer · Seattle, WA +2 · Engineering

This role is for a Senior Software Engineer on the AI Platform Team, focusing on building and optimizing Uber's Cloud-Native Data Platform. The engineer will own critical systems such as Distributed MySQL, Hudi-based Data Lakes, and storage layers, and will be responsible for their availability and performance. A key responsibility is designing high-performance data pipelines that support the intense IO demands of GPU-based model training for AI/ML teams. The role requires expertise in distributed systems, storage technologies, and high-performance coding.

What you'd actually do

  1. Lead the design and implementation of major features for Uber’s storage and data platforms (e.g., Docstore, Pinot, or OpenSearch).
  2. Build and optimize services that leverage GCP and OCI Object Storage, focusing on high-throughput metadata management and S3-compatible API support.
  3. Drive efficiency across our HDFS and Blobstore layers, using table formats like Apache Hudi or Iceberg to improve data freshness and reduce cost.
  4. Work with AI teams to design high-performance data pipelines, ensuring our storage layers can handle the intense IO demands of GPU-based model training.
  5. Ensure 99.99% availability for your services. You will lead root-cause analyses (RCAs), improve observability, and mentor L3/L4 engineers on best practices for distributed systems.

Skills

Required

  • 5+ years of engineering experience
  • Proven track record of building and maintaining large-scale distributed systems
  • Practical, hands-on experience with Distributed MySQL, Cassandra, or Redis
  • Practical, hands-on experience with HDFS, S3/GCS, and metadata services
  • Expert-level proficiency in Java, Go, or C++
  • Strong focus on concurrency, memory management, and performance tuning
  • Experience with large-scale analytical engines like Presto, Hive, or Trino

Nice to have

  • Experience with Apache Hudi, Iceberg, or Delta Lake for optimizing "Big Data" storage
  • Deep familiarity with OCI or GCP and strategies for resource efficiency
  • Understanding of how data storage interacts with ML frameworks like Ray or PyTorch
  • Active participation in community projects like Apache Pinot, Kafka, or Flink
  • Ability to apply research-level concepts (for example, through partnerships with CMU, Berkeley, or MIT) to solve real-world distributed consensus or indexing challenges

What the JD emphasized

  • Cloud-Native Data Platform
  • Agentic AI
  • Distributed MySQL
  • Hudi-based Data Lakes
  • GPU-based model training
  • large-scale distributed systems
  • Distributed Systems
  • Google Spanner
  • TiDB
  • Java, Go, or C++
  • Presto, Hive, or Trino
  • Apache Hudi, Iceberg, or Delta Lake
  • OCI or GCP
  • Ray or PyTorch
  • Apache Pinot, Kafka, or Flink
  • CMU, Berkeley, or MIT

Other signals

  • AI/ML Integration
  • GPU-based model training
  • Cloud-Native Data Platform
  • Distributed systems