Staff Software Engineer - Distributed Data Systems

Databricks · Data + AI · San Francisco, CA · Engineering - Pipeline

Databricks is seeking a Staff Software Engineer to work on its distributed data systems, building and running its data and AI infrastructure platform. The role involves developing next-generation distributed data storage and processing systems, with projects spanning Apache Spark, Data Plane Storage, Delta Lake, Delta Pipelines, and Performance Engineering. The ideal candidate has extensive experience with distributed systems, databases, and big data systems.

What you'd actually do

  1. Develop the de facto open-source standard framework for big data.
  2. Deliver reliable, high-performance services and client libraries for storing and accessing massive amounts of data on cloud storage backends, e.g., AWS S3 and Azure Blob Store.
  3. Build a storage management system that combines the scale and cost-efficiency of data lakes, the performance and reliability of a data warehouse, and the low latency of streaming.
  4. Managing even a single data engineering pipeline is difficult; the goal of the Delta Pipelines project is to make it simple to orchestrate and operate tens of thousands of data pipelines.
  5. Build the next-generation query optimizer and execution engine that is fast, tuning-free, scalable, and robust.

Skills

Required

  • BS in Computer Science, a related technical field, or equivalent practical experience.
  • 8+ years of production-level experience in Java, Scala, or C++.
  • Strong foundation in algorithms and data structures and their real-world applications.
  • Experience with distributed systems, databases, and big data systems (Apache Spark™, Hadoop).

Nice to have

  • MS or PhD in databases or distributed systems.

What the JD emphasized

  • critical to making customers successful on our platform
  • 8+ years of production-level experience