Staff Software Engineer - Distributed Data Systems

Databricks · Data AI · San Francisco, CA · Engineering - Pipeline

Databricks is seeking a Staff Software Engineer to build the next generation of distributed data storage and processing systems. This role involves working on large-scale software platforms, including Apache Spark, Delta Lake, and Delta Pipelines, to support diverse workloads ranging from ETL to data science. The ideal candidate will have extensive experience with distributed systems, databases, and big data technologies.

What you'd actually do

  1. Develop Apache Spark, the de facto open source standard framework for big data.
  2. Deliver reliable, high-performance services and client libraries for storing and accessing massive amounts of data on cloud storage backends, e.g., AWS S3, Azure Blob Storage.
  3. Build a storage management system that combines the scale and cost-efficiency of data lakes, the performance and reliability of a data warehouse, and the low latency of streaming.
  4. Make data engineering pipelines manageable at scale: even a single pipeline is difficult to operate, and the goal of the Delta Pipelines project is to make it simple to orchestrate and operate tens of thousands of them.
  5. Build the next-generation query optimizer and execution engine that is fast, tuning-free, scalable, and robust.

Skills

Required

  • BS in Computer Science, a related technical field, or equivalent practical experience.
  • 8+ years of production-level experience in Java, Scala, or C++.
  • Strong foundation in algorithms and data structures, and their real-world applications.
  • Experience with distributed systems, databases, and big data systems (Apache Spark™, Hadoop).

Nice to have

  • MS or PhD in databases or distributed systems.
