Staff Software Engineer - Distributed Data Systems

Databricks · Data AI · Seattle, WA · Engineering - Pipeline

Databricks is seeking a Staff Software Engineer to build the next generation of distributed data storage and processing systems. This role involves working on projects like Apache Spark, Delta Lake, and Delta Pipelines, focusing on performance, reliability, and scalability for diverse workloads including ETL and data science. The ideal candidate has extensive experience in distributed systems, databases, and big data technologies.

What you'd actually do

  1. Develop the de facto open-source standard framework for big data.
  2. Provide reliable, high-performance services and client libraries for storing and accessing huge amounts of data on cloud storage backends, e.g., AWS S3 and Azure Blob Storage.
  3. Build a storage management system that combines the scale and cost-efficiency of data lakes, the performance and reliability of data warehouses, and the low latency of streaming.
  4. Make it simple to orchestrate and operate tens of thousands of data pipelines with the Delta Pipelines project; today, managing even a single data engineering pipeline is difficult.
  5. Build the next-generation query optimizer and execution engine that is fast, tuning-free, scalable, and robust.

Skills

Required

  • BS (or higher) in Computer Science, a related technical field, or equivalent practical experience
  • 8+ years of production-level experience in Java, Scala, or C++
  • Strong foundation in algorithms and data structures and their real-world use cases
  • Experience with distributed systems, databases, and big data systems (e.g., Apache Spark, Hadoop)

What the JD emphasized

  • 8+ years of production-level experience in Java, Scala, or C++
  • Experience with distributed systems, databases, and big data systems (e.g., Apache Spark, Hadoop)