Senior Software Engineer - Distributed Data Systems

Databricks · Data AI · San Francisco, CA · Engineering - Pipeline

Databricks is seeking a Senior Software Engineer to build the next generation of distributed data storage and processing systems. This role involves working on core components like Apache Spark, Data Plane Storage, Delta Lake, and Delta Pipelines, focusing on performance, scalability, and reliability for diverse data workloads including ETL and data science.

What you'd actually do

  1. Develop Apache Spark, the de facto open-source standard framework for big data.
  2. Provide reliable, high-performance services and client libraries for storing and accessing massive amounts of data on cloud storage backends, e.g., AWS S3 and Azure Blob Storage.
  3. Build a storage management system that combines the scale and cost-efficiency of data lakes, the performance and reliability of a data warehouse, and the low latency of streaming.
  4. Make it simple to orchestrate and operate tens of thousands of data pipelines through the Delta Pipelines project; managing even a single data engineering pipeline is difficult today.
  5. Build the next-generation query optimizer and execution engine that is fast, tuning-free, scalable, and robust.

Skills

Required

  • BS (or higher) in Computer Science, a related technical field, or equivalent practical experience.
  • 5+ years of production-level experience in Java, Scala, or C++.
  • Strong foundation in algorithms and data structures and their real-world use cases.
  • Experience with distributed systems, databases, and big data systems (Apache Spark, Hadoop).

Nice to have

  • Comfortable working towards a multi-year vision with incremental deliverables.
  • Motivated by delivering customer value and impact.