Senior Software Engineer - Distributed Data Systems

Databricks · Data AI · Seattle, WA +1 · Engineering - Pipeline

Software Engineer on the Runtime team at Databricks, building next-generation distributed data storage and processing systems for diverse workloads including ETL and data science. Focus on Apache Spark, Data Plane Storage, Delta Lake, Delta Pipelines, and Performance Engineering.

What you'd actually do

  1. Develop the de facto open source standard framework for big data.
  2. Provide reliable, high-performance services and client libraries for storing and accessing very large amounts of data on cloud storage backends such as AWS S3 and Azure Blob Store.
  3. Build a storage management system that combines the scale and cost-efficiency of data lakes, the performance and reliability of a data warehouse, and the low latency of streaming.
  4. Make it simple to orchestrate and operate tens of thousands of data pipelines through the Delta Pipelines project, given that even a single data engineering pipeline is difficult to manage.
  5. Build the next-generation query optimizer and execution engine that is fast, tuning-free, scalable, and robust.

Skills

Required

  • BS (or higher) in Computer Science, a related technical field, or equivalent practical experience.
  • 5+ years of production-level experience in Java, Scala, or C++.
  • Strong foundation in algorithms and data structures and their real-world use cases.
  • Experience with distributed systems, databases, and big data systems (Apache Spark, Hadoop).

Nice to have

  • Comfortable working towards a multi-year vision with incremental deliverables.
  • Motivated by delivering customer value and impact.