Senior Software Engineer - Open Source Analytics

Snowflake · Data AI · Bellevue, WA, United States · Engineering

Snowflake is seeking a Senior Software Engineer for its Open Source Analytics team. The role involves building and evolving an open and interoperable data lake ecosystem, focusing on projects like Apache Iceberg and Apache Polaris. Responsibilities include designing and implementing features, collaborating with the open-source community, architecting systems that integrate open-source technologies with Snowflake, and contributing to managed services and tooling for data lake maintenance. The ideal candidate has strong programming skills in Java, Scala, or C++, experience with distributed systems, and familiarity with open-source data lake formats and cloud-native services.

What you'd actually do

  1. Pioneer new and innovative technical capabilities in the Open Source Analytics community. You will define and build next-generation capabilities on top of critical lakehouse building blocks like interoperable table formats, data catalogs, file formats, and query engines.
  2. Design and implement features and enhancements for Apache Iceberg and Apache Polaris, focusing on scalability, performance, and usability, including Iceberg DML/DDL transactions, schema evolution, partitioning, and time travel.
  3. Collaborate with the open-source community by contributing code, participating in discussions, and reviewing pull requests to ensure high-quality contributions.
  4. Architect and build systems that integrate open-source technologies seamlessly with Snowflake, enabling customers to build and deploy massive data lake architectures across platforms and across cloud providers.
  5. Collaborate with Snowflake’s open-source team and the Apache Iceberg community to contribute new features and enhance the Iceberg table format and REST specification.
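
The time travel mentioned in point 2 rests on one idea: every commit appends an immutable snapshot, and readers can query the table as of any snapshot id rather than only the latest state. A minimal Java sketch of that mechanism follows; the `SnapshotTable` and `readAsOf` names are illustrative stand-ins, not Iceberg's actual API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Minimal sketch of snapshot-based time travel, the mechanism behind
// Iceberg-style "read as of" queries. Each commit appends an immutable
// snapshot; readers pick a snapshot by id instead of always reading HEAD.
public class SnapshotTable {
    // One immutable snapshot per commit: an id plus the rows visible then.
    public record Snapshot(long id, List<Map<String, Object>> rows) {}

    private final List<Snapshot> snapshots = new ArrayList<>();
    private long nextId = 1;

    // Commit a new version of the table; prior snapshots stay readable.
    public long commit(List<Map<String, Object>> rows) {
        long id = nextId++;
        snapshots.add(new Snapshot(id, List.copyOf(rows)));
        return id;
    }

    // Current table state: the latest snapshot.
    public List<Map<String, Object>> read() {
        return snapshots.get(snapshots.size() - 1).rows();
    }

    // Time travel: read the table as of an earlier snapshot id.
    public List<Map<String, Object>> readAsOf(long snapshotId) {
        return snapshots.stream()
                .filter(s -> s.id() == snapshotId)
                .findFirst()
                .orElseThrow(() -> new IllegalArgumentException("unknown snapshot " + snapshotId))
                .rows();
    }
}
```

In Iceberg proper, the same capability surfaces in engines such as Spark through SQL clauses like `FOR VERSION AS OF <snapshot_id>`.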

Skills

Required

  • 5+ years of experience designing and building scalable, distributed systems.
  • Strong programming skills in Java, Scala, or C++ with an emphasis on performance and reliability.
  • Deep understanding of distributed transaction processing, concurrency control, and high-performance query engines.
  • Experience with open-source data lake formats (e.g., Apache Iceberg, Parquet, Delta) and the challenges associated with multi-engine interoperability.
  • Experience building cloud-native services and working with public cloud providers like AWS, Azure, or GCP.
  • A passion for open-source software and community engagement, particularly in the data ecosystem.
  • Familiarity with data governance, security, and access control models in distributed data systems.
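
The "distributed transaction processing, concurrency control" requirement maps directly onto how Iceberg-style catalogs commit: a writer prepares new table metadata, then atomically swaps the table's current-metadata pointer only if it still matches the version the writer started from, retrying on conflict. A minimal single-process sketch follows, using `AtomicReference` as a stand-in for a real catalog's transactional store or conditional PUT; the `OptimisticCatalog` class is hypothetical.

```java
import java.util.concurrent.atomic.AtomicReference;

// Minimal sketch of the optimistic commit protocol used by Iceberg-style
// catalogs: a commit succeeds only if the table's metadata pointer still
// equals the version the writer based its work on (compare-and-swap).
public class OptimisticCatalog {
    private final AtomicReference<String> currentMetadata;

    public OptimisticCatalog(String initialMetadata) {
        this.currentMetadata = new AtomicReference<>(initialMetadata);
    }

    public String current() {
        return currentMetadata.get();
    }

    // Attempt to commit: succeeds only if no other writer committed since
    // expectedMetadata was read; otherwise the caller must re-read the
    // latest metadata, re-apply its changes, and retry.
    public boolean commit(String expectedMetadata, String newMetadata) {
        return currentMetadata.compareAndSet(expectedMetadata, newMetadata);
    }
}
```

The losing writer never corrupts the table; it simply observes a failed swap and rebases, which is what makes multi-engine writes to one table safe.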

Nice to have

  • Contributing to open-source projects, especially in the data infrastructure space.
  • Designing or implementing REST APIs, particularly in the context of distributed systems.
  • Managing large-scale data lakes or data catalogs in production environments.
  • Working on high-performance, scalable query engines such as Spark, Flink, or Trino.
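
The REST API work noted above is, in Iceberg's case, defined by the Iceberg REST catalog specification, which exposes catalog operations (listing namespaces, loading tables, committing) over HTTP. A minimal Java sketch of one such route follows, using only the JDK's built-in `HttpServer`; the path and JSON shape loosely echo the spec's namespace listing but are simplified placeholders, and `TinyCatalogServer` is a hypothetical name.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

// Minimal sketch of a REST catalog endpoint in the spirit of the Iceberg
// REST specification's namespace-listing route. Not the actual spec:
// the real API includes prefixes, auth, pagination, and error bodies.
public class TinyCatalogServer {
    public static HttpServer start() throws Exception {
        // Port 0 asks the OS for any free port.
        HttpServer server = HttpServer.create(new InetSocketAddress(0), 0);
        server.createContext("/v1/namespaces", exchange -> {
            byte[] body = "{\"namespaces\":[[\"analytics\"]]}"
                    .getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().set("Content-Type", "application/json");
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream os = exchange.getResponseBody()) {
                os.write(body);
            }
        });
        server.start();
        return server;
    }
}
```

Because the catalog is just HTTP plus a published contract, any engine (Spark, Flink, Trino) can talk to it without linking against a vendor's client library, which is the interoperability point the JD keeps returning to.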

What the JD emphasized

  • open and interoperable data lake ecosystem
  • Open Source Analytics
  • Apache Iceberg
  • Apache Polaris