Software Engineer, Data Platform

Notion · Enterprise · San Francisco, CA · Engineering

This role is for a Software Engineer on the Data Platform team at Notion. The primary focus is building and operating Notion's core data infrastructure: the data lakehouse plus the critical pipelines and services around it. The role also involves advancing security features such as Enterprise Key Management (EKM) and encryption, improving data access, auditability, and residency, and driving reliability and observability across the data stack. The engineer will optimize performance and cost at large scale and help shape the platform roadmap to support novel, large-volume agentic use cases and enterprise customers.

What you'd actually do

  1. Design and evolve the data lakehouse
  2. Own critical data pipelines and services
  3. Advance EKM and encryption-by-design
  4. Improve data access, auditability, and residency
  5. Drive reliability and observability
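To make the auditability responsibility above concrete, here is a minimal, library-free sketch (purely illustrative, not Notion's actual design) of a tamper-evident audit log built by hash-chaining entries, so that editing any past event breaks verification:

```python
import hashlib
import json

def append_event(log, event):
    """Append an audit event, chaining it to the previous entry's hash."""
    prev_hash = log[-1]["hash"] if log else "0" * 64
    payload = json.dumps(event, sort_keys=True)
    entry_hash = hashlib.sha256((prev_hash + payload).encode()).hexdigest()
    log.append({"event": event, "prev_hash": prev_hash, "hash": entry_hash})
    return log

def verify_chain(log):
    """Recompute every hash; any edited or reordered entry breaks the chain."""
    prev = "0" * 64
    for entry in log:
        payload = json.dumps(entry["event"], sort_keys=True)
        expected = hashlib.sha256((prev + payload).encode()).hexdigest()
        if entry["prev_hash"] != prev or entry["hash"] != expected:
            return False
        prev = entry["hash"]
    return True

log = []
append_event(log, {"actor": "alice", "action": "read", "object": "doc-1"})
append_event(log, {"actor": "bob", "action": "export", "object": "doc-2"})
assert verify_chain(log)
```

Production systems would add timestamps, signing, and durable storage; the chaining idea is the core of tamper evidence.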

Skills

Required

  • 2+ years building and operating data platforms or large-scale infrastructure for SaaS or similar environments
  • Strong skills in at least one of Python, Scala, or TypeScript
  • Comfortable working with SQL for analytics and data modeling
  • Hands-on experience with Spark or similar distributed processing systems, including debugging and performance tuning
  • Experience with Kafka or equivalent streaming systems
  • Familiarity with CDC/ingestion patterns (e.g., Debezium, Fivetran, custom connectors)
  • Experience with data lakes and table formats (Iceberg, Hudi, or Delta) and/or data catalogs and schema evolution
  • Comfortable owning services and pipelines in production, including on-call, incident response, and reliability improvements
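As a flavor of the CDC/ingestion bullet above, here is a minimal sketch of applying Debezium-style change events (op codes `c`/`u`/`d` for create, update, delete, keyed by primary key) to an in-memory table; the field names are illustrative, not any specific connector's schema:

```python
def apply_cdc_events(table, events):
    """Apply change events to a dict keyed by primary key.

    Each event carries an op ('c' create, 'u' update, 'd' delete,
    mirroring Debezium's convention), a key, and an 'after' row image.
    """
    for ev in events:
        if ev["op"] in ("c", "u"):
            table[ev["key"]] = ev["after"]   # upsert the row image
        elif ev["op"] == "d":
            table.pop(ev["key"], None)       # delete if present
    return table

table = {}
events = [
    {"op": "c", "key": 1, "after": {"name": "doc", "v": 1}},
    {"op": "u", "key": 1, "after": {"name": "doc", "v": 2}},
    {"op": "c", "key": 2, "after": {"name": "page", "v": 1}},
    {"op": "d", "key": 2, "after": None},
]
print(apply_cdc_events(table, events))  # {1: {'name': 'doc', 'v': 2}}
```

Real pipelines apply the same upsert/delete logic at scale, typically via a streaming job merging into a lakehouse table format such as Iceberg.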

Nice to have

  • Experience in an applied data platform setting (e.g., Trust and Safety), working directly with enterprise customers, or building features such as data residency, analytics products, EKM, or compliance-driven auditing
  • Practical understanding of access control, encryption at rest/in transit, and auditing as they apply to data platforms
  • Prior work on Databricks, Unity Catalog, Lake Formation, or similar catalog/governance systems
  • Experience designing or improving observability for data platforms (e.g., Honeycomb, OpenTelemetry, metrics/trace-heavy debugging)
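As a small illustration of the observability bullet above (a hand-rolled sketch; in practice the instrumentation would go through OpenTelemetry or a similar SDK), a pipeline stage can emit per-call latency metrics via a decorator:

```python
import time
from collections import defaultdict
from functools import wraps

METRICS = defaultdict(list)  # metric name -> observed latencies in seconds

def timed(metric_name):
    """Record the wall-clock latency of each call under metric_name."""
    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return fn(*args, **kwargs)
            finally:
                # Record latency even when the call raises.
                METRICS[metric_name].append(time.perf_counter() - start)
        return wrapper
    return decorator

@timed("pipeline.transform_seconds")
def transform(rows):
    return [r * 2 for r in rows]

transform([1, 2, 3])
print(len(METRICS["pipeline.transform_seconds"]))  # 1
```

Exporting such histograms per pipeline stage is what makes latency regressions and failed-call spikes visible before they page anyone.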

What the JD emphasized

  • Stringent security, privacy, and compliance requirements
  • Novel, large-volume agentic data use cases
  • Enterprise customers