Software Engineer II - Analytics Data Engineering

Klaviyo · Enterprise · Boston, MA · Engineering

This role focuses on building and maintaining scalable data pipelines and core tables using PySpark, Airflow, and dbt. It involves optimizing Spark jobs and storage for low-latency performance, treating data as a product, driving operational excellence, and partnering with AI/ML teams, with an emphasis on using AI tools to improve the engineering workflow.

What you'd actually do

  1. Develop and maintain scalable data pipelines and core tables using PySpark, Airflow, and dbt. You will implement the foundational datasets that power our AI, ML, and Analytics products.
  2. Tune Spark jobs and storage patterns to ensure low-latency data retrieval. You will help implement materialized views and efficient partitioning strategies to support high-performance reporting at scale.
  3. Contribute to the full lifecycle of datasets. This includes defining clear data contracts with upstream teams, writing maintainable code via peer reviews, and ensuring every asset is well-documented and trusted by downstream users.
  4. Ensure the reliability of our data engine by monitoring for freshness, volume anomalies, and schema changes. You will be responsible for ensuring that when a customer loads a dashboard, the data is accurate and on time.
  5. Collaborate with Product, Engineering, and AI/ML teams to define consistent metrics that align with business goals. You will act as a bridge to ensure new features land with robust data support.
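The tuning work in item 2 hinges on choosing good partition keys. As a minimal sketch (plain Python standing in for the Spark layout logic; the column names, bucket count, and key scheme are illustrative assumptions, not Klaviyo's actual schema), a composite date-plus-hash-bucket key keeps time-range scans narrow while spreading hot accounts across a bounded number of partitions:

```python
from datetime import datetime, timezone

def partition_key(event_ts: int, company_id: int, buckets: int = 64) -> tuple[str, int]:
    """Derive a (date, bucket) partition pair for an event row.

    Partitioning by event date keeps time-range scans narrow; taking the
    account id modulo a fixed bucket count bounds the total number of
    partitions and spreads high-volume accounts evenly across files.
    """
    day = datetime.fromtimestamp(event_ts, tz=timezone.utc).strftime("%Y-%m-%d")
    return day, company_id % buckets
```

In PySpark the same idea would typically surface as derived columns passed to `partitionBy` when writing the table.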
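The data contracts mentioned in item 3 can be enforced mechanically rather than by convention. A minimal sketch (the contract, column names, and type labels here are hypothetical, not taken from the posting), comparing a table's observed schema against its declared contract:

```python
# A hypothetical data contract: column name -> expected type label.
CONTRACT = {"event_id": "bigint", "company_id": "bigint", "occurred_at": "timestamp"}

def contract_violations(actual: dict[str, str], contract: dict[str, str]) -> list[str]:
    """Report missing columns and type mismatches against the contract."""
    problems = []
    for col, expected in contract.items():
        if col not in actual:
            problems.append(f"missing column: {col}")
        elif actual[col] != expected:
            problems.append(f"type mismatch on {col}: {actual[col]} != {expected}")
    return problems
```

A check like this would typically run in CI or as a pipeline pre-flight step, so upstream schema changes fail loudly before they reach downstream consumers.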
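The freshness and volume monitoring in item 4 reduces to a few simple checks. This is a hedged sketch with made-up thresholds (the SLA window and deviation tolerance are assumptions, not Klaviyo's actual alerting logic):

```python
from datetime import datetime, timedelta, timezone

FRESHNESS_SLA = timedelta(hours=2)   # hypothetical SLA window
VOLUME_TOLERANCE = 0.5               # flag >50% deviation from the trailing mean

def is_fresh(last_loaded_at: datetime, now: datetime) -> bool:
    """True if the table was refreshed within its SLA window."""
    return now - last_loaded_at <= FRESHNESS_SLA

def volume_ok(row_count: int, trailing_counts: list[int]) -> bool:
    """True if today's row count is within tolerance of the trailing mean."""
    baseline = sum(trailing_counts) / len(trailing_counts)
    return abs(row_count - baseline) / baseline <= VOLUME_TOLERANCE
```

Checks like these are usually wired into the orchestrator (e.g. as Airflow sensor or validation tasks) so a stale or anomalous table blocks downstream dashboards instead of silently serving bad data.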

Skills

Required

  • 2+ years of experience in data engineering or a data-intensive software engineering role
  • SQL
  • Python
  • Spark (PySpark/SparkSQL)
  • Cloud environment (AWS/EMR)
  • dbt
  • Data contracts
  • Peer code reviews
  • Documentation
  • Monitoring for freshness, volume anomalies, and schema changes
  • Collaboration with Product, Engineering, and AI/ML teams

Nice to have

  • Iceberg table maintenance and compaction
  • Terraform or other Infrastructure-as-Code tools
  • Martech or SaaS platforms dealing with high-frequency event data
  • Building data products that directly power customer-facing UI components and/or support AI/ML features
  • Building near real-time or streaming pipelines for user-facing analytics or monitoring
  • Analytics engineering tools and practices (e.g., dbt, metrics layers, semantic models)
  • Statistical modeling and machine learning

What the JD emphasized

  • low-latency performance and data retrieval
  • AI/ML teams