Senior Software Engineer, Data Ingestion Platform

Block · Fintech · CA · Remote · 10409 Engineering - AIDA

Senior Software Engineer role focused on building and operating data ingestion platforms for Block's AI, Data & Analytics (AIDA) organization. The role involves designing and developing next-generation data ingestion infrastructure, including Kafka Iceberg connectors and database replication pipelines, to ensure reliable data availability for analytics, machine learning, and AI initiatives. Responsibilities include modernizing the CDC platform, consolidating ingestion paths, and implementing self-service tooling and observability features. Experience with streaming data systems, change data capture (CDC), data lakehouse architectures, and modern table formats is required.

What you'd actually do

  1. Design, build, and operate scalable data replication and ingestion pipelines that move data from production databases, event streams, and third-party sources into Block's Lakehouse.
  2. Develop and enhance Kafka Iceberg connectors and data loading frameworks, enabling reliable, low-latency data delivery to Snowflake and Databricks.
  3. Drive the modernization of Block's CDC platform by evaluating and implementing next-generation approaches for database replication, including cloud-native alternatives and Iceberg-based ingestion patterns.
  4. Build self-service tooling and observability features that empower internal teams to onboard, monitor, and troubleshoot their own data pipelines with minimal support.
  5. Collaborate with data engineering, platform infrastructure, and product teams to define data contracts, improve service encapsulation, and reduce tight coupling between operational databases and analytics consumers.
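To make the CDC replication work in items 1–3 concrete, here is a minimal, illustrative sketch of applying Debezium-style change events to a target table. The event envelope fields (`op`, `before`, `after`) follow Debezium's conventions; the in-memory table and the `apply_change` helper are simplifications for illustration, not Block's implementation.

```python
# Sketch: apply Debezium-style CDC events to an in-memory table keyed by "id".
# Envelope fields ("op", "before", "after") follow Debezium conventions;
# the dict-backed table is a stand-in for a real sink such as Iceberg.

def apply_change(table: dict, event: dict, key: str = "id") -> None:
    """Apply a single change event to a table keyed by `key`."""
    op = event["op"]
    if op in ("c", "r", "u"):       # create, snapshot read, update
        row = event["after"]
        table[row[key]] = row
    elif op == "d":                 # delete
        row = event["before"]
        table.pop(row[key], None)

events = [
    {"op": "c", "before": None, "after": {"id": 1, "email": "a@example.com"}},
    {"op": "u", "before": {"id": 1, "email": "a@example.com"},
                "after": {"id": 1, "email": "b@example.com"}},
    {"op": "d", "before": {"id": 1, "email": "b@example.com"}, "after": None},
]

table: dict = {}
for ev in events:
    apply_change(table, ev)
```

A production pipeline adds the hard parts this sketch omits: ordering guarantees, schema evolution, and exactly-once delivery into the lakehouse tables.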

Skills

Required

  • Java
  • Python
  • Scala
  • Go
  • Apache Kafka
  • Kafka Connect
  • Debezium
  • Change Data Capture (CDC)
  • Database replication
  • Data lakehouse architectures
  • Apache Iceberg
  • Delta Lake
  • AWS
  • GCP
  • Azure
  • Terraform

Nice to have

  • Snowflake
  • Databricks
  • Apache Spark
  • Apache Airflow

What the JD emphasized

  • 8+ years of experience in software engineering or data platform development, with a focus on building scalable data systems or distributed infrastructure.
  • Strong programming proficiency in languages such as Java, Python, Scala, or Go, with experience developing data frameworks, libraries, or services.
  • Hands-on experience with streaming data systems and technologies such as Apache Kafka, Kafka Connect, or similar distributed messaging platforms.
  • Solid understanding of Change Data Capture (CDC), database replication patterns, and data lake or Lakehouse architectures.
  • Experience with modern data storage formats and table formats such as Apache Iceberg or Delta Lake.
  • Experience with cloud-based data ecosystems (AWS, GCP, or Azure) and infrastructure-as-code tools.
  • Design and implement solutions for PII detection, masking, and privacy-compliant data handling within ingestion pipelines, ensuring sensitive data is properly classified, protected, and governed in accordance with Block's privacy policies and regulatory requirements (e.g., GDPR, CCPA).
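The last bullet's PII requirement can be sketched as field-level masking applied inside an ingestion step. The classification map and the choice of hashing versus redaction below are illustrative assumptions, not Block's actual policy; a real pipeline would source classifications from a governance catalog.

```python
import hashlib

# Illustrative field classification (assumed, not Block's policy):
# "hash" pseudonymizes with a stable digest so joins still work;
# "redact" drops the field entirely.
PII_FIELDS = {
    "email": "hash",
    "ssn": "redact",
}

def mask_record(record: dict) -> dict:
    """Return a copy of `record` with PII fields masked per PII_FIELDS."""
    out = {}
    for field, value in record.items():
        action = PII_FIELDS.get(field)
        if action == "redact":
            continue
        if action == "hash" and value is not None:
            out[field] = hashlib.sha256(str(value).encode()).hexdigest()
        else:
            out[field] = value
    return out

masked = mask_record({"id": 7, "email": "a@example.com", "ssn": "123-45-6789"})
```

For GDPR/CCPA-style compliance, an unsalted hash as shown is usually insufficient on its own (it is reversible by dictionary attack); keyed hashing or tokenization via a vault service is the more typical production choice.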