Senior Software Engineer, Data Governance & Foundations

Instacart · Consumer · United States · Remote · Software Engineering

Instacart is seeking a Senior Software Engineer for its Data Governance & Foundations team. The role focuses on building and operating core systems for the company's data ecosystem, including a large-scale data lakehouse, ingestion, stream processing, and self-serve tooling. The engineer will define multi-year architecture roadmaps, own platform initiatives, partner with vendors, embed governance and compliance controls, optimize infrastructure spend, and mentor other engineers. The role requires 5+ years of experience in data infrastructure or distributed systems, familiarity with modern data lakehouse architectures and event-driven infrastructure (Kafka, Flink), and strong communication skills. Experience with data governance frameworks and FinOps is preferred.

What you'd actually do

  1. Define and drive multi-year architecture roadmaps for large-scale data ingestion and processing infrastructure, setting technical direction that balances reliability, scalability, and cost.
  2. Own end-to-end platform initiatives — from build vs. buy decisions and migration design through production rollout and risk management — across Kafka-based streaming and Postgres-based systems.
  3. Partner with vendors (Snowflake, Databricks, Confluent) on technical integration, contract evaluation, and TCO modeling to inform infrastructure investment decisions.
  4. Collaborate with various teams to embed governance and compliance controls (SOX, CPRA, GDPR) directly into platform architecture and data lifecycle management.
  5. Optimize infrastructure spend at scale: identify cost reduction opportunities across compute, storage, and pipeline efficiency; manage multi-million dollar infrastructure budgets.

Skills

Required

  • 5+ years of software engineering focused on data infrastructure or distributed systems at scale.
  • Experience with modern data lakehouse architectures and open table formats (Apache Iceberg, Delta Lake, Hudi), with a strong understanding of compute/storage trade-offs.
  • Hands-on experience with distributed query and compute systems (Trino, Spark, ClickHouse) including performance tuning and production reliability work.
  • Proven depth in event-driven infrastructure: Kafka for high-throughput data ingestion and Flink (or equivalent) for stream processing at scale.
  • Track record owning and executing major platform transitions, including migration design, phased rollout, and risk management under production constraints.
  • Experience building business cases for infrastructure investments: cost-benefit analysis, TCO modeling, and presenting recommendations to leadership.
  • Exceptional written technical communication — clear architecture docs, strategy memos, and cross-team proposals that drive decisions and alignment.
  • Strong ownership and comfort operating in ambiguity; ability to drive large, multi-team initiatives from concept to production with organizational influence.

Nice to have

  • Familiarity with data governance and compliance frameworks (SOX, CPRA, GDPR) and experience designing governance controls into platform architecture.
  • Experience with FinOps and data platform cost optimization, including managing large infrastructure budgets and negotiating enterprise vendor contracts.
  • Deep SQL expertise and strong proficiency in Python or Scala for systems-level work.
  • Experience with orchestration (Apache Airflow) and transformation pipelines (dbt) in large-scale production environments.
  • Bachelor's, Master's, or PhD in Computer Science, Computer Engineering, Electrical Engineering, or equivalent practical experience.

What the JD emphasized

  • multi-year architecture roadmaps
  • end-to-end platform initiatives
  • build vs. buy decisions
  • migration design
  • production rollout
  • risk management
  • technical integration
  • contract evaluation
  • TCO modeling
  • governance and compliance controls
  • data lifecycle management
  • infrastructure spend at scale
  • cost reduction opportunities
  • pipeline efficiency
  • multi-million dollar infrastructure budgets
  • compelling architecture documents
  • strategy memos
  • proposals
  • engineering leadership
  • senior stakeholders
  • 5+ years of software engineering focused on data infrastructure or distributed systems at scale
  • high-growth, data-intensive environment
  • modern data lakehouse architectures
  • open table formats
  • compute/storage trade-offs
  • distributed query and compute systems
  • performance tuning
  • production reliability work
  • event-driven infrastructure
  • high-throughput data ingestion
  • stream processing at scale
  • Track record owning and executing major platform transitions
  • phased rollout
  • risk management under production constraints
  • building business cases for infrastructure investments
  • cost-benefit analysis
  • presenting recommendations to leadership
  • Exceptional written technical communication
  • clear architecture docs
  • cross-team proposals
  • drive decisions and alignment
  • Strong ownership
  • comfort operating in ambiguity
  • drive large, multi-team initiatives from concept to production
  • organizational influence
  • data governance and compliance frameworks
  • designing governance controls into platform architecture
  • FinOps
  • data platform cost optimization
  • managing large infrastructure budgets
  • negotiating enterprise vendor contracts