Engineering Manager - Batch Compute Infrastructure

Stripe · Fintech · India · 8125 Core Compute

Engineering Manager for Stripe's Batch Compute Infrastructure team, responsible for foundational infrastructure, tooling, and distributed systems supporting large-scale batch processing (Hadoop, Spark, Celeborn) at petabyte scale for financial, analytical, and regulatory workflows. The role involves defining strategy, leading a team, ensuring operational rigor, cross-functional collaboration, and technical stewardship of distributed systems.

What you'd actually do

  1. Define the multi-year roadmap for Stripe’s Batch Compute Infrastructure, leading complex architectural shifts and modernization.
  2. Build, mentor, and aggressively scale a high-performing team of engineers, proactively investing in their career development and fostering a culture of operational excellence.
  3. Maintain unwavering reliability for a Tier-0 infrastructure processing tens of thousands of daily workloads, proactively mitigating risks and managing complex on-call telemetry.
  4. Collaborate deeply with data platform teams, finance, and user groups to define compute efficiency metrics, execute massive-scale cost optimization strategies, and guarantee compliance with global financial regulations.
  5. Provide technical guidance in architecture reviews, evaluating critical cost, performance, and reliability trade-offs in distributed systems design involving Hadoop, Spark, AWS cloud primitives, and modern metastores.

Skills

Required

  • 10+ years of professional software development and engineering experience
  • 3+ years of direct engineering management experience
  • Experience building, scaling, and maintaining large-scale distributed data systems or Tier-0 infrastructure using open-source tools (e.g., Hadoop, Spark, Celeborn, Airflow, Kafka)
  • Experience driving significant infrastructure efficiency
  • Experience managing capacity planning
  • Experience making data-driven cost-performance trade-offs
  • Experience working effectively in highly cross-functional, global organizations

Nice to have

  • Experience managing remote or geographically distributed engineering teams
  • Experience managing a massive fleet of Linux servers, on-premise Hadoop clusters, and modern cloud data architectures (e.g., AWS S3, Graviton)
  • Experience navigating strategic ambiguity and delivering complex, multi-quarter infrastructure projects from inception to completion
  • Deep empathy for internal data users with a passion for building robust developer tooling and abstractions

What the JD emphasized

  • compliance with global financial regulations