Director of Engineering (Data Infrastructure)

Databricks · Data AI · Bangalore, India · Executive Engineering - Pipeline

This Director of Engineering role builds and leads the data infrastructure organization at Databricks. The primary focus is the reliability, correctness, and scalability of systems that handle petabytes of data and billions of daily transactions, particularly for billing and monetization. The role involves architecting foundational teams, defining the infrastructure vision, and ensuring operational resilience and disaster recovery for critical business systems.

What you'd actually do

  1. Deliver the infrastructure vision for systems processing billions of daily billing transactions with zero tolerance for error: disaster recovery that's provably reliable, testing frameworks that catch what production sees, correctness systems that make billing errors structurally impossible, and observability that predicts failures before they happen
  2. Build Bengaluru's data infrastructure organization by establishing it as the destination for India's top infrastructure talent, hiring multiple engineering managers who become force multipliers, and creating a culture where solving hard distributed systems problems at scale is the daily work
  3. Own business-critical systems operating 24/7/365 across 100+ regions where even 99.9% uptime means hours of customer pain (roughly 8.8 hours of downtime a year), driving reliability improvements that prevent millions in revenue loss while eliminating operational toil through frameworks that make systems self-healing, self-tuning, and self-documenting
  4. Ship platforms that compound engineering leverage across Databricks: correctness frameworks that catch billing errors before customers do, deployment automation that makes regional expansion push-button, data integration systems that process petabyte-scale flows without human intervention, and testing infrastructure where comprehensive coverage is automatic, not heroic
  5. Position infrastructure as product by treating internal engineering teams as customers with SLAs, measuring adoption and satisfaction, iterating based on feedback, and demonstrating that every dollar invested in infrastructure returns multiplicative gains in product velocity, reliability improvements, or cost reductions

Skills

Required

  • 14+ years in distributed systems engineering
  • 6+ years leading infrastructure organizations
  • 4+ years managing managers
  • Experience building 99.999%+ reliable systems
  • Proven ability to scale infrastructure organizations in high-growth environments
  • Communication skills to make complex infrastructure decisions legible to executives

Nice to have

  • Technical depth across petabyte-scale data pipelines and distributed systems reliability
  • Track record of defining a multi-year infrastructure vision and translating it into sequential deliverables
  • Experience establishing practices for SLOs/SLIs, chaos engineering, disaster recovery, and sophisticated observability
  • Track record of developing engineering managers
  • Track record of creating teams where retention is high because the problems are interesting and the culture is strong
  • Ability to influence cross-functional teams

What the JD emphasized

  • never fail
  • 99.999% accuracy requirements
  • five-minute outage costs millions
  • zero-downtime recovery
  • scale operations sublinearly with growth
  • five nines weren't simply aspirational
  • petabyte-scale wasn't marketing but Monday
  • zero tolerance for error
  • 99.999%+ reliable systems