Staff Backline Engineer - Platform

Databricks · Data AI · Mountain View, CA +1 · Support

Databricks is seeking a Staff Backline Engineer to join its Platform team. This role acts as a critical bridge between Frontline Support and Engineering, handling complex technical issues and escalations within the Data and AI ecosystem. The engineer will focus on deep analysis, architecting troubleshooting methodologies, championing product supportability and observability, spearheading automation initiatives, and serving as a technical liaison between Support and Engineering. The role requires deep mastery of one of the following areas: Application Debugging, Big Data Ecosystems, Database Management, Automation & Tooling, Infrastructure & Networking, or Cloud Skills. It places a strong emphasis on diagnostic skills and experience with complex systems.

What you'd actually do

  1. Act as the ultimate technical escalation point, driving the resolution of the most complex, high-stakes customer issues on the Databricks Platform through deep analysis of core components, metrics, and logs.
  2. Elevate global team competency by architecting standardized troubleshooting methodologies, writing comprehensive runbooks, and leading deep-dive technical enablement sessions.
  3. Champion product supportability and observability by actively partnering with Engineering and Product teams to influence the roadmap and integrate critical feedback.
  4. Spearhead strategic automation and tooling initiatives, driving programs that significantly reduce manual toil and improve global MTTR (Mean Time to Resolution) metrics.
  5. Act as the primary technical liaison between Support and Engineering, driving the enablement strategy for upcoming product features, architectures, and releases.

Skills

Required

  • Troubleshooting complex Java, Scala, or Python-based applications
  • Managing, tuning, and resolving issues within distributed computing environments (Apache Spark, Hadoop)
  • Debugging, tuning, and resolving issues in SQL-based relational databases (Postgres)
  • Building automation frameworks and system scripts using Python or bash/shell
  • Linux system administration, kernel-level troubleshooting, and complex network diagnostics (TCP/IP stack, DNS, routing, packet capture analysis)
  • Debugging, optimizing, and supporting highly scalable workloads across major cloud platforms (AWS, Azure, or GCP)
  • Pinpointing root causes in highly complex systems through advanced black-box and system-level troubleshooting
  • 12+ years of highly technical industry experience in technical support, sustaining engineering, or similar roles

Nice to have

  • Customer success
  • Proactive issue resolution
  • Continuous platform improvements
  • Automation and tooling
  • Troubleshooting efficiency
  • Reducing manual effort
  • Improving the overall supportability of the platform
  • Driving operational excellence
  • Ensuring a delightful experience

What the JD emphasized

  • Deep mastery of one of the following and related areas
  • Expert-level diagnostic skills
  • Highly technical industry experience