Sr. Staff Software Engineer, Observability

Databricks · Data & AI · Mountain View, CA · Engineering

Databricks is hiring a Sr. Staff Software Engineer for its Observability team. This role will build and manage next-generation observability platforms for Databricks' large-scale data and AI infrastructure, supporting billions of active time series and processing petabytes of logs daily. The engineer will develop advanced workflows for incident diagnosis, uplevel monitoring and reliability practices across engineering, and mentor other engineers. The role requires 15+ years of production-level experience, proficiency in languages such as Go, Python, Java, Scala, Rust, or C++, experience with large-scale distributed systems and cloud technologies, and familiarity with observability infrastructure.

What you'd actually do

  1. You will build the next generation of observability platforms that support billions of active time series and process petabytes of logs daily.
  2. You will manage infrastructure across nearly a hundred cloud regions, enabling all Databricks engineers and customers to monitor the reliability of our product.
  3. You will develop advanced workflows that accelerate incident diagnosis for Bricksters (Databricks employees), allowing engineers to quickly derive insights from logs and metrics.
  4. You will uplevel monitoring and reliability practices across Databricks engineering, developing opinionated tools that set common standards for managing structured logs, metrics, alerts, dashboards, and oncall rotations.
  5. You will mentor and uplevel engineers, fostering a culture of technical excellence within the team and the broader observability community.

Skills

Required

  • Go
  • Python
  • Java
  • Scala
  • Rust
  • C++
  • large-scale distributed systems
  • cloud technologies (AWS, Azure, GCP)
  • Docker
  • Kubernetes
  • observability infrastructure
  • monitoring patterns
  • reliability practices

What the JD emphasized

  • 15+ years of production-level experience
  • large-scale distributed systems
  • observability infrastructure