Senior Software Engineer - Observability

Databricks · Data AI · Bangalore, India · Engineering - Pipeline

This role focuses on building and operating observability solutions for Databricks' large-scale data and AI infrastructure platform. The engineer will establish standards for logging, metrics, and tracing, build tooling for emitting and aggregating metrics, and ensure the scalability, performance, and reliability of these systems. The role involves optimizing platform costs and participating in incident response.

What you'd actually do

  1. Establish standards for logging, metrics, and tracing.
  2. Collaborate with other teams to identify metrics that show engineers how well the system and its subcomponents are performing.
  3. Build tooling and infrastructure that lets components efficiently emit, aggregate, and store metrics for display on dashboards and use in alerting.
  4. Ensure the scalability, performance, and reliability of systems by contributing to and executing the technical roadmap.
  5. Participate in on-call rotations and work to reduce incident response times.

Skills

Required

  • Python
  • Java
  • Scala
  • C++
  • large-scale distributed systems
  • metrics collection
  • health monitoring
  • observability tools

Nice to have

  • logging
  • tracing

What the JD emphasized

  • 7+ years of production-level experience