Staff Backend Software Engineer (AI Platform)

Databricks · Data AI · Mountain View, CA · Engineering

Databricks is seeking a Staff Backend Software Engineer for its AI Platform team, focusing on the Model Serving product. The role involves designing and building systems for high-throughput, low-latency inference across CPU and GPU workloads, optimizing performance, and ensuring scalability and reliability. The engineer will contribute to core serving infrastructure, collaborate cross-functionally, and lead technical initiatives to improve latency, availability, and cost-effectiveness.

What you'd actually do

  1. Design and implement core systems and APIs that power Databricks Model Serving, ensuring scalability, reliability, and operational excellence.
  2. Drive architectural decisions and trade-offs to optimize performance, throughput, autoscaling, and operational efficiency for CPU and GPU serving workloads.
  3. Contribute directly to key components across the serving infrastructure — from model container builds and deployment workflows to runtime systems like routing, caching, observability, and intelligent autoscaling — ensuring smooth and efficient operations at scale.
  4. Collaborate cross-functionally with product, platform, and research teams to translate customer needs into reliable and performant systems.
  5. Lead technical initiatives that improve latency, availability, and cost-effectiveness across both customer-facing and foundational serving layers.

Skills

Required

  • 5+ years of experience building and operating large-scale distributed systems.
  • Experience in model serving, inference systems, or related infrastructure (e.g., routing, scheduling, autoscaling, and observability).
  • Strong foundation in algorithms, data structures, and system design as applied to large-scale, low-latency serving systems.
  • Proven ability to deliver technically complex, high-impact initiatives that create measurable customer or business value.
  • Experience designing the architecture of large-scale, performance-sensitive CPU/GPU inference systems.
  • Strong communication skills and ability to collaborate across teams in fast-moving environments.
  • Customer-focused mindset with the ability to align implementation details with product goals.

Nice to have

  • Passion for mentoring, growing engineers, and fostering technical excellence.

What the JD emphasized

  • high-throughput, low-latency inference across CPU and GPU workloads
  • performance, throughput, autoscaling, and operational efficiency for CPU and GPU serving workloads
  • runtime systems like routing, caching, observability, and intelligent autoscaling
  • improve latency, availability, and cost-effectiveness

Other signals

  • Model Serving product provides enterprises with a unified, scalable, and governed platform to deploy and manage AI/ML models
  • deliver a world-class serving platform