Staff Software Engineer, Model Serving

Databricks · Data AI · Mountain View, CA · Engineering

Databricks is seeking a Staff Software Engineer for its Model Serving product, a core pillar of the platform that enterprises use to deploy and manage AI/ML models. The role involves designing and building systems for high-throughput, low-latency inference across CPU and GPU workloads, shaping architectural direction, and collaborating across teams to deliver a world-class serving platform.

What you'd actually do

  1. Design and implement core systems and APIs that power Databricks Model Serving, ensuring scalability, reliability, and operational excellence.
  2. Partner with product and engineering leadership to define the technical roadmap and long-term architecture for serving workloads.
  3. Drive architectural decisions and trade-offs to optimize performance, throughput, autoscaling, and operational efficiency for CPU and GPU serving workloads.
  4. Contribute directly to key components across the serving infrastructure, from model container builds and deployment workflows to runtime systems such as routing, caching, observability, and intelligent autoscaling, ensuring smooth and efficient operation at scale (a sketch of one such autoscaling policy follows this list).
  5. Collaborate cross-functionally with product, platform, and research teams to translate customer needs into reliable and performant systems.
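
To make item 4 concrete, here is a minimal sketch of the kind of load-based autoscaling decision such a runtime might make. It is illustrative only, not Databricks' actual policy: the ReplicaMetrics type, the target-concurrency knob, and the replica bounds are all assumptions.

```python
# Minimal sketch of a concurrency-based autoscaling policy for a model
# serving runtime. Names and thresholds are illustrative assumptions,
# not Databricks' implementation.
import math
from dataclasses import dataclass


@dataclass
class ReplicaMetrics:
    in_flight_requests: int  # requests currently executing on this replica


def desired_replicas(
    replicas: list[ReplicaMetrics],
    target_concurrency: float = 8.0,  # assumed per-replica tuning knob
    min_replicas: int = 1,
    max_replicas: int = 64,
) -> int:
    """Scale so average in-flight requests per replica stays near the target."""
    total_in_flight = sum(r.in_flight_requests for r in replicas)
    # Round up so we err toward spare capacity rather than request queueing.
    desired = math.ceil(total_in_flight / target_concurrency)
    return max(min_replicas, min(max_replicas, desired))


# Example: 3 replicas carrying 30 in-flight requests total -> scale to 4.
print(desired_replicas([ReplicaMetrics(12), ReplicaMetrics(10), ReplicaMetrics(8)]))
```

A production policy would layer on hysteresis, scale-down cooldowns, and GPU-aware placement, but the core feedback loop on observed load looks like this.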

Skills

Required

  • building and operating large-scale distributed systems
  • model serving
  • inference systems
  • related infrastructure (e.g., routing, scheduling, autoscaling, and observability)
  • algorithms
  • data structures
  • system design
  • large-scale, low-latency serving systems
  • delivering technically complex, high-impact initiatives
  • leading architecture for large-scale, performance-sensitive CPU/GPU inference systems
  • communication skills
  • collaboration across teams

Nice to have

  • mentoring, growing engineers, and fostering technical excellence
  • strategic and product-oriented mindset

What the JD emphasized

  • 10+ years of experience building and operating large-scale distributed systems
  • Deep expertise in model serving, inference systems, and related infrastructure
  • Experience leading architecture for large-scale, performance-sensitive CPU/GPU inference systems

Other signals

  • Model Serving product provides enterprises with a unified, scalable, and governed platform to deploy and manage AI/ML models
  • offers real-time, low-latency inference, governance, monitoring, and lineage
  • operationalize models at scale with strong SLAs and cost efficiency (a rough sizing sketch follows this list)
  • design and build systems that enable high-throughput, low-latency inference across CPU and GPU workloads
  • deliver a world-class serving platform
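
As a rough illustration of the SLA-versus-cost trade-off named above, the sketch below sizes a deployment from a throughput target. Every number and the replicas_needed helper are hypothetical, chosen only to show the arithmetic.

```python
# Back-of-envelope capacity sizing for a serving deployment. The QPS
# target, per-replica throughput, and headroom factor are hypothetical
# inputs used only to illustrate the trade-off described above.
import math


def replicas_needed(target_qps: float, per_replica_qps: float,
                    headroom: float = 1.3) -> int:
    """Replicas required to serve target_qps with slack for traffic spikes.

    headroom > 1 buys latency margin (a stronger SLA) at the cost of
    idle capacity, which is the efficiency knob serving teams tune.
    """
    return math.ceil(target_qps * headroom / per_replica_qps)


# Example: 2,000 QPS peak, 120 QPS per GPU replica, 30% headroom -> 22 replicas.
print(replicas_needed(2000, 120))
```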