Staff Backend Software Engineer (AI Platform)

Databricks · Data AI · Mountain View, CA · Engineering

Staff Backend Software Engineer for Databricks' AI Platform, focusing on Foundation Model Serving. The role involves designing and implementing high-throughput, low-latency inference systems for frontier AI models on GPU workloads, optimizing serving infrastructure, and influencing the technical roadmap for LLM APIs and runtimes at scale. Prior ML/AI experience is not required, but experience with large-scale distributed systems and operationally sensitive systems is critical.

What you'd actually do

  1. Design and implement core systems and APIs that power Databricks Foundation Model Serving, ensuring scalability, reliability, and operational excellence.
  2. Partner with product and engineering leadership to define the technical roadmap and long-term architecture for serving workloads.
  3. Drive architectural decisions and trade-offs to optimize performance, throughput, autoscaling, and operational efficiency for GPU serving workloads.
  4. Contribute directly to key components across the serving infrastructure, from working in systems like vLLM and SGLang to creating token-based rate limiters and optimizers, ensuring smooth and efficient operations at scale.
  5. Collaborate cross-functionally with product, platform, and research teams to translate customer needs into reliable and performant systems.

Skills

Required

  • 10+ years of experience building and operating large-scale distributed systems.
  • Experience leading high-scale operationally sensitive backend systems.
  • Strong foundation in algorithms, data structures, and system design as applied to large-scale, low-latency serving systems.
  • Proven ability to deliver technically complex, high-impact initiatives that create measurable customer or business value.
  • Strong communication skills and ability to collaborate across teams in fast-moving environments.
  • Strategic and product-oriented mindset with the ability to align technical execution with long-term vision.

Nice to have

  • Interest in going deep on building LLM APIs and runtimes at scale.
  • Passion for mentoring, growing engineers, and fostering technical excellence.

What the JD emphasized

  • high-scale operationally sensitive systems
  • high-throughput, low-latency inference
  • GPU workloads
  • frontier models
  • LLM APIs and runtimes at scale
  • large-scale distributed systems
  • high-scale operationally sensitive backend systems
  • low-latency serving systems

Other signals

  • foundation model serving
  • high-throughput, low-latency inference
  • GPU workloads
  • frontier models
  • LLM APIs and runtimes at scale