Senior Applied AI Engineer – ML for Systems & Infrastructure

Databricks · Data AI · San Francisco, CA · Engineering - Pipeline

Senior Applied AI Engineer focused on applying ML to improve Databricks' engineering systems and infrastructure, including cluster management and query compilation. The role involves building end-to-end systems, deploying models at scale, and architecting ML infrastructure for production environments.

What you'd actually do

  1. Build end-to-end systems from the ground up in a small team of experienced people.
  2. Shape the direction of our applied ML areas of investment by engaging with engineering and product teams across the company.
  3. Drive the development and deployment of state-of-the-art AI models and systems that directly impact the capabilities and performance of Databricks' products, infrastructure and services.
  4. Architect and implement robust, scalable ML infrastructure, including data storage and processing, model training and serving components, and monitoring and reporting systems to support seamless integration of AI/ML models into production environments.
  5. Work on novel modeling techniques in the field of ML for Systems.

Skills

Required

  • Machine learning engineering experience
  • Computer systems understanding
  • Statistics understanding
  • Mathematical modeling
  • Developing AI/ML systems at scale in production
  • ML modeling beyond standard libraries
  • Coding and software engineering skills
  • Software engineering principles (testing, code reviews, deployment)
  • Deploying, scaling and monitoring models in production
  • Infrastructure challenges for training and serving in Tier 0 environments

Nice to have

  • Optimization algorithms
  • Combinatorial optimization
  • Publishing research
  • Presenting at conferences
  • Participating in open-source projects

What the JD emphasized

  • 2-8 years of machine learning engineering experience
  • Experience developing AI/ML systems at scale in production
  • Strong track record of ML modeling that goes beyond using standard libraries
  • Experience deploying, scaling and monitoring models in production

Other signals

  • Build end-to-end systems from the ground up
  • Drive the development and deployment of state-of-the-art AI models and systems
  • Architect and implement robust, scalable ML infrastructure