Staff Software Engineer - AI/ML

Databricks · Data AI · New York, NY · Engineering

Staff Software Engineer focused on building and evaluating ML/LLM systems for a new Customer Data Platform on Databricks. The role involves developing personalization use cases, improving model behavior in production, building evaluation frameworks, and setting technical foundations for ML/AI personalization work in a 0-to-1 environment.

What you'd actually do

  1. Evaluate ML and LLM approaches for CustomerLake's personalization use cases, push the models and algorithms forward, and continuously improve quality over time
  2. Go deep on how models behave in production: inspect individual traces, understand how the models reason, and tune and improve from there
  3. Build the platform and evaluation framework that let CustomerLake customers optimize for real business value such as purchases, retention, and product usage, not vanity metrics like email opens and clicks
  4. Push the team toward new directions and novel methods worth tackling, not just optimizing what already exists
  5. Partner closely with product management, engineering, and design to turn ambiguous customer problems into scalable, trustworthy solutions

Skills

Required

  • Python
  • modern ML frameworks (e.g., PyTorch)
  • model evaluation
  • monitoring AI quality in production
  • LLMs
  • generative AI
  • retrieval-augmented generation (RAG)
  • prompt design
  • fine-tuning
  • LLM evaluation
  • product mindset
  • 0-to-1 environments

Nice to have

  • martech
  • experience with go-to-market or business use cases that have an analytical angle
  • academic or research background
  • ability to innovate and develop novel methods

What the JD emphasized

  • 10+ years of engineering experience
  • shipping and improving ML/AI products
  • building and evaluating ML models and/or LLM systems for real product or business use cases
  • practical, not purely academic
  • make models work well inside a product
  • customer behavior
  • model evaluation and monitoring AI quality in production
  • LLMs and generative AI
  • retrieval-augmented generation (RAG)
  • prompt design
  • fine-tuning
  • evaluation
  • demonstrated product mindset
  • 0-to-1 environments
  • making pragmatic trade-offs
  • operating with incomplete information
  • driving projects from idea through launch and adoption

Other signals

  • building new products from the ground up
  • enterprise-grade ML and AI personalization
  • 0-to-1 environment
  • customer data platform