Machine Learning Engineer 5 - Decisioning & Optimization

Netflix · Big Tech · New York, NY +4 · Data & Insights

Netflix is seeking an ML Engineer to build and operate real-time ML model serving infrastructure for its ad tech ecosystem. The role focuses on scaling inference paths to support high QPS within strict latency budgets, optimizing feature serving, productionizing scoring and ranking models, and building production model performance monitoring. Experience building and operating high-QPS, low-latency real-time model serving systems at scale is critical.

What you'd actually do

  1. Build and operate end-to-end ML model serving infrastructure for real-time ad decisioning: model publishing, packaging, validation, and deployment into the serving stack with zero-downtime hot-swap
  2. Scale the inference path to support dozens of concurrent models on every ad request at 1M+ QPS with strict latency budgets, including batching strategies, CPU/GPU allocation, model versioning, and fallback tiers
  3. Design and optimize the feature serving path: feature hydration from Chronon, Signal Service, and real-time streams with sub-10ms P99 fetch latency and online/offline consistency
  4. Productionize scoring and ranking models for multi-stage ad selection (retrieval, early ranking, full scoring) and integrate model outputs into the auction
  5. Build production model performance monitoring: inference latency, prediction distribution shifts, feature drift detection, score calibration, and regression detection before revenue is affected

Skills

Required

  • 7+ years of software engineering experience
  • 3+ years focused on ML infrastructure, model serving, or ML platform work in an ads or real-time decisioning context
  • Built and operated real-time model serving systems at high QPS with sub-20ms latency: online inference, feature stores, model registries, model hot-swap, canary and shadow rollout
  • Proficiency in Java, Python, or Scala with a solid understanding of multi-threading, memory management, and performance optimization for latency-critical paths
  • Hands-on experience with ML serving frameworks, including serialization, runtime optimization, and deployment constraints
  • Experience with feature engineering pipelines for real-time systems: online/offline consistency, hydration strategies, caching, and freshness tradeoffs
  • Strong understanding of model monitoring in production: drift detection, prediction distribution analysis, calibration, and latency profiling
  • Demonstrated ability to operate in an environment that requires both big-tech scale and startup speed

Nice to have

  • Ads domain experience: ranking models, bid scoring, reserve pricing, yield optimization, dynamic allocation across guaranteed and non-guaranteed inventory
  • Experience with auction mechanics: multi-stage ranking, bid shading, bid prediction, marketplace competition dynamics
  • Built or improved budget pacing and delivery control systems
  • Built simulation or counterfactual testing platforms for marketplace or auction systems
  • Experience with A/B testing infrastructure for model rollouts: online experiments, holdout groups, interference-aware evaluation in marketplace settings
  • Familiarity with CTV (connected TV) constraints: server-side ad insertion, live-event ad serving at scale, and burst traffic patterns
  • Experience with the JVM ecosystem

What the JD emphasized

  • real-time ad decisioning
  • 1M+ QPS
  • sub-20ms P99 inference budgets
  • real-time model serving systems at high QPS with sub-20ms latency
  • latency-critical paths

Other signals

  • ML infrastructure for model serving: real-time inference at 1M+ QPS, multi-model parallel evaluation, and model lifecycle from canary deployment through production monitoring
  • Auction, ranking, and scoring: multi-stage candidate selection, scoring, bid valuation, dynamic pricing, and podding
  • Budget, pacing, and bidding: control systems for delivery optimization, budget planning, and bid computation
  • scaling from a handful of production models to 10+
  • sub-20ms P99 inference budgets
  • build and operate the serving infrastructure these models run on