Performance Systems Engineer 5 - Ad Server Platform

Netflix · Big Tech · United States · Remote · Engineering

Netflix is building an in-house ad tech ecosystem and is looking for a Performance Systems Engineer to optimize its ad serving infrastructure for latency, throughput, and resource efficiency. The role involves profiling services, identifying bottlenecks, designing load tests, establishing performance baselines, instrumenting telemetry, and optimizing components of the ad serving pipeline, including the rule engine and frequency management service. The engineer will also own performance SLOs and budgets and partner with infrastructure teams.

What you'd actually do

  1. Profile and optimize the ad serving runtime for latency, throughput, and resource efficiency across the full request lifecycle: targeting evaluation, policy enforcement, ad selection, and response serialization
  2. Identify and eliminate performance bottlenecks across services: CPU hotspots, GC pressure, memory allocation patterns, thread contention, and network overhead
  3. Design and run load tests, squeeze tests, and capacity models to validate system behavior under peak and burst traffic (including Live events at NFL scale)
  4. Establish performance baselines and regression detection: automated benchmarking in CI/CD to catch regressions before they reach production
  5. Instrument comprehensive latency telemetry, tracing, and profiling across the ad request lifecycle to enable data-driven optimization

Skills

Required

  • building and optimizing distributed systems and backend services at scale
  • performance engineering
  • profiling
  • JVM internals
  • bottleneck analysis
  • latency engineering
  • load tests
  • squeeze tests
  • capacity models
  • high-throughput, latency-sensitive systems
  • ad servers
  • SSPs
  • DSPs
  • real-time bidding infrastructure
  • Java
  • Kotlin
  • JVM languages
  • event-driven architectures
  • Kafka
  • Flink
  • stream processing
  • throughput optimization
  • consumer lag management
  • ad serving concepts
  • targeting
  • frequency capping
  • publisher controls
  • programmatic protocols

Nice to have

  • CTV constraints
  • server-side ad insertion
  • live event ad serving at scale
  • logging and telemetry frameworks
  • high-throughput request pipelines
  • multi-region deployment
  • active-active architectures
  • failover
  • regional performance variance analysis
  • chaos engineering
  • SRE practices
  • error budgets
  • game days
  • fault injection
  • squeeze testing
  • hardware-aware optimization
  • automated performance regression detection
  • CI/CD pipelines

What the JD emphasized

  • 7+ years building and optimizing distributed systems and backend services at scale
  • Deep experience with performance engineering
  • Strong understanding of latency engineering