Engineering Manager, Ads ML Efficiency

Reddit · Consumer · United States · Remote · Ads Engineering

The Engineering Manager for Ads ML Efficiency at Reddit leads a team focused on making model training and inference faster, cheaper, safer, and more scalable. The role involves defining optimization roadmaps, building performance systems and tooling, and partnering with other teams to accelerate launches and reduce bottlenecks. It requires deep ML engineering experience, a hands-on optimization background, strong managerial ability, and distributed systems fluency.

What you'd actually do

  1. Lead & Grow: Hire, mentor, and retain a high-performing team of ML engineers and systems-oriented engineers working on model optimization and ML efficiency.
  2. Set Technical Direction: Define the roadmap for training optimization, inference optimization, launch-readiness tooling, and reusable efficiency primitives across Ads ML.
  3. Deliver Measurable Wins: Drive reductions in model training time, online latency, serving cost, and infra-driven launch risk.
  4. Build Systems and Tooling: Guide the development of profiling, benchmarking, load testing, observability, cost analysis, debugging, and efficiency certification systems.
  5. Operate in the Critical Path: Partner with model owners and platform teams to accelerate high-priority launches and remove bottlenecks from the path to production.

Skills

Required

  • ML engineering
  • systems optimization
  • organizational leverage
  • model optimization
  • training efficiency
  • inference optimization
  • GPU enablement
  • load testing
  • model performance tooling
  • efficiency guardrails
  • hiring
  • mentoring
  • team leadership
  • roadmap definition
  • profiling
  • benchmarking
  • observability
  • cost analysis
  • debugging
  • distributed systems
  • production-scale ML systems
  • reliability
  • speed
  • cost
  • scale
  • service provider mindset
  • building reusable systems
  • technical communication

Nice to have

  • Ads experience
  • ads ranking
  • recommender systems
  • marketplace ML
  • GPU training and serving migrations
  • PyTorch
  • distributed training frameworks
  • kernel optimization
  • performance optimization
  • efficiency benchmarking frameworks
  • launch certification frameworks
  • ML platform
  • applied modeling

What the JD emphasized

  • model optimization
  • training efficiency
  • inference optimization
  • efficiency guardrails
  • model training time
  • online latency
  • serving cost
  • launch risk
  • profiling
  • benchmarking
  • load testing
  • observability
  • cost analysis
  • debugging
  • efficiency certification systems
  • optimization
  • performance debugging
  • launch safety
  • Deep ML Engineering Experience
  • Hands-on Optimization Background
  • Distributed Systems Fluency

Other signals

  • ML efficiency
  • model optimization
  • training efficiency
  • inference optimization
  • GPU enablement
  • performance tooling
  • efficiency guardrails