Engineering Manager, AI Observability

Netflix · Los Gatos, CA · Data & Insights

Netflix is seeking an experienced Engineering Manager to lead a newly formed AI Observability team. You will architect, design, build, and launch a new platform for monitoring ML and GenAI workloads, including LLMs, computer vision systems, and foundation models. The team's mission is to make AI systems transparent, reliable, and production-ready by capturing model inputs, features, predictions, outcomes, and behavior.

What you'd actually do

  1. Partner with ML researchers, engineers, and platform teams to embed “observability-by-default” into new AI services, ensuring telemetry, monitoring, and evaluation are built into systems from day one.
  2. Lead the end-to-end observability strategy for AI workloads, including LLMs, generative AI systems, and classical ML models, driving build-vs-buy decisions and scaling solutions across model training, online inference, and agent orchestration.
  3. Drive the evolution of LLM evaluation frameworks, covering prompt instrumentation, response quality measurement, grounding correctness, hallucination rates, and human- and LLM-as-a-judge scoring.
  4. Define and execute a platform roadmap focused on incremental delivery, with clear success metrics, migration goals, and strong adoption across teams.
  5. Hire, grow, and mentor a high-performing engineering team while fostering an inclusive and collaborative culture.

Skills

Required

  • 10+ years of software engineering experience
  • 3+ years of management experience
  • Experience leading teams responsible for building high-traffic distributed systems and ML infrastructure
  • Deep familiarity with AI and ML operations, including model evaluation, drift detection, and continuous monitoring at scale
  • Experience with AI observability and monitoring tools (e.g., Arize AI, Fiddler AI, Weights & Biases, Vertex AI Model Monitoring, SageMaker Model Monitor)
  • Exposure to LLM or generative AI systems, including prompt/result logging, evaluation metrics, LLM-as-a-judge frameworks, and human-in-the-loop review
  • Strong technical acumen: able to act as a credible technical advisor, set and enforce a high bar for code and system design, and mentor the team
  • Strong communication and collaboration skills, with the ability to build relationships with internal customers and external partners
  • A demonstrated ability to develop, drive, and execute a technical vision and roadmap
  • Experience managing a hybrid team with partners and team members distributed across US geographies and time zones

What the JD emphasized

  • AI Observability
  • LLM evaluation frameworks
  • observability-by-default
  • agent orchestration
  • LLM or generative AI systems

Other signals

  • AI Observability platform
  • monitoring model performance, data quality, drift, latency, and failures
  • enabling ML practitioners
  • scalable, robust systems