Software Engineer 5 – Agent Platform, AI Platform

Netflix · Big Tech · United States · Remote · Data & Insights

A Software Engineer 5 role focused on building and operating Netflix's Agent Platform infrastructure, including the Agent SDK, MCP Gateway, and evaluation stack. The role enables other engineers to build, deploy, and run production-grade AI agents, with a strong emphasis on the full agent lifecycle (plan/act/observe, tool integration, deployment, operations) and robust evaluation mechanisms.

What you'd actually do

  1. Design, build, and operate the Agent SDK and MCP Gateway that Netflix engineers use to build, deploy, and run AI agents in production.
  2. Build agents and agent infrastructure across the full lifecycle — plan/act/observe loops, tool and MCP integrations, deployment, and day-2 operations.
  3. Make evaluation a first-class part of the platform: build the tracing, eval suites, and quality signals that let teams measure agents, catch regressions, and improve them iteratively.
  4. Own reliability, observability, and guardrails for non-deterministic systems running at very high scale.
  5. Lead cross-functional initiatives with ML scientists, data scientists, product managers, and other AI Platform teams.

Skills

Required

  • 8+ years of software engineering experience
  • Hands-on experience building, deploying, operating, AND evaluating LLM agents in production
  • Experience with one or more agent frameworks/SDKs (Strands, OpenAI Agents SDK, Anthropic Claude Agent SDK, LangGraph, pydantic-ai, CrewAI, Google ADK)
  • Experience with tool/function calling and MCP
  • Experience with LLM/agent evaluation and observability — building eval suites, tracing, and quality measurement, then iterating on results (Braintrust, LangSmith, W&B, or equivalent)
  • Strong experience building SDKs and APIs for internal or external developers
  • Strong fundamentals in building and operating scalable, observable, fault-tolerant distributed systems
  • Proficiency in Python
  • Proficiency in one of Java, Go, C/C++, Rust, or Zig

Nice to have

  • Familiarity with Temporal, FastAPI, PostgreSQL, Kubernetes

What the JD emphasized

  • building, deploying, operating, AND evaluating LLM agents in production
  • evaluation and observability

Other signals

  • building foundational AI infrastructure
  • enabling other teams to build AI agents
  • operating AI systems at scale
  • focus on agent lifecycle and evaluation