Program Manager, Trust & Safety, Prime Video, Prime Video Trust & Safety

Amazon · Big Tech · Seattle, WA · Corporate Operations

This role owns the end-to-end prompt lifecycle for content classification automation systems using LLMs. It involves translating policy into precise prompts, rigorous testing, and deploying changes to production, balancing precision and recall to optimize classification quality. The role requires influencing cross-functional teams and managing program governance for prompt changes.

What you'd actually do

  1. Own end-to-end prompt engineering for content classification automation, including drafting, iterating, A/B testing, and deploying prompt changes to production environments.
  2. Develop deep expertise in maturity rating and content descriptor SOPs. Identify critical decision points, edge cases, and nuances that must be translated into prompt logic to ensure automated outputs match or improve on human judgment.
  3. Design and execute rigorous testing frameworks to validate prompt outputs against ground truth. Analyze false positives and false negatives, and make high-judgment calls on acceptable thresholds.
  4. Make strategic decisions on where to optimize for precision (minimizing false positives) versus recall (minimizing missed content), balancing customer trust, regulatory requirements, and operational cost.
  5. Drive alignment and decision-making across Science, Engineering, Operations, Legal, and Policy teams without direct reporting relationships. Build consensus on prompt strategies, escalation paths, and deployment timelines.
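
To make the precision versus recall tradeoff in items 3 and 4 concrete, here is a minimal Python sketch of how two prompt variants could be scored against human ground truth. Everything in it is an assumption for illustration: the function name, the variable names, and the toy labels are hypothetical and do not come from the posting or from any Amazon system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metrics:
    precision: float
    recall: float


def precision_recall(ground_truth: list[bool], predicted: list[bool]) -> Metrics:
    """Compute precision and recall for a binary 'flagged' label."""
    if len(ground_truth) != len(predicted):
        raise ValueError("label lists must have the same length")
    pairs = list(zip(ground_truth, predicted))
    tp = sum(g and p for g, p in pairs)          # correctly flagged
    fp = sum((not g) and p for g, p in pairs)    # flagged, but humans did not flag
    fn = sum(g and (not p) for g, p in pairs)    # missed content
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return Metrics(precision, recall)


# Toy, made-up labels for eight titles (True = content should be flagged).
human_labels = [True, True, True, False, False, False, True, False]
# Hypothetical outputs of a conservative prompt and a more aggressive prompt.
prompt_a = [True, True, False, False, False, False, False, False]
prompt_b = [True, True, True, True, False, True, True, False]

metrics_a = precision_recall(human_labels, prompt_a)
metrics_b = precision_recall(human_labels, prompt_b)
print("prompt A:", metrics_a)
print("prompt B:", metrics_b)
```

On this toy data the conservative prompt A reaches precision 1.0 but recall 0.5 (no false positives, half the content missed), while the aggressive prompt B reaches recall 1.0 but precision of about 0.67 (nothing missed, two false positives). Deciding which profile is acceptable for a given label is the kind of threshold call the role describes.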

Skills

Required

  • 3+ years of compliance program management, legal, governance, audit, risk/loss prevention, or equivalent experience
  • Experience managing competing priorities and using metrics to drive business decisions
  • Experience communicating clearly and concisely with leadership, stakeholders, and cross-functional teams
  • Bachelor's degree, or 3+ years of equivalent professional experience managing programs or working in a technology, content moderation, or AI/ML-adjacent environment

Nice to have

  • Experience designing or architecting new and existing systems (design patterns, reliability, and scaling), or experience in machine learning, data mining, information retrieval, statistics, or natural language processing
  • Experience in trust and safety compliance or risk
  • Experience in a test-driven and formal QA development environment, including development, staging, production (or equivalent) deployment cycles
  • Experience driving improvement programs in the operations, engineering and support fields
  • Experience with statistical methods (e.g., A/B testing, regression)

What the JD emphasized

  • rigorously test outputs
  • high-judgment role
  • strategic tradeoff decisions
  • precision and recall
  • influence without authority
  • prompt lifecycle ownership
  • testing and validation
  • precision vs. recall tradeoffs
  • acceptable thresholds
  • balancing customer trust, regulatory requirements, and operational cost
  • escalation paths
  • rollback procedures
  • audit trails
  • partner with Science teams on model behavior
  • Engineering on deployment infrastructure
  • Operations on ground truth labeling
  • Policy on evolving content standards
  • documentation of prompt logic, testing results, decision rationale, and SOP-to-prompt mapping

Other signals

  • LLM prompt lifecycle ownership
  • translating policy to prompts
  • testing and validation of LLM outputs
  • optimizing precision vs recall for LLM classification