Incident Response Manager - Product & Engineering

Anthropic · AI Frontier · San Francisco, CA · Technical Program Management

This role is for an Incident Response Manager at Anthropic, an AI company. The manager builds and leads the incident response function, establishing its processes, tooling, and operational standards. They serve as an on-call incident commander, own incident communications, and partner with other teams to improve incident detection, response, and learning. The role requires experience in incident management and in building incident response programs, plus technical depth in cloud infrastructure.

What you'd actually do

  1. Build the incident response management function, establishing the processes, tooling, and operational standards that define how we handle incidents at scale
  2. Serve as an on-call incident commander, driving coordinated response across technical and non-technical stakeholders during incidents of varying severity, including managing multiple active incidents simultaneously
  3. Engage the right people at the right time, with a strong sense of urgency, bringing order and direction to fast-moving, ambiguous situations
  4. Own incident communications end-to-end, from real-time internal coordination to external channels like status pages, direct customer outreach, and stakeholder updates, ensuring they reflect Anthropic's commitments to safety, transparency, and accuracy
  5. Participate in blameless incident reviews, contributing operational context and helping drive follow-through on critical remediations so the same class of incident does not recur

Skills

Required

  • 5+ years of experience in incident management
  • direct experience managing technical product or infrastructure incidents
  • experience building or significantly shaping an incident response program
  • strong sense of ownership and urgency
  • ability to operate independently and make sound decisions under pressure
  • comfort working in unprecedented situations where processes are still being defined
  • track record of effective cross-functional collaboration
  • blameless, learning-oriented mindset
  • experience with cloud infrastructure incidents
  • technical depth across the stack
  • comfort navigating distributed systems, monitoring tools, and logs
  • analytical mindset
  • experience using data to inform decisions
  • clear, calm communication under pressure

Nice to have

  • experience at a high-growth startup

What the JD emphasized

  • write them
  • create structure rather than inherit it
  • processes are still being defined
  • guidance may be incomplete or conflicting
  • leaving things better than you found them