Lead, Incidents & Escalations, User Operations

OpenAI · AI Frontier · San Francisco, CA · User Operations

Lead Incidents & Escalations for User Operations at OpenAI in a player-coach role focused on building and running the incident response function. The role involves participating directly in incidents, coordinating cross-functional teams, managing communications, and driving post-incident analysis and process improvements that improve customer experience and reduce repeat issues. It requires strong leadership, calm under pressure, and the ability to balance day-to-day operations with long-term system building.

What you'd actually do

  1. Participate in an on-call rotation and serve as the active incident lead during live incidents and urgent escalations.
  2. Own the alert intake and triage process across support, safety, customer, and service-impacting issues.
  3. Assess severity, determine scope and impact, and initiate the appropriate response path.
  4. Page and coordinate Engineering, Infrastructure, Support Delivery, Product, Legal, Policy, Go-To-Market, and other teams as needed.
  5. Lead incident response calls, manage timelines, clarify roles, and keep responders focused and unblocked.

Skills

Required

  • Incident management
  • Technical support
  • Escalation management
  • SRE
  • Technical program management
  • Production operations
  • On-call experience
  • Leadership
  • Incident Commander experience
  • Customer-impacting incident handling
  • Executive escalation handling
  • Safety-sensitive escalation handling
  • High-severity technical issue handling
  • Communication under pressure
  • Incident communications (internal, executive, customer, status pages)
  • Incident management tools (incident.io, PagerDuty, Datadog, Jira, Salesforce, Zendesk)
  • Monitoring and observability
  • Post-incident retrospectives
  • Root cause analysis
  • Corrective action tracking
  • Process improvement
  • Systems building
  • Calmness under pressure
  • Structured approach in ambiguity

Nice to have

  • AI and its applications

What the JD emphasized

  • 10+ years of experience in incident management, technical support, escalation management, SRE, technical program management, or production operations
  • 5+ years of hands-on experience working in production, on-call, or high-urgency operational environments
  • 5+ years of leadership experience, ideally in a Support, Engineering, or similar environment
  • Direct experience with customer-impacting incidents, executive escalations, safety-sensitive escalations, or high-severity technical issues
  • Hands-on experience with incident communications, including internal updates, executive briefings, customer updates, and status pages