Incident Manager

Cockroach Labs · Data AI · United States · Remote · Customer Success

The Incident Manager at Cockroach Labs coordinates and resolves incidents across internal systems, cloud services, and customer environments. The role involves leading response efforts, conducting root cause analysis, and improving incident management processes. Although the posting mentions AI-assisted tools, the core function is incident management, not AI/ML development.

What you'd actually do

  1. Manage the full lifecycle of incidents from detection through resolution, ensuring adherence to established incident management processes.
  2. Lead and coordinate cross-functional response efforts to drive timely and effective incident resolution.
  3. Declare and escalate high-severity incidents, mobilizing appropriate stakeholders and leadership as needed.
  4. Serve as an escalation point for critical incidents and support crisis response activities.
  5. Lead structured root cause analysis and post-incident reviews, ensuring actionable follow-up items are identified.

Skills

Required

  • 5+ years of experience in technical operations, SRE, support, or incident management roles
  • At least 2 years of direct Incident Management experience leading high-severity incidents
  • Prior experience working in a highly technical, fast-paced environment such as a cloud infrastructure, SaaS, or enterprise software company
  • Working knowledge of AI-assisted tools and the ability to apply them effectively to incident analysis, documentation, and process improvement
  • Strong troubleshooting and analytical skills in a 24x7 operational environment
  • Excellent written and verbal communication skills across technical and non-technical audiences
  • Ability to remain calm and structured during high-pressure situations
  • Proven ability to assume command during high-severity incidents, bringing structure, clarity, and decisive direction in fast-moving, ambiguous situations
  • Bachelor’s degree in Computer Science, Information Technology, or equivalent experience

Nice to have

  • Experience leading incident response calls and driving cross-team coordination
  • Strong influencing skills when working across teams without direct authority
  • Familiarity with IT service management principles (ITIL, Incident, Change, Problem Management)
  • Experience with incident management tooling
  • Exposure to security or compliance-related incident response
  • Basic scripting skills (Bash, Python, JavaScript) to support operational improvements
  • Relevant technical or ITIL certifications

What the JD emphasized

  • direct Incident Management experience leading high-severity incidents
  • highly technical, fast-paced environment
  • Working knowledge of AI-assisted tools and the ability to apply them effectively to incident analysis, documentation, and process improvement
  • strong troubleshooting and analytical skills
  • calm and structured during high-pressure situations
  • assume command during high-severity incidents