Staff Technical Program Manager - Reliability Engineering

Robinhood · Fintech · Menlo Park, CA · ENG Technical Assurance

This Staff Technical Program Manager role in Reliability Engineering at Robinhood focuses on leading incident management programs, defining standards, improving system reliability, and reducing operational risk. The role involves technical risk assessment, infrastructure migration projects, and partnering with engineering teams to establish reliability standards such as SLOs.

What you'd actually do

  1. Lead incident management programs, including response processes, escalation paths, communication standards, and post-incident tracking
  2. Define and track follow-up actions after incidents, ensuring completion and measurable reduction of repeat issues
  3. Run a technical risk assessment program for the organization to ensure top technical risks are identified and appropriate controls are built
  4. Run complex infrastructure-related migration projects to build a solid pre-production testing discipline
  5. Provide clear updates on program status, risks, and system health metrics to engineering and product leadership

Skills

Required

  • 7+ years of experience leading technical programs related to infrastructure, reliability, or incident management in large-scale systems
  • Ability to understand system architecture, evaluate technical tradeoffs, and work closely with senior engineers on design and execution
  • Experience building incident management or reliability processes that improved measurable outcomes such as uptime or response time
  • Strong written and verbal communication skills, including the ability to coordinate effectively during high-pressure situations
  • Proven ability to organize complex work, manage dependencies, and deliver results across multiple engineering teams

What the JD emphasized

  • improve system uptime
  • response times
  • operational clarity
  • measurable reliability improvements
  • measurable outcomes