Manager, Product Management - Prometheus Financial Core (PFC)

Capital One · Banking · McLean, VA +1

Product Manager responsible for platform reliability strategy for a cloud-native bank's infrastructure: defining availability standards (SLOs and Error Budgets), driving engineering partnerships for resilience, and managing stakeholders and risk.

What you'd actually do

  1. Lead the Platform Reliability Strategy: You will define the roadmap for system-wide fault isolation and blast radius reduction. You will leverage cell-based architecture to ensure that localized service disruptions do not impact the broader customer base or compromise platform integrity.
  2. Define Availability Standards: Support the definition and implementation of Service Level Objectives (SLOs) and Error Budgets. You will work with engineering partners to ensure these aren't just metrics on a dashboard, but actionable drivers of the product development lifecycle.
  3. Drive Engineering Partnership: Serve as the bridge between business intent and technical implementation. You will translate complex resilience needs into technical specifications, partnering with engineers on system design reviews and circuit-breaking logic.
  4. Stakeholder & Risk Management: Partner across organizational boundaries with Risk, Cyber, and Line of Business (LOB) stakeholders to align on uptime requirements and mitigation strategies. You will use data-driven storytelling to communicate platform health and the business impact of resilience investments.
  5. Problem Solving & Insights: Leverage data to identify potential single points of failure and advocate for architectural improvements. You will stay ahead of the curve on industry trends in cloud reliability to ensure Capital One remains a leader in platform stability.
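
For context on item 2, an error budget is the amount of unavailability an SLO permits over a measurement window. A minimal sketch of that arithmetic follows, assuming a 99.9% availability target and a 30-day window. Both values and both function names are illustrative assumptions, not taken from the posting.

```python
def error_budget_minutes(slo_target: float, window_days: int = 30) -> float:
    """Return the downtime (in minutes) an availability SLO allows over a window.

    slo_target is a fraction, e.g. 0.999 for "three nines" (assumed example value).
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be a fraction strictly between 0 and 1")
    total_minutes = window_days * 24 * 60
    return (1.0 - slo_target) * total_minutes


def budget_remaining(slo_target: float, downtime_minutes: float,
                     window_days: int = 30) -> float:
    """Return the fraction of the error budget still unspent.

    A negative result means the budget for the window is overspent.
    """
    budget = error_budget_minutes(slo_target, window_days)
    return 1.0 - downtime_minutes / budget
```

Under these assumptions, a 99.9% target over 30 days allows roughly 43.2 minutes of downtime, and 21.6 minutes of recorded downtime would leave about half the budget. A remaining budget near zero is the kind of signal that can shift a roadmap from feature work toward reliability work.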

Skills

Required

  • Product Management
  • Platform Reliability Strategy
  • Availability and Resilience
  • System Design
  • Stakeholder Management
  • Risk Management
  • Cloud Infrastructure
  • Technical Fluency
  • Back-end Complexities
  • Degree in a Quantitative Field

Nice to have

  • Consumer Facing Digital Products

What the JD emphasized

  • system-wide fault isolation
  • blast radius reduction
  • cell-based architecture
  • Service Level Objectives (SLOs)
  • Error Budgets
  • resilience needs
  • platform health
  • resilience investments
  • single points of failure
  • platform stability