Technical Program Manager, Site Reliability Engineering (Senior or Staff)

MongoDB · Enterprise · New York, NY · PTO · Site Reliability Engineering

This is a Technical Program Manager role in Site Reliability Engineering (SRE) at MongoDB, focused on scaling the platform that underpins all cloud products. The role involves driving program execution, strengthening production reliability practices, and coordinating cross-functional efforts to ensure smoother launches, clearer roadmaps, and improved reliability metrics.

What you'd actually do

  1. Drive Program Planning & Execution – Define program scope, milestones, and success criteria with SRE engineers and leaders. Manage dependencies across platform teams, keep work clearly tracked in Jira, and deliver on time.
  2. Strengthen Production Reliability – Lead change management and launch readiness programs. Partner with SREs and product teams to define and operationalize SLOs/SLIs, and use incident data, metrics, and capacity signals to drive prioritization and continuous improvement.
  3. Lead Cross-Functional Coordination – Align SRE with Security, Compliance, Cloud platform, and other engineering teams. Coordinate cross-team incident response, ensure clear follow-through, and build trust as the go-to driver of complex, multi-team efforts.
  4. Build Scalable Systems & Processes – Design lightweight frameworks and communication patterns that help SRE deliver reliably at scale. Work yourself out of the "hero" role by leaving teams better-equipped to execute independently.

Skills

Required

  • 8+ years in technical program management, engineering management, or a comparable technical role partnering with software engineering teams
  • Proven track record leading large-scale, cross-team platform initiatives through ambiguity and change
  • Strong knowledge of production change management, software development lifecycle, and reliability metrics (SLOs, SLIs)
  • Skilled at shaping roadmaps and managing dependencies
  • Able to query and interpret metrics, logs, or other data sources to inform decisions and communicate risk
  • Excellent communicator (clear, concise, and calm) across engineers, cross-functional partners, and executives
  • Low-ego, highly collaborative, and motivated by ownership of hard problems end to end

Nice to have

  • Hands-on or close-partner experience with Kubernetes, cloud networking, or observability stacks (metrics, logs, tracing, alerting)
  • Prior experience working with or alongside SRE teams
  • Background in large-scale cloud infrastructure or platform engineering
  • Familiarity with MongoDB Atlas or other modern cloud database platforms

What the JD emphasized

  • scale the platform
  • production reliability
  • cross-functional efforts
  • predictability at scale
  • large-scale, cross-team platform initiatives through ambiguity and change
  • production change management
  • reliability metrics (SLOs, SLIs)
  • shaping roadmaps
  • managing dependencies
  • metrics, logs, or other data sources
  • hard problems end to end