Manager, Site Reliability Engineering - Storage Layer Service

MongoDB · Enterprise · New York, NY · PTO · Site Reliability Engineering

Manager of Site Reliability Engineering for MongoDB's Storage Layer Services (SLS). The role leads a team of SREs responsible for the reliability, durability, and operational safety of the storage layer that underpins MongoDB Atlas. Responsibilities include defining SLOs, shaping capacity plans, contributing to architectural design reviews, and guiding the team through complex operational challenges. The role requires experience with distributed systems, Kubernetes, and IaC, as well as operating stateful storage or database systems at scale.

What you'd actually do

  1. Build and lead a team of 6-8 engineers, fostering a positive culture, handling career growth and performance conversations, and proactively removing blockers
  2. Define and drive a clear technical vision and a comprehensive roadmap for our multi-tenant distributed storage systems, balancing long-term strategic infrastructure goals with immediate engineering needs
  3. Contribute through hands-on technical work, such as leading architectural design reviews, reviewing PRs, and stepping in to guide the team through complex operational challenges
  4. Act as the primary liaison for the Storage Layer Services SRE team, collaborating closely with other engineering leaders to ensure platform alignment and manage stakeholder expectations

Skills

Required

  • software engineering
  • distributed systems
  • site reliability engineering
  • team leadership
  • Kubernetes
  • containerization
  • Infrastructure as Code (IaC)
  • Terraform
  • Crossplane
  • Kubernetes Operators
  • stateful storage systems
  • database systems
  • durability, consistency, and recovery trade-offs
  • technical roadmaps
  • communication skills

Nice to have

  • multi-cloud environments (AWS, GCP, or Azure)
  • secure, multi-tenant runtime environments

What the JD emphasized

  • multi-year roadmap
  • 10+ years of experience working on software and operating distributed systems
  • 2+ years managing engineering teams
  • deep technical familiarity with Kubernetes ecosystems
  • operated or supported stateful storage or database systems at scale
  • leading major architectural shifts