Team Lead, Site Reliability Engineering - Storage Layer Services

MongoDB · Enterprise · New York, NY · PTO · Site Reliability Engineering

Lead SRE for MongoDB's Storage Layer Services (SLS) team, responsible for re-architecting the cloud storage layer for Atlas and ensuring its reliability, durability, and operational safety. The role combines leading a team of SREs, defining the technical vision and roadmap, and making hands-on technical contributions in a distributed systems environment.

What you'd actually do

  1. Build and lead a team of 6-8 engineers, fostering a positive culture, handling career growth and performance conversations, and proactively removing blockers
  2. Define and drive a clear technical vision and comprehensive roadmap for our multi-tenant distributed storage systems, balancing long-term strategic infrastructure goals with immediate engineering needs
  3. Contribute through hands-on technical work, such as leading architectural design reviews, reviewing PRs, and stepping in to guide the team through complex operational challenges
  4. Act as the primary liaison for the Storage Layer Services SRE team, collaborating closely with other engineering leaders to ensure platform alignment and manage stakeholder expectations

Skills

Required

  • software engineering
  • distributed systems
  • team leadership
  • technical vision
  • roadmap definition
  • architectural design
  • Kubernetes
  • containerization
  • IaC tooling (Terraform, Crossplane, Operators)
  • stateful storage systems
  • database systems
  • durability
  • consistency
  • recovery trade-offs
  • communication skills

Nice to have

  • leading major architectural shifts
  • multi-cloud environments (AWS, GCP, Azure)
  • designing secure, multi-tenant runtime environments

What the JD emphasized

  • 10+ years of experience working on software and operating distributed systems
  • 2+ years managing engineering teams
  • deep technical familiarity with Kubernetes ecosystems
  • operated or supported stateful storage or database systems at scale