Team Lead, Site Reliability Engineering - Fleet Management

MongoDB · Enterprise · Austin, TX +7 · PTO · Site Reliability Engineering

The Team Lead for Site Reliability Engineering, Fleet Management, is responsible for the Kubernetes runtime environment, its infrastructure, and its operational functions. The role involves managing a team, developing technical vision, and contributing to architectural design, with a focus on automation and on migrating from Terraform-based Infrastructure as Code (IaC) to an Operator-driven model.

What you'd actually do

  1. Manage a team of 6-8 engineers, fostering a positive culture, handling career growth and performance conversations, and proactively removing blockers
  2. Help develop a clear technical vision and comprehensive roadmap for our runtime environment, balancing long-term strategic infrastructure goals with immediate engineering needs
  3. Contribute through light hands-on technical work, such as leading architectural design reviews, reviewing PRs, and stepping in to guide the team through complex operational challenges
  4. Act as the primary liaison for the Fleet Management team, collaborating closely with other engineering leaders to ensure platform alignment and manage stakeholder expectations

Skills

Required

  • Software engineering
  • Distributed systems
  • Team management
  • Technical vision and roadmap development
  • Kubernetes
  • Containerization
  • Infrastructure as Code (Terraform, Crossplane, Operators)
  • Automation
  • Cloud environments (AWS, GCP, Azure)
  • Communication skills

Nice to have

  • Multi-cloud infrastructure management
  • Designing secure, multi-tenant runtime environments

What the JD emphasized

  • 10+ years of experience working on software and operating distributed systems, with 2+ years managing engineering teams
  • deep technical familiarity with Kubernetes ecosystems, containerization technologies, and modern IaC tooling (e.g., Terraform, Crossplane, or Operators)
  • migration from Terraform-based Infrastructure as Code (IaC) to an Operator-driven lifecycle management model