Staff Site Reliability Engineer, Fabric

MongoDB · Enterprise · New York, NY +2 · Remote · PTO · Site Reliability Engineering

Staff Site Reliability Engineer focused on network infrastructure, service mesh, and edge load balancing for a multi-cloud environment. Responsibilities include building and maintaining resilient, scalable, and reliable systems for secure communication between services, participating in on-call rotations, and collaborating with service-owning teams. Requires deep expertise in networking fundamentals, distributed systems, and cloud-based infrastructure.

What you'd actually do

  1. Participate in the development of a reliable and resilient multi-cloud globally-connected network that is crucial for MongoDB’s services
  2. Collaborate with service-owning teams to provide internal support, addressing technical issues and offering guidance on best practices for service-to-service connectivity
  3. Participate in a 24/7 on-call rotation to swiftly resolve issues related to network architecture and service-to-service connectivity, ensuring minimal disruption and high availability

Skills

Required

  • 10+ years of experience working on software and operating distributed systems
  • deep expertise in networking fundamentals
  • good understanding of how the internet works
  • TCP/IP (including IPv6)
  • DNS
  • TLS/mTLS
  • BGP
  • tunnels
  • overlays
  • SDN principles
  • modern cloud-based infrastructure
  • network design primitives of at least one of AWS, Azure, or GCP
  • VPCs
  • subnetting
  • routing
  • VPNs
  • peering
  • private link / private service connect
  • CDNs
  • service mesh
  • load-balancing concepts

Nice to have

  • customer-focused mindset
  • values efficiency in processes and operations
  • strong preference for automation over manual processes
  • allergic to ops work
  • eager to implement service mesh and load-balancing concepts in a multi-cloud environment

What the JD emphasized

  • deep expertise in networking fundamentals
  • good understanding of how the internet works
  • intimately familiar with modern cloud-based infrastructure
  • strong knowledge of service mesh and load-balancing concepts
  • eager to implement these in a multi-cloud environment
  • customer-focused mindset
  • value efficiency in processes and operations
  • strong preference for automation over manual processes
  • allergic to ops work