Staff Site Reliability Engineer, Fabric

MongoDB · Enterprise · New York, NY +2 · Remote · PTO · Site Reliability Engineering

Staff Site Reliability Engineer focused on building and maintaining the infrastructure for secure and efficient communication between MongoDB's services, including network architecture, service mesh, and edge load balancing in a multi-cloud environment.

What you'd actually do

  1. Participate in the development of a reliable, resilient, globally connected multi-cloud network that is crucial for MongoDB’s services
  2. Collaborate with service-owning teams to provide internal support, addressing technical issues and offering guidance on best practices for service-to-service connectivity
  3. Participate in a 24/7 on-call rotation to swiftly resolve issues related to network architecture and service-to-service connectivity, ensuring minimal disruption and high availability

Skills

Required

  • 10+ years of experience working on software and operating distributed systems
  • deep expertise in networking fundamentals
  • good understanding of how the internet works, e.g. TCP/IP (including IPv6), DNS, TLS/mTLS, BGP, tunnels, overlays, and SDN principles
  • knowledge of modern cloud-based infrastructure and the network design primitives of at least one of AWS, Azure, or GCP, e.g. VPCs, subnetting, routing, VPNs, peering, private link / private service connect, and CDNs
  • strong knowledge of service mesh and load-balancing concepts

Nice to have

  • customer-focused mindset
  • value efficiency in processes and operations
  • strong preference for automation over manual processes (“allergic to ops work”)

What the JD emphasized

  • deep expertise in networking fundamentals
  • strong knowledge of service mesh and load-balancing concepts
  • eager to implement these in a multi-cloud environment
  • customer-focused mindset
  • strong preference for automation over manual processes