Senior Engineering Manager, Observability

MongoDB · Enterprise · Dublin, Ireland · PTO Office of the CTO

Seeking a Senior Engineering Manager to lead a team responsible for MongoDB's Observability systems, including operational metrics, visualization, logs, traces, and alerts for customer deployments. The role involves managing a large team, contributing to the design and architecture of high-volume ingestion systems, and ensuring the reliability and performance of complex distributed systems at scale.

What you'd actually do

  1. Lead and coach a large team of motivated individual contributors (up to 10) or leads who are eager to learn and grow
  2. Contribute to the design, infrastructure and architecture of the high-volume ingestion systems your team develops
  3. Work closely with product managers, program managers, and other engineering teams to specify, prioritize and deliver new features that delight our users, internally and externally
  4. Estimate task complexity, report progress, and voice risks for projects executed by the team
  5. Work with customers and support engineers to fix issues, and take part in our on-call rotation

Skills

Required

  • 8 years of professional software development experience
  • 4 years of people management experience, including performance management
  • experience managing larger teams or leads
  • experience leading engineering teams that have designed, built and maintained large-scale systems
  • track record of delivering operationally solid systems
  • high bar for engineering excellence
  • experience writing large-scale, distributed backend systems in a compiled language
  • experience with at least one major cloud provider technology (AWS, Azure, GCP)
  • ability to solve tough problems and debug tricky production outages
  • excellent communication skills

Nice to have

  • curious
  • collaborative
  • motivated

What the JD emphasized

  • strict SLO on security, durability, availability and performance
  • high-volume ingestion systems
  • high-cardinality observability data
  • billions of metrics time series
  • petabytes of logs, traces, and events
  • low-level systems expertise
  • large-scale systems
  • operationally solid systems
  • high bar of engineering excellence
  • tricky production outages