Senior Staff Engineer

MongoDB · Enterprise · Dublin, Ireland · PTO Office of the CTO

Seeking a Senior Staff Engineer to design, build, and operate the internal and external Observability stack for the MongoDB platform. The role covers the systems for metrics, visualization, logs, traces, and alerts. These systems handle massive data volumes (tens of billions of time series and petabytes of logs and traces) and must meet strict SLOs on security, durability, availability, and performance. The engineer will also work with other teams to promote best practices in instrumentation and monitoring.

What you'd actually do

  1. Define the standards and vision for the mission-critical observability platform, leading the architecture and implementation of components that drive performance, scalability, cost-efficiency, and resiliency.
  2. Design and implement observability improvements that enable MongoDB engineers and customers to diagnose the root cause of production issues quickly and accurately.
  3. Handle production customer escalations from the Technical Support team, and coach teammates to do the same.
  4. Write production-ready database code, improve existing code, and mentor the team to write higher-quality code.
  5. Own all code the Observability Team maintains, ensuring it meets a high standard of quality (including security, durability, availability, and performance) and maintainability.

Skills

Required

  • Minimum of 12 years of experience designing, programming, debugging, and tuning distributed and/or highly concurrent mission-critical software systems in C, C++, Java, or Rust
  • Experience running latency-sensitive, high-throughput systems
  • Strong systems fundamentals, including multi-threaded programming, performance profiling, and expert-level programming
  • Familiarity with database internals, or experience building core components for data processing systems
  • Familiarity with the observability ecosystem and its best practices
  • Excellent verbal and written technical communication skills
  • Strong desire to collaborate with colleagues and mentor engineers
  • Excellent time and project management skills
  • Good understanding of information security management

Nice to have

  • Experience setting direction and providing technical leadership for large engineering teams

What the JD emphasized

  • strict SLOs on security, durability, availability, and performance
  • high-cardinality observability data
  • tens of billions of metrics time series
  • petabytes of logs, traces, and events
  • latency-sensitive, high-throughput systems
  • mission-critical software systems