Manager, Software Engineering - Observability

Figma · Enterprise · Canada +1 · Engineering

Engineering manager role for an Observability Engineering team at Figma. The manager leads a team that builds and operates the systems providing visibility into platform health, performance, and efficiency across metrics, logs, and traces. A key focus is exploring and implementing AI-driven approaches to anomaly detection, root cause analysis, and operational automation, alongside managing observability and infrastructure costs.

What you'd actually do

  1. Lead and grow a team of engineers responsible for the reliability, scalability, and evolution of Figma’s observability and cost engineering platforms
  2. Own and operate Figma’s core observability stack, including vendor platforms such as Datadog, ensuring high availability, strong data quality, and effective signal-to-noise across metrics, logs, and traces
  3. Define and drive the technical strategy for instrumentation standards, observability libraries, agents, and operators used to monitor internal- and external-facing services
  4. Explore and implement innovative, AI-driven approaches to anomaly detection, root cause analysis, signal correlation, and operational automation
  5. Establish clear frameworks for cost attribution, budgeting, forecasting, and alerting across infrastructure and observability spend, enabling teams to make informed tradeoffs

Skills

Required

  • 4+ years of experience leading infrastructure, observability, or platform engineering teams
  • Deep hands-on experience with modern observability platforms (e.g., Datadog, OpenTelemetry) across metrics, logs, and distributed tracing
  • Strong understanding of distributed systems, instrumentation best practices, SLO design, and incident response workflows
  • Experience driving cost transparency and accountability initiatives, including cost attribution, budgeting, forecasting, and alerting in cloud environments
  • Demonstrated ability to set technical direction, drive cross-functional alignment (Engineering, Finance, Security), and make sound architectural decisions in complex environments

Nice to have

  • Experience designing or evolving company-wide observability standards, shared libraries, and agent/operator-based integrations
  • Background in cost optimization for infrastructure or observability tooling, including vendor negotiations and usage modeling
  • Experience applying AI or machine learning techniques to anomaly detection, root cause analysis, or operational automation
  • Familiarity with OpenTelemetry and modern instrumentation frameworks across multiple programming languages
  • Experience scaling and mentoring high-performing engineering teams through platform expansion or significant architectural change

What the JD emphasized

  • AI-driven approaches to anomaly detection
  • operational automation
  • observability
  • cost transparency
