Staff Software Engineer, Observability

Pinterest · Consumer · San Francisco, CA · Data Engineering

Staff Software Engineer, Observability at Pinterest. This role focuses on designing and building world-class observability solutions (metrics, logs, traces) for large-scale distributed systems. It requires deep technical expertise in distributed systems and data engineering, along with a product-oriented mindset, to empower the engineering organization. The role involves defining roadmaps, architecting infrastructure, building data pipelines, championing best practices, and providing technical leadership. Experience with ML/anomaly detection for observability use cases is a plus.

What you'd actually do

  1. Define and execute the observability roadmap, treating it as a product. Understand engineering team needs and translate them into technical solutions with measurable impact
  2. Architect, build, and scale distributed observability infrastructure (metrics, logs, traces) to handle massive volumes across Pinterest's distributed systems
  3. Build high-performance data pipelines and storage for real-time and historical telemetry analysis at Pinterest scale
  4. Champion Best Practices: Establish observability standards and patterns across the organization, making it easy for teams to instrument their services and gain actionable insights
  5. Technical Leadership: Mentor engineers, lead architectural reviews, and influence technical decisions across teams to improve overall system reliability and performance

Skills

Required

  • Bachelor’s degree in Computer Science, Engineering, or a related field, or equivalent experience
  • Product Mindset: Demonstrated ability to work backwards from customer needs: understanding user needs, prioritizing features, measuring success, and iterating based on feedback. Experience building internal platforms or tools with strong adoption
  • Distributed Systems Expertise: 7+ years of experience designing and operating large-scale distributed systems with deep understanding of consistency, availability, scalability, and failure modes
  • Data Engineering Skills: Strong background in building data pipelines, working with time-series databases, columnar storage, stream processing (Kafka, Flink, etc.), and data modeling at scale
  • Observability Domain Knowledge: Hands-on experience with modern observability tools and practices including metrics, logging, tracing, and profiling. Familiarity with OpenTelemetry, Prometheus, Grafana, or similar technologies
  • Programming Proficiency: Expert-level coding skills in languages like Java, Python, Go, or Scala, with the ability to write production-quality code
  • Systems Thinking: Ability to see the big picture while managing complex technical details, balancing trade-offs between cost, performance, and reliability

Nice to have

  • Experience with machine learning or anomaly detection applied to observability use cases
  • Strong communication skills with the ability to influence stakeholders at all levels
  • Contributions to open-source observability projects
  • Familiarity with cloud-native architectures and technologies (Kubernetes, service mesh, etc.)
  • Track record of driving adoption of internal platforms through excellent documentation, UX, and developer advocacy

What the JD emphasized

  • Experience building observability platforms from the ground up or significantly scaling existing solutions