Senior Software Engineer - Observability and Reliability

Sigma Computing · Data AI · San Francisco, CA · Engineering

Senior Software Engineer focused on building and maintaining observability tools and platforms, including metrics, logging, distributed tracing, dashboarding, alerting, and application performance management. The role involves optimizing cloud triaging, defining best practices for system measurability, and collaborating on design and code reviews. It calls for experience with Go, OpenTelemetry, and Kubernetes, along with a product mindset toward infrastructure systems.

What you'd actually do

  1. Build observability tools and platforms, including metrics, logging, distributed tracing, dashboarding, alerting, and application performance management
  2. Build with modern tools and languages like Go, OpenTelemetry, and Kubernetes
  3. Participate in the on-call rotation and ensure service uptime
  4. Create runtime tools/processes that optimize cloud triaging and limit downtime
  5. Define best practices around making our systems and services measurable

Skills

Required

  • Strong Computer Science fundamentals
  • 5+ years industry experience building and maintaining high-quality software, especially software other engineers use
  • Product mindset toward infrastructure systems
  • Go
  • OpenTelemetry
  • Kubernetes

Nice to have

  • Experience building systems for data analytics
  • Distributed systems monitoring and profiling skills
  • Knowledge of cloud application security models
  • Experience administering cloud service infrastructure (GCP, AWS, Azure)
  • Startup experience