Senior Software Engineer - Observability and Reliability

Sigma Computing · Data AI · San Francisco, CA · Engineering

Senior Software Engineer focused on building and maintaining observability tools and platforms, including metrics, logging, distributed tracing, dashboarding, alerting, and application performance management. The role involves streamlining cloud triage, limiting downtime, and defining best practices for making systems measurable, using modern tools such as Go and OpenTelemetry.

What you'd actually do

  1. Build observability tools and platforms, including metrics, logging, distributed tracing, dashboarding, alerting, and application performance management
  2. Build with modern tools and languages such as Go, OpenTelemetry, and Kubernetes
  3. Participate in the on-call rotation and ensure service uptime
  4. Create runtime tools and processes that streamline cloud triage and limit downtime
  5. Define best practices around making our systems and services measurable

Skills

Required

  • Strong Computer Science fundamentals
  • 5+ years of industry experience building and maintaining high-quality software, especially software other engineers use
  • A product mindset toward infrastructure systems and a sense of accomplishment in enabling others

Nice to have

  • Experience building systems for data analytics
  • Distributed systems monitoring and profiling skills
  • Knowledge of cloud application security models
  • Experience administering cloud service infrastructure (GCP, AWS, Azure)
  • Startup experience