Software Engineer, Observability

OpenAI · AI Frontier · San Francisco, CA · Applied AI

Software engineering role building OpenAI's observability product: scalable infrastructure for logs and metrics, plus AI-native tools such as agents for issue detection and debugging. The role involves owning core observability infrastructure and contributing to UI experiences, with a focus on making production systems reliable, performant, and observable. The tools are AI-powered and built for internal use.

What you'd actually do

  1. Own core observability infrastructure, including distributed logging, time series, and trace storage
  2. Build AI-native tools that help engineers detect, understand, and resolve issues autonomously
  3. Contribute to UI experiences like dashboards, notebooking, or interactive debugging
  4. Collaborate closely with engineers, researchers, user ops, and other teams across the company to build the next-generation observability product

Skills

Required

  • large-scale distributed systems
  • logging systems
  • time series databases
  • systems fundamentals
  • networking fundamentals
  • cloud infra (Kubernetes, AWS, etc.)

Nice to have

  • observability systems (e.g. Prometheus, OpenTelemetry)

What the JD emphasized

  • AI-native tools
  • agents that summarize SEVs
  • auto-generate dashboards
  • help engineers debug through notebook-like UIs
  • AI-powered UI
  • AI-powered observability
