Software Engineer, Habitat (Online Data)

OpenAI · AI Frontier · Seattle, WA · Applied AI

This Software Engineer role focuses on building and operating Habitat, a core online database platform for OpenAI's internal user data. Responsibilities include designing and implementing abstractions for storage, caching, routing, change data capture (CDC), and privacy enforcement, while improving the performance, reliability, and cost efficiency of high-QPS, latency-sensitive workloads. The role requires strong distributed systems experience and end-to-end ownership of system development and operations.

What you'd actually do

  1. Design and build core abstractions spanning storage, caching, routing, CDC, and privacy enforcement
  2. Own a major surface area end to end, from product and API design to operational excellence
  3. Improve latency, correctness, and cost efficiency for real production workloads at massive scale
  4. Build strong instrumentation, debugging workflows, and developer-first tooling
  5. Collaborate closely with internal product and infrastructure teams to understand requirements and ship pragmatic solutions

Skills

Required

  • building and operating high-scale backend or data-intensive distributed systems in production
  • systems judgment
  • making tradeoffs across latency, cost, correctness, and reliability
  • owning ambiguous problems end to end
  • driving roadmap plus execution
  • databases
  • caching systems
  • routing and load balancing
  • indexing and retrieval
  • CDC pipelines
  • tail latency and global performance optimization
  • designing platform APIs consumed by many internal teams
  • multi-region systems
  • consistency semantics
  • failover design
  • 8+ years of industry experience
  • 3+ years leading large-scale, complex projects or technical initiatives
  • Rust and/or Python (Rust preferred for core systems work)
  • communication skills

Nice to have

  • guardrails
  • debuggability
  • tail-latency percentiles (p95, p99), request steering, and locality

What the JD emphasized

  • high-scale backend or data-intensive distributed systems
  • systems judgment
  • latency, cost, correctness, and reliability
  • ambiguous problems end to end
  • databases, caching systems, routing and load balancing, indexing and retrieval, CDC pipelines
  • tail latency and global performance optimization
  • platform APIs consumed by many internal teams
  • multi-region systems, consistency semantics, and failover design
  • 8+ years of industry experience
  • 3+ years leading large-scale, complex projects or technical initiatives
  • Rust and/or Python
  • Rust preferred for core systems work
  • excellent communication skills