Software Engineer, Caching Infrastructure

OpenAI · AI Frontier · San Francisco, CA · Applied AI

Software Engineer focused on building and scaling OpenAI's multi-tenant caching platform. The role covers designing and operating this critical infrastructure component and defining its long-term vision. The platform supports use cases such as inference, identity, quota management, and product experiences. Requires deep expertise in distributed systems and caching (Redis, Memcached, or similar), plus production experience with Kubernetes.

What you'd actually do

  1. Design, build, and operate OpenAI’s multi-tenant caching platform used across inference, identity, quota, and product experiences.
  2. Define the long-term vision and roadmap for caching as a core infra capability, balancing performance, durability, and cost.
  3. Collaborate with other infra teams (e.g., networking, observability, databases) and product teams to ensure our caching platform meets their needs.

Skills

Required

  • distributed systems
  • caching systems
  • Redis
  • Memcached
  • Kubernetes
  • service orchestration
  • networking fundamentals
  • load balancing
  • storage systems
  • clustering
  • durability configurations
  • client-side connection patterns
  • performance tuning
  • service meshes
  • Envoy
  • autoscaling systems
  • latency
  • reliability
  • throughput
  • cost optimization

What the JD emphasized

  • 5+ years of experience building and scaling distributed systems
  • deep expertise with Redis, Memcached, or similar solutions
  • production experience with Kubernetes