Head of Data Center Rack and Cluster

OpenAI · AI Frontier · San Francisco, CA · Scaling

The person in this role leads an engineering team responsible for defining rack and system architectures for OpenAI's compute infrastructure. The focus is on the early phases of compute definition and on delivering production-ready racks. The ideal candidate has extensive hands-on experience with system bring-up and a track record of taking platforms to a stable, production-ready state.

What you'd actually do

  1. Own the reference rack, cluster, and system architecture standards for new OpenAI compute platforms.
  2. Define readiness and acceptance criteria for production-bound systems.
  3. Stay engaged through validation until configurations are proven repeatable and ready for handoff.
  4. Manage relationships with accelerator and equipment vendors to define an overall roadmap.
  5. Work with the industrial compute and partner teams to clarify requirements and ensure smooth delivery of next-generation systems.

Skills

Required

  • hyperscale data center experience
  • rack, system, or network architecture definition
  • performance and TCO modeling
  • delivery of complex new hardware platforms
  • vendor relationship management
  • management and leadership skills

What the JD emphasized

  • extensive hands-on experience in system bring-up
  • proven ability to bring platforms to a stable state ready for production use