Sr. Director, Design Quality & Reliability – OCI Data Center Infrastructure

Oracle · Enterprise · United States

This role focuses on establishing and scaling a Design Quality & Reliability program for AI data center infrastructure, encompassing design standards, product qualification, supplier quality, reliability engineering, and continuous improvement. The candidate will lead a multidisciplinary team and partner with engineering and operations teams to ensure that OCI's infrastructure platforms meet their reliability and performance objectives.

What you'd actually do

  1. Establish and scale OCI’s Design Quality & Reliability organization for AI data center infrastructure.
  2. Ensure infrastructure designs meet OCI reliability, resiliency, maintainability, and lifecycle performance requirements.
  3. Define qualification and acceptance criteria for critical infrastructure products and systems used in OCI data centers.
  4. Develop KPI dashboards and measurement systems to benchmark design and product reliability performance across the OCI infrastructure portfolio.
  5. Partner with Infrastructure Engineering, Capacity Delivery, Operations, Supply Chain, and Product teams to ensure reliability objectives are embedded throughout the lifecycle.

Skills

Required

  • quality
  • reliability engineering
  • critical infrastructure
  • manufacturing quality
  • technical leadership
  • leading large-scale engineering or quality organizations
  • hyperscale infrastructure
  • cloud
  • semiconductor
  • power systems
  • telecom
  • mission-critical environments
  • reliability engineering methodologies
  • statistical analysis techniques
  • supplier quality management
  • complex hardware ecosystems
  • critical infrastructure systems including power distribution, cooling, controls, mechanical, and electrical systems

Nice to have

  • hyperscale data center infrastructure
  • cloud infrastructure environments
  • AFR
  • IDR
  • MTBF
  • reliability growth methodologies
  • GW-scale infrastructure deployment programs
  • measurable reliability improvements across large operational fleets
  • Advanced degree in Engineering, Reliability Engineering, Mechanical Engineering, Electrical Engineering, or related field

What the JD emphasized

  • AI data center infrastructure