Architect, Reliability & Quality Engineering

Oracle · Enterprise · United States

Seeking a Reliability & Quality Engineering leader to enhance the design quality, system reliability, and availability of large-scale data center electrical distribution architectures for Oracle Cloud Infrastructure. The role involves evaluating electrical systems, identifying failure points and risks, and developing resiliency concepts. The ideal candidate will partner with design engineering, commissioning, operations, and equipment vendors to translate reliability analysis into design standards and product improvements.

What you'd actually do

  1. Evaluate end-to-end data center electrical distribution architectures, including utility or behind-the-meter interfaces, substations, medium-voltage distribution, switchgear, UPS systems, generators, battery energy storage systems (BESS) where applicable, protection systems, controls, and downstream power delivery.
  2. Identify design-level failure points, single points of failure, common-mode risks, hidden dependencies, protection coordination concerns, and failure modes that could materially impact availability.
  3. Assess how electrical systems perform under credible failure scenarios, including equipment faults, transfer events, protection operations, control-system failures, degraded-mode operation, maintenance conditions, and abnormal grid or generation events.
  4. Develop system-level resiliency concepts and design recommendations that improve fault isolation, recoverability, maintainability, failure containment, and blast radius reduction.
  5. Partner with electrical design engineering, commissioning, operations, construction, supply chain, and equipment vendors to translate reliability findings into design standards, product requirements, test expectations, acceptance criteria, and corrective action plans.

Skills

Required

  • 10+ years of experience in reliability engineering, quality engineering, electrical design assurance, product quality, mission-critical power systems, data center infrastructure, industrial power, utilities, generation, transmission and distribution, or equivalent high-availability environments.
  • Strong systems-thinking capability, with demonstrated ability to evaluate electrical architectures under failure scenarios rather than only assessing individual components or equipment ratings.
  • Deep understanding of data center electrical distribution concepts, including medium-voltage and low-voltage distribution, substations, switchgear, UPS systems, generators, protection schemes, controls, grounding, power quality, redundancy, maintainability, and operational recovery.
  • Experience identifying failure modes, common-mode vulnerabilities, cascading risks, hidden dependencies, and blast-radius concerns across complex electrical systems.
  • Proven ability to assess availability impact and reliability tradeoffs using structured engineering methods such as FMEA, fault-tree analysis, reliability block diagrams, event analysis, root-cause analysis, or similar methodologies.
  • Relevant product quality and reliability experience across power infrastructure, electrical equipment, standardized electrical products, or repeatable data center design platforms.
  • Ability to influence design standards, supplier requirements, equipment qualification expectations, commissioning acceptance criteria, and operational readiness deliverables.
  • Strong cross-functional leadership skills, with experience partnering across design engineering, construction, commissioning, operations, vendors, and executive stakeholders.
  • Excellent communication and executive reporting skills, with the ability to translate complex technical reliability risk into clear decisions, priorities, and implementation plans.
  • Commitment to safety, compliance, disciplined engineering governance, and operational excellence in high-uptime environments.

What the JD emphasized

  • reliability and quality engineering capability
  • mission-critical power infrastructure
  • electrical equipment
  • standardized data center design products
  • system-level resiliency concepts
  • failure isolation
  • fault containment
  • blast radius reduction
  • component-level reliability
  • overall electrical architecture
  • credible failure scenarios
  • high-availability environments