Key Responsibilities
- Lead overall operations of OCI data center facilities, ensuring high availability, reliability, safety, and operational excellence.
- Manage and develop teams of Data Center Operations Engineers, Facility Engineers, Technicians, and vendors.
- Drive site readiness for new deployments, capacity expansions, AI infrastructure rollouts, and liquid-cooling implementations.
- Partner with Engineering, Network Operations, Construction, Capacity Planning, Security, and Global Operations teams to support business growth.
- Oversee deployment of server, storage, and network infrastructure at hyperscale.
- Manage critical electrical infrastructure (UPS, generators, switchgear, substations, PDUs, ATS/STS, power distribution systems).
- Manage critical mechanical and cooling infrastructure (chillers, cooling towers, CRAH/CRAC systems, liquid cooling, CDUs, BMS).
- Lead commissioning, operational acceptance, and readiness reviews for new facilities and infrastructure.
- Monitor environmental and operational metrics using BMS, DCIM, EPMS, and related monitoring platforms.
- Own vendor and colocation provider relationships, SLA compliance, governance reviews, and escalations.
- Lead incident response, change management, root cause analysis (RCA), and risk mitigation activities.
- Manage site KPIs, operational reporting, budgets, forecasting, and continuous improvement initiatives.
Required Qualifications
- Bachelor’s degree in Engineering, Facilities Management, Data Center Operations, or a related technical field (or equivalent experience).
- 10+ years of experience in hyperscale data center operations, critical facilities, or mission-critical infrastructure management.
- 5+ years of leadership experience managing technical operations teams.
- Experience working in major cloud or hyperscale environments (OCI, AWS, Azure, Google Cloud, Meta, etc.).
- Strong knowledge of electrical and mechanical systems supporting data centers.
- Experience with liquid-cooled and high-density computing environments.
- Experience leading large-scale infrastructure deployment and capacity expansion projects.
- Experience managing colocation providers, vendors, and service delivery partners.
- Knowledge of incident management, operational governance, commissioning, and operational readiness processes.
- Strong communication, stakeholder management, and leadership skills.
Nice-to-Have
- Experience with AI, GPU, or HPC infrastructure.
- Knowledge of ASHRAE guidelines, liquid cooling technologies, and power redundancy architectures (N, N+1, 2N).
- Experience with BMS, DCIM, EPMS, and industrial control systems.
- Sustainability and energy-efficiency program experience.
- Certifications such as ITIL, CDCP/CDCS/DCEP, Uptime Institute credentials, or PMP.
Career Level - M3