Director, Data Center Facility Operations - Saline Township, MI

Oracle · Enterprise · MI

The Director of Data Center Facility Operations leads enterprise-wide performance monitoring and operational governance, and ensures the resilience, compliance, and audit-readiness of critical infrastructure. The role drives automation, telemetry, and predictive maintenance, establishes crisis management standards, and oversees the full lifecycle of critical infrastructure assets to optimize reliability, security, and scalability.

What you'd actually do

  1. Owns 100% uptime operations for a portfolio of very large/complex data center sites, ensuring consistent execution of shift coverage, operational handoffs, and standardized runbooks.
  2. Defines the enterprise strategy for real-time monitoring and operational health across the portfolio (BMS/EPMS/SCADA/telemetry), aligning KPIs to uptime, reliability, safety, and customer outcomes.
  3. Governs standards for event triage, incident command, escalation, stakeholder communications, and customer-impacting notifications.
  4. Oversees evaluation of power, cooling, physical space, network/support infrastructure, and security capacity, ensuring readiness for load growth and peak conditions.
  5. Drives adoption of automation for alarm correlation, workflow orchestration, remote operations, and predictive analytics to reduce human error and improve response times.

Skills

Required

  • Data center operations leadership
  • Performance monitoring and governance
  • Critical infrastructure management (power, cooling, controls, life safety, security)
  • Capacity planning and readiness assessment
  • Automation and telemetry adoption
  • Predictive maintenance implementation
  • Crisis management and incident response
  • Continuous improvement methodologies
  • Asset lifecycle management
  • Vendor performance management
  • Financial management and investment governance
  • Risk assessment and mitigation
  • Compliance and audit readiness
  • Team building and development
  • Cross-functional collaboration

Nice to have

  • Experience with BMS/EPMS/SCADA systems
  • Knowledge of LOTO and energized work policies
  • Familiarity with specific data center technologies (e.g., vSphere, Kubernetes, networking protocols)

What the JD emphasized

  • 100% uptime operations
  • Mission Critical Operations (MCO)
  • high-severity incidents
  • real-time monitoring
  • operational health
  • uptime
  • reliability
  • safety
  • customer outcomes
  • MTTR/MTBF
  • repeat events
  • risk posture
  • preventive and predictive maintenance
  • MOP/SOP/EOP quality
  • change control
  • operational compliance
  • event triage
  • incident command
  • escalation
  • stakeholder communications
  • customer-impacting notifications
  • post-incident reviews
  • root cause analysis (RCA)
  • corrective/preventive actions (CAPA)
  • executive escalation point
  • complex incidents
  • cross-regional reliability risks
  • power, cooling, physical space, network/support infrastructure, and security capacity
  • load growth
  • peak conditions
  • resiliency standards
  • redundancy
  • maintenance windows
  • failover testing
  • generator/UPS readiness
  • fuel strategy
  • operational risk assessments
  • audit-ready
  • applicable standards
  • internal controls
  • automation
  • alarm correlation
  • workflow orchestration
  • remote operations
  • predictive analytics
  • human error
  • response times
  • data quality
  • instrumentation
  • high-confidence operational decision-making
  • expansions/new builds/site launches
  • Day-0/Day-1 readiness
  • staffing
  • training
  • spares
  • procedures
  • turnover acceptance criteria
  • operability, maintainability, and safety in design and commissioning
  • lifecycle strategy
  • critical infrastructure
  • supporting hardware assets
  • installation
  • maintenance
  • spares
  • logistics
  • inventory
  • decommissioning
  • vendor performance
  • SLAs
  • service quality
  • compliance
  • performance gaps
  • multi-million dollar investments
  • upgrades
  • capacity expansion
  • reliability improvements
  • risk remediation
  • strategic oversight
  • mission-critical operational initiatives
  • reliability risk
  • customer impact
  • compliance needs
  • engineering, construction, security, network/IT, program management, and business stakeholders
  • reliable 24/7 delivery
  • complex operational/technical issues
  • disciplined, data-driven resolution
  • prevention of recurrence
  • operational excellence
  • training programs
  • certifications
  • drills
  • sustained improvement roadmap
  • availability
  • risk reduction
  • high-performing 24/7 operations organization
  • shift leaders
  • incident commanders
  • regional operations management
  • 24/7/365 environment
  • incident and team management across all shifts
  • life safety
  • safe work practices
  • LOTO
  • energized work policies