Data Center Operations Controls Engineer

NVIDIA · Semiconductors · Santa Clara, CA +1

NVIDIA is seeking a Data Center Operations Controls Engineer to manage the operational readiness, support, and governance of its Cronus monitoring and control platform. The role involves working with Controls engineering to resolve issues, leading operational cleanup, establishing configuration baselines, defining standards, developing training, supporting integrations, and tracking operational metrics. It requires 12+ years of experience in data center operations or similar environments and a strong understanding of controls systems and monitoring platforms.

What you'd actually do

  1. Work with Controls engineering to prioritize and coordinate the resolution of critical UI, stability, and interoperability issues affecting data center operations.
  2. Lead operational cleanup at live sites, including nuisance alarm reduction, disabled point remediation, and restoration of a usable monitoring baseline.
  3. Collaborate with engineering and operations to establish and uphold a consistent Controls version and configuration baseline, including setpoints, thresholds, and alarm defaults.
  4. Help establish naming standards, topology mapping methods, and configuration governance to ensure consistency across sites.
  5. Own the development and delivery of training, documentation, and knowledge transfer for data center operators and FOC teams using the Controls system.

Skills

Required

  • 12+ years of experience in operations, controls, or monitoring systems in data center, industrial, or large-scale infrastructure environments.
  • B.S. in a related field or equivalent experience.
  • Strong understanding of controls systems, monitoring platforms, or SCADA-like tools, including alarms, setpoints, and configuration management.
  • Proven success partnering with engineering, operations, and vendor teams to stabilize and improve technical platforms.
  • Excellent communication skills, with the ability to translate technical issues into clear operational actions for frontline teams.
  • Track record of defining and using operational metrics to drive performance and reliability improvements.

Nice to have

  • Experience managing data center operations or critical facilities.
  • Experience with Ignition control systems.
  • Background in process control, industrial automation, or building management systems.
  • Experience leading integrations between monitoring platforms and other infrastructure tools.
  • Experience with change process oversight, incident response, and configuration control approaches.

What the JD emphasized

  • critical UI, stability, and interoperability issues
  • nuisance alarm reduction
  • disabled point remediation
  • consistent Controls version and configuration baseline
  • naming standards, topology mapping methods, and configuration governance
  • operational metrics