Data Center Operations Controls Engineer

NVIDIA · Semiconductors · Santa Clara, CA +1

NVIDIA is seeking a Data Center Operations Controls Engineer to manage and improve its Cronus monitoring and control platform. Responsibilities include resolving operational issues, leading cleanup efforts, establishing configuration baselines, developing training, supporting integrations, and defining operational metrics. The role requires extensive experience in data center operations or similar environments, a strong understanding of controls systems, and proven collaboration skills.

What you'd actually do

  1. Work together with Controls engineering to prioritize and coordinate the resolution of critical UI, stability, and interoperability issues affecting data center operations.
  2. Lead operational cleanup at live sites, including nuisance alarm reduction, disabled point remediation, and restoration of a usable monitoring baseline.
  3. Collaborate with engineering and operations to establish and uphold a consistent Controls version and configuration baseline, including setpoints, thresholds, and alarm defaults.
  4. Help establish naming standards, topology mapping methods, and configuration governance to ensure consistency across sites.
  5. Own the development and delivery of training, documentation, and knowledge transfer for data center operators and FOC teams using the Controls system.

Skills

Required

  • operations
  • controls
  • monitoring systems
  • data center environments
  • industrial environments
  • large-scale infrastructure environments
  • controls systems
  • monitoring platforms
  • SCADA-like tools
  • alarms
  • setpoints
  • configuration management
  • partnering with engineering
  • partnering with operations
  • partnering with vendor teams
  • communication skills
  • operational metrics
  • performance improvements
  • reliability improvements

Nice to have

  • managing data center operations
  • critical facilities management
  • Ignition control systems
  • process control
  • industrial automation
  • building management systems
  • leading integrations
  • change process oversight
  • incident response
  • configuration control approaches

What the JD emphasized

  • 12+ years of experience in operations, controls, or monitoring systems in data center, industrial, or large-scale infrastructure environments.
  • Strong understanding of controls systems, monitoring platforms, or SCADA-like tools, including alarms, setpoints, and configuration management.
  • Proven success partnering with engineering, operations, and vendor teams to stabilize and improve technical platforms.
  • Track record of defining and using operational metrics to drive performance and reliability improvements.