Data Center Engineer, Resource Efficiency – Compute Supply

Anthropic · AI Frontier · United States · Remote · Compute

This role focuses on optimizing data center resource efficiency, specifically power and cooling, for AI compute (TPU/GPU fleet). It involves building models for consumption forecasting, designing IT/OT interfaces for real-time telemetry, and operating load management systems to maximize throughput while meeting availability SLOs. While it uses AI/ML tools for optimization, the core function is infrastructure engineering for efficiency, not direct AI model development.

What you'd actually do

  1. Build models that forecast consumption across electrical and mechanical subsystems to inform capacity planning, energy procurement, and oversubscription targets and risks. This includes statistical modeling of cluster utilization, workload profiles, and failure modes.
  2. Design IT/OT interfaces that bridge compute orchestration with facility controls, enabling real-time telemetry across accelerator hardware, power distribution, cooling, and schedulers.
  3. Build and operate load management systems that use power and cooling topology for power- and thermal-aware placement, maximizing throughput while meeting SLOs.
  4. Partner with data center providers to drive design optimizations and hold them accountable to SLA-grade performance standards, providing technical diligence on partner architectures.
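
As a rough illustration of the statistical modeling named in item 1, the sketch below estimates how often an oversubscribed power domain would exceed its limit. It is a minimal example under stated assumptions, not the team's method: the function name, the rack counts, and every kW figure are hypothetical, and rack draws are modeled as independent clipped normal samples.

```python
import random


def overload_probability(n_racks, mean_kw, std_kw, peak_kw, limit_kw,
                         trials=20_000, seed=0):
    """Monte Carlo estimate of P(total rack draw > limit_kw).

    Assumption (illustrative only): each rack's draw is an independent
    normal sample, clipped to the range [0, peak_kw].
    """
    rng = random.Random(seed)  # seeded so runs are reproducible
    overloads = 0
    for _ in range(trials):
        total = 0.0
        for _ in range(n_racks):
            draw = rng.gauss(mean_kw, std_kw)
            total += min(max(draw, 0.0), peak_kw)
        if total > limit_kw:
            overloads += 1
    return overloads / trials


# Hypothetical domain: 20 racks with a 40 kW peak each (800 kW combined)
# behind a 640 kW limit, so the domain is oversubscribed against peak.
n_racks, peak_kw, limit_kw = 20, 40.0, 640.0
p = overload_probability(n_racks, mean_kw=30.0, std_kw=4.0,
                         peak_kw=peak_kw, limit_kw=limit_kw)
print(f"oversubscription ratio: {n_racks * peak_kw / limit_kw:.2f}")
print(f"estimated overload probability: {p:.4f}")
```

The independence assumption is the weakest part of this sketch: synchronous training jobs tend to move rack power together, so a realistic model would need correlated draws, which raises the overload risk for the same oversubscription ratio.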

Skills

Required

  • Data center power distribution and cooling architectures
  • Reliability engineering
  • SLA development
  • Failure-mode analysis
  • Statistical modeling and simulation
  • SCADA/BMS/EPMS
  • Telemetry pipelines
  • Control systems
  • Software bridging IT and OT
  • Python
  • Cross-functional collaboration

Nice to have

  • Accelerator deployments and their power management interfaces
  • Demand response
  • Grid interaction
  • Behind-the-meter generation
  • Control theory
  • Dynamical systems
  • Cyber-physical systems design
  • Energy storage
  • Microgrid integration
  • ML/optimization techniques applied to infrastructure or energy systems
  • Reliability engineering methods
  • SLA development, availability modeling, or service credit frameworks

What the JD emphasized

  • 5+ years of experience in data center infrastructure or facility engineering
  • Demonstrated experience with data center power distribution and cooling system architectures
  • Experience building or operating software-based power management, load scheduling, or control systems
  • Proficiency in Python or similar languages for statistical modeling, simulation, or automation of data center infrastructure optimizations
  • Familiarity with SCADA, BMS, EPMS, or industrial control systems and associated protocols (Modbus, BACnet, SNMP)