Datacenter GPU Power Architect

NVIDIA · Semiconductors · Santa Clara, CA +1

NVIDIA is seeking a Datacenter GPU Power Architect to contribute to power estimation models and tools for GPU products and systems. The role involves early architecture exploration, performance vs power analysis, and deploying machine learning techniques to develop power and performance models. Understanding GenAI/HPC workload characteristics at datacenter scale is crucial for driving new hardware/software features for Perf@Watt improvements.

What you'd actually do

  1. Contribute to power estimation models and tools for GPU products and systems, such as NVIDIA DGX/HGX-based datacenters.
  2. Explore early GPU and system architectures with a focus on energy efficiency and TCO improvements at the GPU and datacenter level.
  3. Help with performance vs. power analysis and track ASIC milestones for NVIDIA's future product lineup.
  4. Deploy machine learning techniques to develop highly accurate power and performance models of our GPUs, CPUs, Switches, and platforms.
  5. Understand the characteristics of GenAI/HPC workloads at datacenter scale (multi-GPU) to drive new HW/SW features for Perf@Watt improvements.

Skills

Required

  • MSEE/MSCE or equivalent experience, with 2+ years of experience in power/performance estimation and optimization techniques.
  • Knowledge of energy-efficient chip design fundamentals and related tradeoffs.
  • Familiarity with low-power design techniques such as multi-VT, clock gating, power gating, and Dynamic Voltage-Frequency Scaling (DVFS).
  • Understanding of processors (GPU is a plus), system-SW architectures, and their performance/power modeling techniques.
  • Proficiency with Python and data analysis packages such as Pandas, NumPy, and PyTorch.
  • Familiarity with performance monitors/simulators used in modern processor architectures.

Nice to have

  • Understanding of GPU architectures

What the JD emphasized

  • energy efficiency
  • Perf@Watt
  • power estimation
  • performance models
  • GenAI/HPC workloads