Senior Performance Modeling Architect, CPU Fabric and LLC

NVIDIA · Semiconductors · Santa Clara, CA

Senior Performance Modeling Architect role at NVIDIA focusing on CPU cache hierarchies and interconnects for automotive and data center systems. Responsibilities include developing high-fidelity performance models, analyzing bottlenecks, evaluating coherency protocols, and collaborating with other teams. Requires a Master's or Ph.D. in a relevant field, plus strong computer architecture, C++/SystemC modeling, and Python scripting skills.

What you'd actually do

  1. Developing and maintaining high-fidelity, cycle-accurate performance models (C++/SystemC) for coherent interconnects and large-scale shared caches.
  2. Modeling and analyzing performance bottlenecks across varying scales, from small-cluster automotive SoCs to massive, multi-mesh data center architectures.
  3. Evaluating the performance impact of different coherency protocols (e.g., CHI, ACE, or proprietary) and snoop filters.
  4. Running and analyzing industry-standard benchmarks (SPEC, MLPerf, Automotive-specific suites) to drive architectural trade-offs.
  5. Collaborating with build and verification teams to correlate performance models with silicon, and working with software teams to optimize drivers for the underlying hardware topology.

Skills

Required

  • Master's or Ph.D. in Computer Engineering, Electrical Engineering, or Computer Science (or equivalent experience) with a focus on architecture, plus 5+ years of experience.
  • Strong understanding of CPU microarchitecture, memory consistency models, and cache coherency protocols.
  • Proven experience in C++ or SystemC for cycle-accurate or functional modeling.
  • Proficiency in Python or similar scripting languages for processing large datasets, generating performance visualizations, and automating simulation sweeps.
  • Understanding of Network-on-Chip (NoC) topologies (Mesh, Ring, Torus), credit-based flow control, and arbitration logic.

Nice to have

  • Practical experience managing the functional safety (ISO 26262) requirements of automotive chips alongside the power-performance-area (PPA) constraints of data center hardware.
  • Experience defining or using PMU (Performance Monitoring Unit) events to debug performance on real silicon or emulators.
  • A background in using formal verification or mathematical modeling to prove the correctness of complex coherency state machines.
  • A history of building your own internal tools or frameworks to accelerate architectural exploration rather than just using off-the-shelf simulators.
  • Knowledge of emerging memory technologies like CXL (Compute Express Link) or HBM (High Bandwidth Memory) and how they interact with coherent fabrics.

What the JD emphasized

  • high-fidelity, cycle-accurate performance models
  • performance bottlenecks
  • coherency protocols
  • industry-standard benchmarks
  • performance models with silicon
  • software teams to optimize drivers
  • functional safety (ISO 26262)
  • power-performance-area (PPA)
  • Hardware Performance Counters
  • PMU (Performance Monitoring Unit)
  • formal verification
  • mathematical modeling
  • Custom Tooling
  • Advanced Memory Systems
  • CXL (Compute Express Link)
  • HBM (High Bandwidth Memory)