We are looking for a highly skilled Performance Modeling Architect to lead the architectural definition and improvement of our next-generation CPU cache hierarchies and interconnects. This is an outstanding chance to create scalable solutions that bridge two fast-paced domains: the high-reliability, low-latency needs of automotive systems and the high-efficiency, high-density demands of data center systems. You will build the "source of truth" models that govern data movement across our silicon, ensuring that our last-level caches (L3/system cache) and coherent fabrics achieve ambitious performance goals.

What you'll be doing:

As a core member of the architecture team, your daily work will involve:

Developing and maintaining high-fidelity, cycle-accurate performance models (C++/SystemC) for coherent interconnects and large-scale shared caches.
Modeling and analyzing performance bottlenecks across varying scales, from small-cluster automotive SoCs to massive, multi-mesh data center architectures.
Evaluating the performance impact of different coherency protocols (e.g., CHI, ACE, or proprietary protocols) and snoop filters.
Running and analyzing industry-standard benchmarks (SPEC, MLPerf, automotive-specific suites) to drive architectural trade-offs.
Collaborating with build and verification teams to correlate performance models with silicon, and with software teams to optimize drivers for the underlying hardware topology.

What we need to see:

To be successful in this role, you should possess a deep technical foundation in computer architecture:

A Master’s or Ph.D. in Computer Engineering, Electrical Engineering, or Computer Science (or equivalent experience) with a focus on computer architecture, plus 5+ years of relevant experience.
Strong understanding of CPU microarchitecture, memory consistency models, and cache coherency protocols.
Proven experience in C++ or SystemC for cycle-accurate or functional modeling.
Proficiency in Python or similar scripting languages for processing large datasets, generating performance visualizations, and automating simulation sweeps.
Understanding of Network-on-Chip (NoC) topologies (Mesh, Ring, Torus), credit-based flow control, and arbitration logic.

Ways to stand out from the crowd:

We are looking for individuals who bring a "systems-thinking" approach to hardware development. You will stand out if you have:

Cross-Domain Versatility: Practical experience managing the functional safety (ISO 26262) requirements of automotive chips alongside the power-performance-area (PPA) constraints of data center hardware.
Hardware Performance Counters: Experience defining or using PMU (Performance Monitoring Unit) events to debug performance on real silicon or emulators.
Formal Methods: A background in using formal verification or mathematical modeling to prove the correctness of complex coherency state machines.
Custom Tooling: A history of building your own internal tools or frameworks to accelerate architectural exploration rather than just using off-the-shelf simulators.
Advanced Memory Systems: Knowledge of emerging memory technologies like CXL (Compute Express Link) or HBM (High Bandwidth Memory) and how they interact with coherent fabrics.

Your base salary will be determined by your location, experience, and the pay of employees in similar positions. The base salary range is 152,000 USD - 241,500 USD for Level 3, and 184,000 USD - 287,500 USD for Level 4.

You will also be eligible for equity and benefits.

Applications for this job will be accepted at least until May 10, 2026.

This posting is for an existing vacancy.

NVIDIA uses AI tools in its recruiting processes.

NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.