Senior CPU Workloads and Simulation Architect

NVIDIA · Semiconductors · Santa Clara, CA +3

This role focuses on researching, architecting, implementing, and evaluating mechanisms for capturing and studying complex applications for CPU architectural and microarchitectural analysis in simulation. It involves developing tools for trace analysis and replay and contributing to functional and performance models of ARM-based systems, with the goal of supporting growth in AI, deep learning, HPC, gaming, VR, and autonomous vehicles.

What you'd actually do

  1. Research, architect, implement, and evaluate mechanisms for capturing and studying complex applications suitable for architectural and microarchitectural CPU analysis in simulation. This includes multi-core, multi-threaded, and heterogeneous workloads spanning CPU/GPU/NIC, simulated at the user, VM, and full-system levels.
  2. Implement tools, processes, and systems for collecting traces and checkpoints for complex multi-threaded heterogeneous applications, and support other architects in using those tools to study workloads.
  3. Contribute to developing functional and performance models of ARM-based systems. Focus on infrastructure for recording and replaying workload sequences for performance and power analysis.
  4. Stay on top of guidelines in industry and academia relating to simulation, checkpointing, tracing, deterministic replay, and architectural/microarchitectural analysis of complex heterogeneous computer systems.

Skills

Required

  • BS/MS in EE, CE, or CS or equivalent experience
  • 12 or more years of relevant experience
  • Experience with CPU workload methodology: state capture and replay, trace analysis, SimPoint, etc.
  • Knowledge of CPU and system architecture and microarchitecture
  • Strong C/C++ and Python programming skills
  • Excellent communication and collaboration skills

Nice to have

  • Strong knowledge in sampling methodology and data science
  • Experience with CPU/GPU application development and optimization in PyTorch, TensorFlow, and similar frameworks
  • Proficiency in the ARM instruction set architecture
  • Experience developing user-mode and/or kernel-mode drivers
  • Background in writing functional and/or performance simulators

What the JD emphasized

  • 12 or more years of relevant experience
  • Experience with CPU workload methodology: state capture and replay, trace analysis, SimPoint, etc.
  • Strong C/C++ and Python programming skills