Senior Performance Architect - Heterogeneous Workload Optimization

NVIDIA · Semiconductors · Santa Clara, CA +3

Senior Systems Performance Engineer to build next-generation profiling infrastructure for optimizing heterogeneous workloads (CPU and GPU) in EDA applications, focusing on memory access patterns, cache utilization, and GPU kernel performance.

What you'd actually do

  1. Architecting and maintaining custom profiling frameworks that provide a unified view of execution across CPU (multi-core/multi-socket) and GPU (multi-node/NVLink) environments.
  2. Conducting deep-dive benchmarking of EDA applications to characterize memory access patterns, cache hit rates, and instruction-level parallelism.
  3. Using GPU profilers to detect GPU-side inefficiencies such as warp divergence, sub-optimal occupancy, and PCIe/NVLink bottlenecks.
  4. Developing tools to monitor and attribute high-watermark memory usage in multi-terabyte EDA builds, finding opportunities for data structure compression or smarter memory pooling.
  5. Developing predictive models to guide hardware procurement and cloud instance selection based on build gate count and algorithmic complexity.

Skills

Required

  • CUDA programming model
  • GPU profiling tools (NVIDIA Nsight Systems/Compute)
  • profiling tools (perf, eBPF, VTune, Valgrind)
  • distributed compute environments (Slurm, LSF, or Kubernetes)
  • systems-level performance analysis

Nice to have

  • BS, MS, or PhD in Computer Science, Electrical Engineering, or a related field (or equivalent experience)

What the JD emphasized

  • extensive knowledge of profiling tools
  • experience with distributed compute environments
  • 8+ years of relevant experience and at least 5 years in systems-level performance analysis