Senior Deep Learning Performance Architect

NVIDIA · Semiconductors · Santa Clara, CA

NVIDIA is seeking a Senior Deep Learning Performance Architect to analyze and develop next-generation architectures for AI and high-performance computing. Responsibilities include developing HW architectures for performance and energy efficiency, benchmarking AI workloads, creating simulation tools, and evaluating hardware features. Requires MS/PhD or equivalent experience with 4+ years in parallel computing architectures, GPU/ASIC architecture evaluation for training/inference, and strong Python/C++ skills.

What you'd actually do

  1. Develop innovative HW architectures to extend the state of the art in parallel computing performance, energy efficiency and programmability.
  2. Benchmark and analyze AI workloads in single and multi-node configurations.
  3. Develop high-level simulation and analysis tools in C++/Python.
  4. Evaluate PPA (performance, power, area) for hardware features and system-level architectural trade-offs.
  5. Work closely with peer architecture teams and product management to guide product development.

Skills

Required

  • MS or PhD in a relevant discipline (Computer Science, Electrical Engineering, Computer Engineering, etc.) or equivalent experience
  • 4+ years of experience in parallel computing architectures, interconnect fabrics and deep learning applications
  • Background in GPU or Deep Learning ASIC architecture evaluation for training and/or inference
  • Strong programming skills in Python and C++

Nice to have

  • Solid fundamental knowledge in computer architecture and interconnect fabrics
  • Understanding of modern transformer-based model architectures
  • Ability to simplify and communicate rich technical concepts to non-technical audiences
  • A curious demeanor and excellent problem-solving skills
