Senior Deep Learning Performance Architect

NVIDIA · Semiconductors · Santa Clara, CA +1

NVIDIA is seeking a Senior Deep Learning Performance Architect to analyze and develop next-generation architectures for AI and HPC applications. The role involves developing innovative architectures, analyzing performance, cost, and power trade-offs using models and simulators, understanding the interplay of hardware and software, and evaluating PPA (performance, power, area) for architectural decisions. Collaboration with software, product, and research teams is key. The role requires an MS or PhD (or equivalent experience), 6+ years of relevant experience, and a strong background in GPU or deep learning ASIC architecture for distributed training and/or inference, performance modeling, and ML/DL fundamentals, particularly transformer architectures. Proficiency in Python, C, and C++ is essential.

What you'd actually do

  1. Develop innovative architectures to extend the state of the art in deep learning performance and efficiency
  2. Analyze performance, cost and power trade-offs by developing analytical models, simulators and test suites
  3. Understand and analyze the interplay of hardware and software architectures on future algorithms, programming models and applications
  4. Evaluate PPA (performance, power, area) for hardware features and system-level architectural trade-offs. Develop high-level simulators in C++/Python
  5. Actively collaborate with software, product and research teams to guide the direction of deep learning HW and SW

Skills

Required

  • MS or PhD in Computer Science, Computer Engineering, Electrical Engineering or equivalent experience
  • 6+ years of relevant meaningful work experience
  • Strong background in GPU or Deep Learning ASIC architecture for distributed training and/or inference spanning multi-chip/multi-node
  • Experience with performance modeling, architecture simulation, profiling, and analysis
  • Solid foundation in machine learning and deep learning. Understanding of modern transformer-based architectures and their performance at scale.
  • Strong programming skills in Python, C, C++

Nice to have

  • Background with deep neural network training, inference and optimization in leading frameworks (e.g. PyTorch, JAX, TensorRT)
  • Familiarity with advanced optimizations and SW/HW co-design in LLM training and inference
  • Exposure to using AI to accelerate SW engineering
  • Demonstration of self-motivation and creative / critical thinking

What the JD emphasized

  • performance analysis
  • performance modeling
  • deep learning performance
  • distributed training and/or inference
  • architecture simulation
  • profiling
  • analysis
  • deep neural network training
  • inference
  • optimization

Other signals

  • architectures that accelerate AI
  • deep learning performance and efficiency
  • performance analysis, performance modeling
  • GPU or Deep Learning ASIC architecture
  • distributed training and/or inference