Senior Deep Learning Kernel Software Performance Architect

NVIDIA · Semiconductors · Santa Clara, CA

This Senior Deep Learning Kernel Software Performance Architect role at NVIDIA focuses on designing and prototyping GPU-accelerated system architectures that optimize deep learning and data analytics workloads. It requires expertise in kernel performance, math libraries, GPU computing, and parallel programming.

What you'd actually do

  1. Craft GPU-accelerated system architectures that push the boundaries of deep learning performance.
  2. Prototype high-performance software for deep learning and data analytics workloads.
  3. Analyze, visualize, and optimize software performance using analytical models, simulators, and test suites.
  4. Collaborate closely with other NVIDIA teams, including:
       • CUDA compiler teams, to identify performance issues.
       • AI/ML training and inference performance teams, to identify and optimize critical deep learning layers.
       • Hardware architecture performance teams, to define expectations for emerging deep learning hardware features.

Skills

Required

  • Master's or PhD in Computer Science, Electrical Engineering or Computer Engineering, or equivalent experience
  • 5+ years of relevant industry or research experience
  • machine learning and deep learning fundamentals
  • computer architecture
  • high-performance kernels (such as CUTLASS)
  • math library performance analysis and profiling
  • Python
  • C
  • C++
  • GPU computing
  • parallel programming models
  • analytical performance modeling, profiling, and analysis

What the JD emphasized

  • 5+ years of relevant industry or research experience
  • strong foundation in machine learning and deep learning fundamentals
  • strong background in high-performance kernels
  • work experience in math library performance analysis and profiling
  • firsthand work experience with analytical performance modeling, profiling, and analysis

Other signals

  • GPU-accelerated system architectures
  • deep learning performance
  • high-performance kernels
  • math library performance analysis
  • GPU computing
  • parallel programming models