Senior Deep Learning Computer Architect

NVIDIA · Semiconductors · Santa Clara, CA

NVIDIA is seeking a Senior Deep Learning Computer Architect to design hardware accelerator and processor architectures for next-generation platforms, enabling state-of-the-art machine learning and data analytics algorithms. The role involves analyzing deep learning methods, proposing new features for acceleration, and studying their benefits, with a focus on LLM workloads and core deep learning kernels.

What you'd actually do

  1. As a member of our deep learning architecture team, you will contribute to features that help next-generation GPUs advance the state of AI.
  2. This position requires you to keep up with the latest DL research and collaborate with diverse teams (internal and external to NVIDIA), including DL researchers, hardware architects, and software engineers.
  3. Your day-to-day work will include analyzing the behavior of various deep learning methods, proposing new features to accelerate or enable them, and studying the benefits of the proposed features.

Skills

Required

  • MS or PhD degree in computer science, computer architecture, electrical engineering or related field or equivalent experience
  • 5+ years of relevant experience in computer architecture, including GPU and system level architecture
  • Performance analysis and optimization
  • Experience with LLM workloads, including performance tuning considerations such as parallelization and fusion strategies
  • Experience with core deep learning kernels such as matrix multiply, attention, convolution, and communication
  • Programming fluency with C++
  • Experience with GPU computing (CUDA)
  • Experience with deep learning frameworks like PyTorch
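To make the "core deep learning kernels" requirement concrete, here is a minimal NumPy sketch of scaled dot-product attention, the kernel at the heart of LLM workloads. This is an illustrative reference implementation, not anything from the posting itself; the function name and shapes are assumptions for the example.

```python
import numpy as np

def scaled_dot_product_attention(q, k, v):
    """Reference scaled dot-product attention: softmax(QK^T / sqrt(d)) V.

    q, k, v: arrays of shape (seq_len, d_model). Illustrative only;
    production GPU kernels fuse these steps and tile for the memory hierarchy.
    """
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)                  # (seq, seq) similarity matrix
    scores -= scores.max(axis=-1, keepdims=True)   # subtract row max for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True) # row-wise softmax
    return weights @ v                             # weighted sum of value vectors

rng = np.random.default_rng(0)
q = rng.standard_normal((4, 8))
out = scaled_dot_product_attention(q, q, q)
print(out.shape)  # (4, 8)
```

The unfused version above materializes the full (seq, seq) score matrix; the "fusion strategies" the role mentions exist precisely to avoid that, computing the softmax and value reduction in tiles so the quadratic intermediate never hits off-chip memory.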

Nice to have

  • Python

What the JD emphasized

  • 5+ years of work experience in at least a few of the following areas is required
  • Experience with LLM workloads
  • Experience with core deep learning kernels such as matrix multiply, attention, convolution, and communication

Other signals

  • design hardware accelerator and processor architectures
  • enable state of the art machine learning and data analytics algorithms
  • next-generation mobile, embedded and datacenter platforms
  • deep learning architecture team
  • advance the state of AI
  • keep up with the latest DL research
  • analyzing the behavior of various deep learning methods
  • proposing new features to accelerate or enable various methods
  • studying the benefits of the proposed features
  • Experience with LLM workloads
  • performance tuning considerations such as parallelization and fusion strategies
  • Experience with core deep learning kernels such as matrix multiply, attention, convolution, and communication