Senior Deep Learning Performance Architect - LPU

NVIDIA · Semiconductors · CA +1 · Remote

NVIDIA is seeking a Senior Deep Learning Performance Architect to focus on hardware-software co-design for AI inference performance. The role involves designing GPU and system architectures, analyzing deep learning algorithms, building performance models, and working with software, research, and product teams to help guide the company's direction in AI.

What you'd actually do

  1. Design novel GPU and system architectures that advance the state of the art in AI inference performance and efficiency
  2. Construct, investigate, and test popular deep learning algorithms and applications
  3. Understand and analyze the relationship between hardware and software architectures as it influences future algorithms and applications
  4. Build efficient power and performance models of the AI inference stack that capture the minimal but significant information needed to guide next-generation hardware architecture
  5. Collaborate across the company to guide the direction of AI, working with software, research, and product teams
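Item 4 refers to analytical performance models of the inference stack. As a rough illustration only, here is a minimal sketch of the simplest such model, a roofline bound, applied to a single matrix multiply from a transformer layer. It covers performance only, not power. The `Accelerator` class, the helper functions, and the hardware numbers are hypothetical and do not describe any NVIDIA product. A production model would also account for caches, interconnect, kernel launch overheads, and power.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Accelerator:
    """Hypothetical accelerator described by just two roofline parameters."""
    peak_flops: float     # peak compute throughput, FLOP/s
    mem_bandwidth: float  # peak memory bandwidth, bytes/s


def gemm_cost(m: int, n: int, k: int, bytes_per_elem: int = 2) -> tuple[float, float]:
    """FLOPs and minimum bytes moved for C[m,n] = A[m,k] @ B[k,n].

    Assumes each operand is read or written exactly once (perfect reuse),
    with 2-byte elements (e.g. FP16) by default.
    """
    flops = 2.0 * m * n * k
    bytes_moved = float(bytes_per_elem * (m * k + k * n + m * n))
    return flops, bytes_moved


def roofline_time(flops: float, bytes_moved: float, hw: Accelerator) -> float:
    """Lower-bound runtime in seconds: the slower of compute and memory traffic."""
    return max(flops / hw.peak_flops, bytes_moved / hw.mem_bandwidth)


def bound(flops: float, bytes_moved: float, hw: Accelerator) -> str:
    """Classify an op by comparing its arithmetic intensity to the ridge point."""
    intensity = flops / bytes_moved             # FLOP per byte
    ridge = hw.peak_flops / hw.mem_bandwidth    # intensity where the two limits meet
    return "compute-bound" if intensity >= ridge else "memory-bound"


# Hypothetical hardware: 1 PFLOP/s peak compute, 3 TB/s memory bandwidth.
hw = Accelerator(peak_flops=1e15, mem_bandwidth=3e12)

if __name__ == "__main__":
    # One 4096x4096 projection: batch-1 decode step vs. 4096-token prefill.
    for label, m in (("decode", 1), ("prefill", 4096)):
        f, b = gemm_cost(m, 4096, 4096)
        print(f"{label}: {bound(f, b, hw)}, >= {roofline_time(f, b, hw) * 1e6:.2f} us")
```

Under these assumed numbers, the batch-1 decode step has an arithmetic intensity of about 1 FLOP/byte and is memory-bound. The 4096-token prefill of the same layer is compute-bound. This contrast is one reason inference performance studies often model prefill and decode separately.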

Skills

Required

  • MS or PhD in a relevant field (CS, EE, Math) or equivalent experience
  • 5+ years of relevant experience
  • Strong mathematical foundation in machine learning and deep learning
  • Expert programming skills in C, C++, and/or Python
  • Familiarity with GPU computing (CUDA or similar) and the HPC stack (MPI, OpenMP)
  • Strong knowledge and coursework in computer architecture

Nice to have

  • Systems-level performance modeling, profiling, and analysis
  • Characterizing and modeling system-level performance, conducting comparison studies, and documenting and publishing results
  • Improving AI inference workloads by developing CUDA kernels or compilers for custom ASIC hardware

What the JD emphasized

  • AI Inference performance
  • modeling LLM performance
  • architecting AI systems
  • optimizing every cycle
  • AI Inference workloads

Other signals

  • performance optimization
  • hardware-software co-design
  • AI inference