Deep Learning Performance Architect

NVIDIA · Semiconductors · Shanghai, China +1

NVIDIA is seeking a Deep Learning Performance Architect to analyze, model, and optimize deep learning system performance, particularly for LLM workloads, on state-of-the-art hardware architectures. This role influences future hardware and software design by collaborating with various internal teams.

What you'd actually do

  1. Analyze state-of-the-art DL networks (LLMs, etc.), and identify and prototype performance opportunities to influence the software and architecture teams for NVIDIA's current and next-generation inference products.
  2. Develop analytical models for state-of-the-art deep learning networks and algorithms to drive innovation in processor and system architecture design for performance and efficiency.
  3. Specify hardware/software configurations and metrics to analyze performance, power, and accuracy in existing and future uni-processor and multiprocessor configurations.
  4. Collaborate across the company to guide the direction of next-gen deep learning HW/SW by working with architecture, software, and product teams.

Skills

Required

  • BS, MS, or PhD in a relevant discipline (CS, EE, Math, etc.) or equivalent experience
  • 5+ years of work experience
  • Experience with popular AI models (e.g., LLM and AIGC models)
  • Familiarity with typical deep learning software frameworks (e.g., Torch/JAX/TensorFlow/TensorRT)
  • Knowledge of and experience with hardware architectures for deep learning applications

Nice to have

  • performance optimization
  • analytical modeling
  • LLM workloads

What the JD emphasized

  • 5+ years of work experience

Other signals

  • performance optimization
  • hardware architecture
  • LLM workloads