Deep Learning Performance Architect

NVIDIA · Semiconductors · Shanghai, China

NVIDIA is seeking a Deep Learning Performance Architect to optimize deep learning hardware and software architectures for edge devices, workstations, and data center GPUs. The role involves benchmarking, performance modeling, bottleneck identification, and exploring new hardware/software capabilities, with a focus on LLMs and generative AI. Hands-on experience using AI agents in engineering workflows is listed as a plus.

What you'd actually do

  1. Benchmark and analyze performance of various machine learning/deep learning workloads across GPU- and NPU-based architectures
  2. Build and validate performance models, and deliver performance projections and insights for deep learning (LLM/GenAI) workloads on emerging architectures
  3. Identify architecture, software and system performance bottlenecks and propose actionable optimizations
  4. Explore and evaluate new software/hardware capabilities and translate them into measurable application gains
  5. Leverage AI agents to accelerate performance investigation and engineering workflows

Skills

Required

  • BSc, MS, or PhD in a relevant discipline (CS, EE, Math, etc.)
  • Familiar with GPU- or accelerator-based deep learning platforms and software stacks
  • A strong background in computer architecture
  • Familiar with LLM or generative AI deep learning algorithms and kernel optimizations
  • Experience in system architecture design and performance optimization
  • Familiar with machine learning and deep learning frameworks

Nice to have

  • 3+ years of work experience in a relevant area
  • Hands-on experience using AI agents to assist daily engineering work

What the JD emphasized

  • deep learning
  • performance optimization
  • LLM
  • generative AI
  • AI agents

Other signals

  • Performance optimization
  • Deep learning workloads
  • GPU/NPU architectures
  • LLM/GenAI
  • AI agents