Deep Learning Performance Architect

NVIDIA · Semiconductors · Shanghai, China

NVIDIA is seeking a Deep Learning Performance Architect to optimize deep learning hardware and software architectures for edge devices, workstations, and data center GPUs. The role involves benchmarking, performance modeling, bottleneck identification, and evaluating new hardware/software capabilities, with a focus on LLMs and generative AI. Hands-on experience using AI agents in engineering workflows is also listed.

What you'd actually do

Benchmark and analyze performance of various machine learning/deep learning workloads across GPU- and NPU-based architectures
Build and validate performance models, and deliver performance projections and insights for deep learning (LLM/GenAI) workloads on emerging architectures
Identify architecture, software and system performance bottlenecks and propose actionable optimizations
Explore and evaluate new software/hardware capabilities and translate them into measurable application gains
Leverage AI agents to accelerate performance investigation and engineering workflows

Skills

Required

BSc, MS, or PhD in a relevant discipline (CS, EE, Math, etc.)
Familiarity with GPU- or accelerator-based deep learning platforms and software stacks
A strong background in computer architecture
Familiarity with LLM or generative AI algorithms and their kernel optimizations
Experience in system architecture design and performance optimization
Familiarity with machine learning and deep learning frameworks

Nice to have

3+ years of work experience in a relevant area
Hands-on experience using AI agents to assist daily engineering work

What the JD emphasized

deep learning
performance optimization
LLM
generative AI
AI agents

Other signals

Performance optimization
Deep learning workloads
GPU/NPU architectures
LLM/GenAI
AI agents

Full job description

NVIDIA is developing processor and system architectures that accelerate deep learning on edge devices, workstations, and data center GPUs for a variety of applications, including automotive, robotics, large language models, and generative AI models. We are looking for an expert deep learning system performance architect to join our deep learning modeling, performance optimization, projection, and analysis effort. In this position, you will have the chance to optimize deep learning hardware and software architectures and make a significant impact at a dynamic, technology-focused company.

What you’ll be doing:

Benchmark and analyze performance of various machine learning/deep learning workloads across GPU- and NPU-based architectures
Build and validate performance models, and deliver performance projections and insights for deep learning (LLM/GenAI) workloads on emerging architectures
Identify architecture, software and system performance bottlenecks and propose actionable optimizations
Explore and evaluate new software/hardware capabilities and translate them into measurable application gains
Leverage AI agents to accelerate performance investigation and engineering workflows

What we need to see:

BSc, MS, or PhD in a relevant discipline (CS, EE, Math, etc.)
3+ years of relevant work experience is a plus
Familiarity with GPU- or accelerator-based deep learning platforms and software stacks
A strong background in computer architecture
Familiarity with LLM or generative AI algorithms and their kernel optimizations
Experience in system architecture design and performance optimization
Familiarity with machine learning and deep learning frameworks
Hands-on experience using AI agents to assist daily engineering work