Senior Deep Learning Performance Architect

NVIDIA · Semiconductors · Santa Clara, CA +1

NVIDIA is hiring a Senior Deep Learning Performance Architect to design and evaluate hardware architectures for AI/HPC applications, with a focus on LLM inference and training performance and on resolving system bottlenecks.

What you'd actually do

  1. Design and evaluate hardware architectures to improve performance, efficiency, and scalability of production AI workloads.
  2. Analyze and optimize large-scale deep learning workloads, especially LLM inference/training in real-world deployments.
  3. Build and use performance and power models (Python/C++) to drive architecture and product decisions.
  4. Identify and resolve system bottlenecks across compute, memory, and interconnect.
  5. Evaluate power, performance, and area (PPA) trade-offs and guide feature prioritization for next-generation GPU/ASIC designs.

Skills

Required

  • GPU/ASIC architecture
  • parallel computing
  • system performance engineering
  • deep learning workloads
  • Python
  • C++
  • system architecture
  • memory hierarchy
  • data movement
  • scalability
  • debugging
  • profiling
  • performance tuning

Nice to have

  • LLM inference optimization
  • batching
  • disaggregation
  • KV-cache management
  • latency/throughput tuning
  • production inference systems
  • scheduling
  • multi-node scaling
  • resource utilization

What the JD emphasized

  • production AI workloads
  • LLM inference/training
  • performance models
  • system architecture
  • deep learning workloads in production environments
  • production inference systems

Other signals

  • performance optimization
  • hardware architecture
  • LLM inference/training