GPU Kernel Development Engineer

AMD · Semiconductors · Shanghai, China · Engineering

This role at AMD focuses on optimizing deep learning frameworks (TensorFlow, PyTorch) and GPU kernels for AMD GPUs to improve training and inference performance on multi-GPU and multi-node systems. The work involves low-level programming, compiler technologies, and collaboration with internal and open-source teams.

What you'd actually do

  1. Optimize Deep Learning Frameworks: Enhance and optimize frameworks like TensorFlow and PyTorch for AMD GPUs in open-source repositories.
  2. Develop GPU Kernels: Create and optimize GPU kernels to maximize performance for specific AI operations.
  3. Develop & Optimize Models: Design and optimize deep learning models specifically for AMD GPU performance.
  4. Collaborate with GPU Library Teams: Work closely with internal teams to analyze and improve training and inference performance on AMD GPUs.
  5. Collaborate with Open-Source Maintainers: Engage with framework maintainers to ensure code changes are aligned with requirements and integrated upstream.

Skills

Required

  • C++ development
  • Linux environments
  • Python
  • deep learning frameworks (TensorFlow, PyTorch)
  • GPU kernel development
  • performance optimization
  • distributed computing environments
  • compiler technologies

Nice to have

  • HIP
  • CUDA
  • assembly (ASM)
  • AMD architectures (GCN, RDNA)
  • low-level programming
  • Composable Kernel (CK)
  • CUTLASS
  • Triton
  • multi-GPU and multi-platform performance
  • scaling and throughput
  • debugging
  • performance tuning
  • test design
  • heterogeneous compute clusters
  • compiler theory
  • LLVM
  • ROCm
  • graph compilers

What the JD emphasized

  • strong experience will be critical
  • strong technical and analytical expertise
  • strong problem-solving skills
  • strong experience in designing and optimizing GPU kernels
  • strong knowledge of AMD architectures
  • strong experience in integrating optimized GPU performance
  • expert skills in Python and C++
  • strong experience in running large-scale workloads

Other signals

  • optimizing deep learning frameworks
  • enhancing GPU kernels
  • deep learning models
  • training/inference performance
  • multi-GPU and multi-node systems