Senior Staff Software Development Engineer

AMD · Semiconductors · Shanghai, China · Engineering

Senior Staff Software Development Engineer at AMD focused on optimizing and developing deep learning frameworks for AMD GPUs. The role involves enhancing GPU kernels, deep learning models, and training/inference performance across multi-GPU and multi-node systems, collaborating with internal library teams and open-source maintainers. Key responsibilities include end-to-end optimization of distributed inference and RL solutions, performance optimization in distributed computing environments, and leveraging compiler technologies. Requires strong C++ development in a Linux environment, GPU kernel development, deep learning framework integration, and high-performance computing experience.

What you'd actually do

  1. Build and optimize end-to-end distributed inference (e.g., P/D disaggregation and Large-EP) and RL solutions on mainstream frameworks like vLLM and SGLang.
  2. Work closely with internal teams to analyze and improve training and inference performance on AMD GPUs.
  3. Engage with framework maintainers to ensure code changes are aligned with requirements and integrated upstream.
  4. Optimize deep learning performance on both scale-up (multi-GPU) and scale-out (multi-node) systems.
  5. Leverage advanced compiler technologies to improve deep learning performance.
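The prefill/decode (P/D) disaggregation named in item 1 can be sketched as a toy scheduler: compute-bound prefill runs on one worker pool, and memory-bound token-by-token decode continues on a separate pool. This is a minimal illustration of the scheduling idea only; the class and field names are hypothetical and are not vLLM or SGLang APIs, and the KV-cache hand-off between pools is elided.

```python
from dataclasses import dataclass, field
from collections import deque


@dataclass
class Request:
    req_id: str
    prompt_tokens: int    # processed once in prefill (compute-bound)
    max_new_tokens: int   # emitted one per step in decode (memory-bound)


@dataclass
class DisaggScheduler:
    """Toy P/D-disaggregated scheduler with separate prefill/decode pools."""
    prefill_queue: deque = field(default_factory=deque)
    decode_queue: deque = field(default_factory=deque)

    def submit(self, req: Request) -> None:
        self.prefill_queue.append(req)

    def step(self) -> list[str]:
        """Run one scheduling step; return ids of requests that finished."""
        finished = []
        # A prefill worker consumes one whole prompt, then hands the
        # request off to the decode pool (KV-state transfer elided).
        if self.prefill_queue:
            req = self.prefill_queue.popleft()
            self.decode_queue.append([req, req.max_new_tokens])
        # Decode workers emit one token per in-flight request per step.
        for entry in list(self.decode_queue):
            entry[1] -= 1
            if entry[1] == 0:
                self.decode_queue.remove(entry)
                finished.append(entry[0].req_id)
        return finished
```

In a real deployment the two pools would live on different GPUs or nodes so that long prefills never stall decode latency, which is the point of the disaggregation.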

Skills

Required

  • C++ development
  • Linux environments
  • Python
  • software engineering best practices
  • deep learning frameworks
  • distributed computing
  • compiler technologies

Nice to have

  • GPU kernel development
  • HIP
  • CUDA
  • assembly (ASM)
  • AMD architectures (GCN, RDNA)
  • low-level programming
  • Compute Kernel (CK)
  • CUTLASS
  • Triton
  • vLLM
  • SGLang
  • TensorFlow
  • PyTorch
  • LLM
  • multimodal
  • Text to Video
  • Image to Video
  • debugging
  • performance tuning
  • test design
  • High-Performance Computing
  • heterogeneous computing clusters
  • compiler theory
  • LLVM
  • ROCm
  • Master’s or PhD in Computer Science, Computer Engineering, Electrical Engineering, or related fields
  • 5+ years of professional experience

What the JD emphasized

  • Deep experience designing and optimizing GPU kernels for deep learning on AMD GPUs using HIP, CUDA, and assembly (ASM).
  • Strong knowledge of AMD architectures (GCN, RDNA) and low-level programming to maximize performance for AI operations, leveraging tools like Compute Kernel (CK), CUTLASS, and Triton for multi-GPU and multi-platform performance.
  • Strong experience integrating optimized GPU performance into machine learning and LLM frameworks (e.g., vLLM, SGLang, TensorFlow, PyTorch) to accelerate model training and inference, with a focus on scaling and throughput.
  • Solid hands-on E2E performance-tuning experience on distributed inference (e.g., P/D disaggregation and Large-EP) and RL.
  • Expert experience running large-scale workloads on heterogeneous computing clusters, optimizing efficiency and scalability.
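Tiling is the core idea behind the kernel libraries the JD names (CK, CUTLASS, Triton): operate on small sub-blocks so each block of data is reused from fast on-chip memory. The sketch below is pure Python, not a real GPU kernel, and only illustrates the blocked loop structure; the function name and tile size are illustrative assumptions.

```python
def tiled_matmul(a, b, n, tile=2):
    """Pure-Python sketch of the tiling strategy used by GPU GEMM kernels.

    a, b: n x n matrices as nested lists. Real kernels (CK, CUTLASS,
    Triton) assign each tile to a work-group that stages its sub-blocks
    in shared memory / LDS; here tiling only reorders the loops.
    """
    c = [[0.0] * n for _ in range(n)]
    for i0 in range(0, n, tile):          # tile row of C
        for j0 in range(0, n, tile):      # tile column of C
            for k0 in range(0, n, tile):  # reduction tile
                # Accumulate one tile-by-tile product; on a GPU this
                # inner block reads operands from on-chip memory.
                for i in range(i0, min(i0 + tile, n)):
                    for j in range(j0, min(j0 + tile, n)):
                        acc = c[i][j]
                        for k in range(k0, min(k0 + tile, n)):
                            acc += a[i][k] * b[k][j]
                        c[i][j] = acc
    return c
```

The `min(..., n)` guards handle matrix sizes that are not multiples of the tile size, the same boundary case real kernels handle with predication or masked loads.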

Other signals

  • optimizing deep learning frameworks for AMD GPUs
  • enhancing GPU kernels, deep learning models, and training/inference performance
  • optimizing end-to-end distributed inference and RL solutions
  • optimizing deep learning performance on scale-up and scale-out systems