Senior Staff AI Software System Design Engineer

AMD · Semiconductors · Shanghai, China · Engineering

Senior Staff AI Software System Design Engineer at AMD, focusing on custom development, debugging, optimization, and technical support of machine learning software for AMD server GPUs. The role involves working with AI frameworks, distribution solutions, kernel operators, compilers, and runtimes, with a strong emphasis on performance optimization for inference and training workloads. Responsibilities include supporting customer proofs of concept (PoCs), driving custom AI software requirements from PoC to release, and collaborating with various teams to optimize training and inference solutions.

What you'd actually do

  1. Present technical proposals and provide support to top customers.
  2. Contribute significantly to customer PoC success.
  3. Drive custom requirements for AI software, from PoC requirement to POR release and from GPU kernels to frameworks and distribution solutions.
  4. Collaborate with different teams to analyze and optimize training and inference workloads and solutions.
  5. Analyze competitive solutions to identify strengths and weaknesses and articulate value propositions.

Skills

Required

  • C++
  • Python
  • debugging
  • performance analysis
  • Linux OS/driver
  • CI
  • toolchain (profiler/DCGM) development

Nice to have

  • vLLM
  • SGLang
  • Megatron-LM
  • DeepSpeed
  • TensorRT
  • TensorRT-LLM
  • AI distribution solutions (EP/CP/TP/PP/DP, DeepEP, DualPipe, PD aggregation, KV cache transfer and storage, etc.)
  • AI distributed network communication with multi-GPU and multi-node collective communication primitives (NCCL/RCCL)
  • NIC/GPU drivers for RDMA/GDR and high-speed networking
  • integration and development of GPU kernel primitives such as FA, PA, MoE, and MLA with Torch, Triton, CUDA, CK, ASM, etc.
  • parallel programming
  • CUDA C/C++
  • HIP
  • compute shaders
  • model inference optimization processes such as GEMM/convolution tuning, graph optimization, and operator fusion
  • Linux DRM
  • HSA
  • ROCm KMD/UMD driver
  • compilers (Triton/TVM)

What the JD emphasized

  • expert knowledge in machine learning areas such as frameworks (e.g., vLLM, SGLang, Megatron-LM, DeepSpeed, TensorRT)
  • strong programming skills in C++ and Python
  • hands-on experience with industry AI use scenarios, solutions, end-to-end pipelines, frameworks or SDKs
  • strong debugging and development skillsets
  • Excellent knowledge of AI frameworks (e.g., vLLM, SGLang, Megatron-LM, DeepSpeed, TensorRT, TensorRT-LLM)
  • Excellent programming skills in Python and C++, plus software skills including debugging and performance analysis
  • Experience with AI distribution solutions
  • Experience with AI distributed network communication using multi-GPU and multi-node collective communication primitives
  • Experience with integration and development of GPU kernel primitives such as FA, PA, MoE, and MLA with Torch, Triton, CUDA, CK, ASM, etc.
  • Knowledge of parallel programming, ideally CUDA C/C++, HIP, or compute shaders.
  • Knowledge of model inference optimization processes such as GEMM/convolution tuning, graph optimization, and operator fusion.
  • Knowledge of Linux OS/driver, CI, and toolchain (profiler/DCGM) development and debugging.

Other signals

  • optimization for inference or training
  • customer PoC success
  • GPU kernel to frameworks and distribution solutions
  • Analyze competitive solutions