AI Software System Design Engineer

AMD · Semiconductors · Shanghai, China · Engineering

AI Software System Design Engineer at AMD, responsible for developing, debugging, optimizing, and supporting custom end-to-end machine learning software solutions for AMD server GPUs. The work calls for deep expertise in ML kernel operators, programming languages such as Triton/CUDA/PTX, and development libraries, with a focus on performance optimization for inference and training workloads. The role requires strong C++ and Python skills and hands-on experience with AI use cases, pipelines, frameworks, parallel programming, and debugging.

What you'd actually do

  1. Position technical proposals for, and provide support to, top customers.
  2. Contribute significantly to the success of customer PoCs.
  3. Drive custom requirements for AI software performance and stability, from PoC requirements through POR release and from GPU kernels to frameworks and distribution solutions.
  4. Collaborate with other teams to analyze and optimize training and inference workloads, from kernels and frameworks to solutions.
  5. Analyze competing solutions to identify strengths and weaknesses and articulate value propositions.

Skills

Required

  • machine learning kernel operators
  • Triton/DSL, CUDA/HIP, PTX/ASM
  • CUTLASS/CK
  • C++
  • Python
  • parallel programming
  • debugging
  • performance analysis

Nice to have

  • compilers (Torch, Triton, LLVM, XLA HLO, graph)
  • Linux ROCm/CUDA runtime and KMD/UMD driver
  • AI distribution solutions
  • AI distributed network communication
  • Linux OS/driver, CI and toolchain (profiler/DCGM) development

What the JD emphasized

  • expert knowledge in machine learning areas such as kernel operators
  • hands-on experience with industry AI use scenarios and solutions
  • strong debugging and development skillsets
  • excellence in GPU kernel primitives
  • experience with the model inference optimization process

Other signals

  • customer-facing
  • performance optimization
  • GPU kernels
  • inference
  • training