AI Framework Engineer

AMD · Semiconductors · Shanghai, China · Engineering

Software engineer at AMD focused on optimizing the performance of AI inference frameworks and applications on AMD hardware. The role involves developing and validating optimization features, working with LLM frameworks such as vLLM and SGLang, and contributing to AI operations libraries and kernel implementations. The engineer will also analyze performance using roofline analysis and profiling tools.

What you'd actually do

  1. Develop and drive execution of comprehensive, highly effective software for sophisticated new technology and new product introduction projects
  2. Provide quick and robust technical support for key customer projects
  3. Develop optimization features and validate them before releasing them to customers
  4. Contribute to a high-functioning feature team
  5. Collaborate closely with multiple teams to deliver key planning solutions and the technology to support them

Skills

Required

  • C
  • C++
  • Python
  • vLLM
  • SGLang
  • FlashInfer
  • TensorRT-LLM
  • CUTLASS
  • AITER (AMD)
  • CUDA
  • Triton
  • Gluon kernels
  • roofline performance analysis
  • profiling tools
  • Nsight
  • ROCPdriver

Nice to have

  • AI agent tools

What the JD emphasized

  • Improving the performance of key applications and benchmarks
  • Expert knowledge and hands-on experience in C, C++, and Python
  • Expert knowledge and hands-on experience in large language model frameworks, including vLLM and SGLang
  • Familiarity with AI operations libraries such as FlashInfer, TensorRT-LLM, CUTLASS, and AITER (AMD)
  • Implementation of CUDA/Triton/Gluon kernels
  • Solid understanding of roofline performance analysis, combined with profiling tools, for optimization analysis

Other signals

  • work with the very latest hardware and software technology