Fellow, AI Performance Software Engineer

AMD · Semiconductors · Santa Clara, CA · Engineering

AMD is seeking an AI Performance Software Engineer to optimize AI software and libraries for its Instinct GPUs, focusing on inference and fine-tuning performance in datacenter and cloud environments. The role involves deep analysis of hardware bottlenecks and software performance, and requires strong C++ and Python skills along with experience in at least one major DL framework.

What you'd actually do

  1. Work with a team of software engineers to enable DL models, libraries, and applications for Instinct GPUs in both on-prem and cloud environments.
  2. Analyze and optimize the performance of AI software, understand hardware bottlenecks, and push performance close to the roofline.
  3. Work with the industry’s most sophisticated clients to help them leverage the latest hardware capabilities for their AI use cases.
  4. Be among the first in the world to combine the newest hardware with the industry’s latest applications, libraries, frameworks, and SDKs to push the limits of innovation and solve the world’s most complex challenges.

Skills

Required

  • C++
  • Python
  • DL frameworks (PyTorch, TensorFlow)
  • Inference optimization
  • Fine-tuning optimization
  • Training optimization
  • Performance analysis
  • GPU architecture
  • Software optimization

Nice to have

  • GPU accelerators and parallel/communication libraries (NCCL/RCCL, OpenMP, MPI)
  • Profiling tools (PyTorch Profiler, ROCm profiler, VTune, Nsight)
  • CPU performance analysis
  • Singularity
  • Docker
  • Kubernetes
  • Open-source software development
  • Publications in ML conferences/journals

What the JD emphasized

  • Minimum 4 years of experience required
  • Strong programming skills in C++ and Python
  • Strong development experience in at least one major DL framework, such as PyTorch or TensorFlow, in inference, fine-tuning, and/or training

Other signals

  • enabling software for world-class datacenters
  • optimize the software ecosystem for the next generation of GPU computational accelerators
  • enable DL models, libraries, and applications for Instinct GPUs
  • analyzing and optimizing the performance of AI software
  • understand hardware bottlenecks and harness performance to hit close to roofline