AI GPU Arch Perf Optimization Intern

Intel · Semiconductors · Shanghai, China +1

Intern role focused on optimizing GPU compute kernels for AI workloads and validating GPU IP. The work involves performance profiling and analysis, plus building performance models to understand architecture-level behavior, in support of hardware/software co-design for next-generation Intel GPUs and AI accelerators.

What you'd actually do

  1. Analyze and optimize core GPU compute kernels for AI and numerical workloads (e.g., GEMM, Attention, operator fusion).
  2. Reproduce representative AI inference and training workloads for GPU IP validation.
  3. Perform GPU performance profiling and analysis to identify compute, memory, and pipeline bottlenecks.
  4. Build performance profiles and models to understand architecture-level performance behavior.
  5. Provide workload- and kernel-level insights to support GPU architecture design and HW/SW co-design efforts.

Skills

Required

  • Python
  • AI fundamentals
  • GPU architecture
  • GPU programming
  • parallel computing
  • performance optimization
  • computer systems understanding

Nice to have

  • CUDA
  • OpenCL
  • SYCL
  • Triton
  • performance optimization coursework
  • compiler coursework
  • parallel computing coursework
  • analytical skills
  • problem-solving skills
  • AI systems and infrastructure interest

Other signals

  • GPU kernel optimization
  • AI workloads
  • performance profiling
  • GPU architecture