AI GPU Arch Perf Optimization Intern

Intel · Semiconductors · Shanghai, China +1

Intern role focused on optimizing GPU compute kernels for AI workloads and validating GPU IP. Involves performance profiling, analysis, and modeling to improve next-generation Intel GPU and AI accelerator platforms.

What you'd actually do

  1. Analyze and optimize core GPU compute kernels for AI and numerical workloads (e.g., GEMM, Attention, operator fusion).
  2. Reproduce representative AI inference and training workloads for GPU IP validation.
  3. Perform GPU performance profiling and analysis to identify compute, memory, and pipeline bottlenecks.
  4. Build performance profiles and models to understand architecture-level performance behavior.
  5. Provide workload- and kernel-level insights to support GPU architecture design and HW/SW co-design efforts.

Skills

Required

  • Python
  • AI fundamentals
  • GPU architecture
  • GPU programming
  • parallel computing
  • performance optimization
  • computer systems
  • CPU/GPU architecture
  • memory systems
  • performance analysis

Nice to have

  • CUDA
  • OpenCL
  • SYCL
  • Triton
  • compilers
  • analytical and problem-solving skills
  • AI systems and infrastructure

Other signals

  • GPU kernel optimization
  • AI workloads
  • GPU IP validation
  • performance optimization