AI GPU Arch Perf Optimization Intern

Intel · Semiconductors · Shanghai, China (+1 other location)

This internship focuses on optimizing core GPU compute kernels for AI and numerical workloads, validating GPU IP with representative AI inference and training workloads, and profiling GPU performance to identify bottlenecks. The role involves hardware/software codesign for next-generation Intel GPU and AI accelerator platforms.

What you'd actually do

  1. Analyze and optimize core GPU compute kernels for AI and numerical workloads (e.g., GEMM, attention, operator fusion).
  2. Reproduce representative AI inference and training workloads for GPU IP validation.
  3. Perform GPU performance profiling and analysis to identify compute, memory, and pipeline bottlenecks.
  4. Build performance profiles and models to understand architecture-level performance behavior.
  5. Provide workload- and kernel-level insights to support GPU architecture design and HW/SW codesign efforts.

Skills

Required

  • Proficiency in Python for analysis, experimentation, or tooling.
  • Solid understanding of AI fundamentals, including common models and algorithms.
  • Strong interest in GPU architecture, GPU programming, parallel computing, and performance optimization.
  • Basic knowledge of computer systems, such as CPU/GPU architecture, memory systems, and performance analysis.

Nice to have

  • Experience with GPU kernels or programming models (e.g., CUDA, OpenCL, SYCL, Triton).
  • Exposure to performance optimization, compiler, or parallel computing coursework, research, or internships.
  • Strong analytical and problem-solving skills, with the ability to reason from profiling data.
  • Interest in AI systems and infrastructure beyond model-level development.
  • Ability to work effectively in a collaborative, cross-functional engineering environment.

What the JD emphasized

  • core GPU kernel optimization
  • AI workloads
  • GPU IP validation
  • performance optimization
  • GPU architecture