Research and Pathfinding Internship: AI Workload Compiler Optimization for CPU and GPU

Intel · Semiconductors · Gdansk, Poland

Internship role focused on advancing compiler infrastructure for heterogeneous AI workloads by developing novel optimization techniques for AI kernel compilation that target both CPU and GPU architectures using MLIR/LLVM. The work explores algebraic optimization, hierarchical scheduling, and cost-driven pruning for high-performance fused kernels.

What you'd actually do

  1. Develop novel optimization techniques for AI kernel compilation targeting both CPU (Intel AMX/AVX-512) and GPU architectures from a unified representation that interfaces with the MLIR/LLVM framework.
  2. Integrate hierarchical optimization abstractions with equality saturation techniques into an MLIR-based compilation pipeline.
  3. Enable automatic discovery and autotuning of high-performance fused kernels through exhaustive algebraic exploration combined with target-specific scheduling decisions.
  4. Explore the design and implementation of a PEG (Graph + PEG) abstraction that combines algebraic optimization, hierarchical scheduling, cost-driven and constraint pruning, MLIR integration, and verification.

Skills

Required

  • Compiler internals or programming languages (IR design, optimization passes)
  • Python
  • C++
  • Familiarity with CPU architecture (cache hierarchies, SIMD/vector instructions)

Nice to have

  • MLIR/LLVM ecosystem
  • Theoretical foundation: understanding of algebraic rewrite systems and/or e-graphs
  • Prior work with the LLVM ecosystem, MLIR dialects, or equality saturation frameworks (egg, eqsat)
  • Experience with autotuning or cost modeling for performance optimization
  • Knowledge of probabilistic algorithms and SMT solvers (Z3)
  • Familiarity with tensor compiler frameworks: Mirage, Halide, TVM, Triton, or similar
  • Publications or projects in compilers or program synthesis
  • Experience with workload optimization for Intel architectures (AMX, AVX-512, SYCL)

What the JD emphasized

  • MLIR/LLVM framework
  • CPU (Intel AMX/AVX-512) and GPU architectures
  • algebraic optimization
  • hierarchical scheduling
  • cost-driven and constraint pruning
  • MLIR integration
  • verification

Other signals

  • compiler optimization
  • AI workloads
  • MLIR/LLVM
  • CPU and GPU architectures