Workload Optimization Intern

Intel · Semiconductors · Shanghai, China

This intern role focuses on optimizing deep learning models and their deployment on Intel GPUs and CPUs. Responsibilities include performance tuning, debugging accuracy and memory issues, developing deployment frameworks (e.g., using vLLM), and writing high-performance kernels. The role also involves technical syncs with architects and turning innovative ideas into production-ready features.

What you'd actually do

  1. Optimizing key use cases and models, as well as debugging and resolving issues related to accuracy and memory management.
  2. Designing and developing model deployment frameworks, such as leveraging new features in vLLM to accelerate inference.
  3. Developing and debugging high-performance kernels specifically for Intel GPUs and CPUs.
  4. Engaging in deep technical syncs with architects and peers to iterate on solutions and keep progress transparent.
  5. Transforming innovative ideas into production-ready features.

Skills

Required

  • C++
  • Python
  • Deep Learning fundamentals
  • Practical experience with Deep Learning

Nice to have

  • LLMs
  • Multimodal models
  • Agents
  • PyTorch
  • vLLM
  • GPU kernel development
  • CUDA
  • Triton

Other signals

  • Optimizing key use cases and models
  • Debugging and resolving issues related to accuracy and memory management
  • Designing and developing model deployment frameworks
  • Leveraging new features in vLLM to accelerate inference
  • Developing and debugging high-performance kernels specifically for Intel GPUs and CPUs