AI Frameworks Engineer – GPU Performance for Generative AI (OpenVINO)

Intel · Semiconductors · Seoul, South Korea

Software engineer focused on implementing and optimizing generative AI workloads (LLMs, diffusion models) on Intel GPUs using the OpenVINO inference runtime. The role involves analyzing performance bottlenecks, adapting state-of-the-art techniques, and optimizing for current and future GPU architectures. It requires deep C++ and system-level expertise.

What you'd actually do

  1. Take technical ownership of performance-critical paths for generative AI workloads (e.g., LLMs, diffusion models) on Intel GPUs
  2. Analyze end-to-end execution of AI models to identify compute, memory, bandwidth, and parallelism bottlenecks
  3. Implement and optimize generative AI techniques, adapting state-of-the-art ideas to run efficiently on Intel GPU architectures
  4. Translate deep understanding of GPU hardware architecture into efficient, scalable, and maintainable software designs
  5. Optimize workloads for both current and future Intel GPU platforms, including hardware that is still under development

Skills

Required

  • 3+ years of professional software engineering experience
  • Strong programming skills in C and C++
  • Working experience with Python
  • Experience working with large and complex C++ codebases
  • Proven analytical thinking and strong problem-solving abilities

Nice to have

  • Experience with GPU programming or parallel computing
  • Strong understanding of computer and GPU architecture
  • Technical understanding of generative AI models from a system and performance perspective
  • Familiarity with AI runtimes or frameworks
  • Solid foundation in computer science fundamentals
  • Ability to communicate technical ideas clearly

What the JD emphasized

  • performance-critical paths
  • performance optimization
  • performance

Other signals

  • performance optimization
  • generative AI workloads
  • Intel GPUs
  • OpenVINO
  • inference runtime