AI Frameworks Software Engineer – Model Compression Algorithm

Intel · Semiconductors · Shanghai, China

Develop the Intel Neural Compressor product and related tools, and optimize them for Intel AI platforms (CPUs, GPUs, and AI accelerators). Research and implement quantization and compression techniques for LLMs and text-to-image/video generation models. Track and explore cutting-edge directions in efficient model deployment and inference/fine-tuning acceleration.

What you'd actually do

  1. Develop the Intel Neural Compressor product and related tools (e.g., auto-round), and optimize them for Intel AI platforms, including CPUs, GPUs, and AI accelerators.
  2. Research and implement quantization and compression techniques for large language models (LLMs) and text-to-image/video generation models.
  3. Track and explore cutting-edge directions in efficient model deployment and inference/fine-tuning acceleration.

Skills

Required

  • Master’s or PhD degree in computer science or a related field
  • Solid understanding of deep learning, deep learning frameworks, and large language model (LLM) fundamentals
  • Familiarity with model compression techniques such as quantization and pruning
  • Proficiency in Python/C++ or other programming languages commonly used for deep learning development
  • Strong teamwork and collaboration skills
  • Good oral and written English skills

Nice to have

  • Strong self-motivation and problem-solving skills
  • Passion for technological innovation and practical engineering, with a drive for continuous exploration and improvement
  • Experience in model fine-tuning, inference optimization, or related tool development

What the JD emphasized

  • quantization and compression techniques
  • LLMs
  • text-to-image/video generation models
  • efficient model deployment
  • inference/fine-tuning acceleration

Other signals

  • optimize for Intel AI platform
  • quantization and compression techniques for LLMs
  • efficient model deployment and inference/fine-tuning acceleration