Applied AI Frameworks Engineer

Intel · Semiconductors · Bangalore, India

This role focuses on designing and developing features for Intel's AI frameworks software stack, specifically optimizing inference serving frameworks (such as SGLang and vLLM) and ML frameworks (PyTorch, TensorFlow, JAX) for Intel's AI accelerators and GPUs. The engineer will enhance deep learning training and inference capabilities, identify optimization opportunities, and contribute to open-source communities.

What you'd actually do

  1. Design and develop SW features for AI frameworks, both HW-agnostic and HW-aware, such as ML kernel development.
  2. Enhance and extend the deep learning training and inference capabilities of the software stack.
  3. Identify optimization opportunities in the software stack to improve the performance of deep learning workloads.
  4. Participate in discussions with the open-source community and contribute to open-source software development.

Skills

Required

  • Advanced C++ (C++14/17)
  • Python
  • parallel programming
  • machine learning kernels such as GEMM, convolution, and FlashAttention
  • SGLang
  • vLLM
  • Deep Learning models/LLMs for text, vision, NLP
  • computer architecture
  • HW-SW optimization techniques

Nice to have

  • Triton-based kernels
  • compiler algorithms for heterogeneous systems
  • Fuser optimizations

What the JD emphasized

  • frameworks such as SGLang, vLLM
  • Deep learning training and inference capabilities
  • state-of-the-art AI workloads (including LLMs)
  • Intel's AI accelerators and next generation GPUs
  • ML kernel development
  • frameworks/platforms that have gone to production

Other signals

  • designing and developing features for Intel's AI frameworks software stack
  • developing and optimizing the software stack and state-of-the-art AI workloads (including LLMs) for Intel's AI accelerators and next generation GPUs
  • Enhance and extend the deep learning training and inference capabilities in the software stack
  • Identify optimization opportunities in the software stack to improve the performance of deep learning workloads