AI Software Development Engineer

Intel · Semiconductors · Folsom, California, United States +2

AI Software Development Engineer focused on optimizing AI inference workloads (LLMs, Diffusion models) on Intel GPUs. This role involves end-to-end optimization across graph compilation, runtime execution, and low-level GPU kernels, and requires strong C++ skills and an understanding of GPU architectures and neural network inference.

What you'd actually do

  1. Optimize emerging AI inference workloads such as Large Language Models (LLMs) and Diffusion models on GPUs
  2. Develop and optimize graph-based compilation flows (e.g., MLIR/LLVM) for neural network workloads
  3. Write and tune performance-critical GPU kernels and runtime code in C++ or parallel programming languages
  4. Identify and resolve bottlenecks across compiler, runtime, and kernel layers
  5. Profile, benchmark, and characterize AI workloads to validate performance gains

Skills

Required

  • Bachelor's degree in Computer Science or a related field with 4+ years of relevant experience, OR Master's degree in Computer Science or a related field with 2+ years of relevant experience
  • Strong C++ development and debugging skills
  • Solid understanding of GPU architectures or AI accelerators
  • Hands-on experience with modern neural network architectures for inference on hardware accelerators

Nice to have

  • PhD and 1+ years of relevant experience
  • Experience optimizing end-to-end real-world AI workloads
  • Familiarity with OpenVINO or other AI inference frameworks
  • Knowledge of neural network optimization techniques and performance tradeoffs
  • Experience across multiple layers of the AI software stack, including AI inference engines or runtimes, graph compilers (e.g., MLIR/LLVM), and GPU kernels or performance-critical compute code
  • Performance profiling and workload analysis

What the JD emphasized

  • performance-critical GPU kernels
  • performance improvements
  • performance tradeoffs
  • Performance profiling and workload analysis

Other signals

  • end-to-end optimization of AI inference workloads
  • performance improvements for modern AI models
  • LLMs and Diffusion models on GPUs
  • graph compilation, runtime execution, and low-level GPU kernels