AI Performance Library Architect

Intel · Semiconductors · Hillsboro, Oregon, United States

Software development engineer role on oneDNN, a complex, cross-platform, open-source software project focused on neural network performance. oneDNN is a critical component of Intel's AI strategy and powers key AI applications. The role involves designing, developing, and maintaining new functionality in oneDNN to enable performance-critical portions of AI workloads, and supporting software developers who optimize AI frameworks and workloads for Intel CPUs and GPUs.

What you'd actually do

  1. Design, develop, and maintain new functionality in oneDNN to enable performance-critical portions of AI workloads.
  2. Support software developers optimizing AI frameworks and workloads for Intel CPUs and GPUs.
  3. Support the cross-platform ecosystem of AI software developers contributing to oneDNN.
  4. Contribute to the oneDNN project (https://github.com/uxlfoundation/oneDNN).

Skills

Required

  • C and C++
  • Maintaining or contributing to open-source software projects
  • Software library design and architecture
  • Implementation of linear algebra algorithms (functions from BLAS, LAPACK, or PyTorch)
  • Performance engineering and software performance optimizations
  • Floating-point arithmetic and numerical stability
  • Software development on Linux
  • Low-level performance optimizations using CUDA, x86 assembly or intrinsics, or OpenCL

Nice to have

  • Machine learning and deep learning algorithms, or high-performance computing (HPC) application development
  • Floating-point implementations of transcendental functions (sin, cos, tanh, elu, etc.)
  • Algorithms for non-IEEE low-precision data types (bfloat16, fp8, fp4)
  • AI-assisted software development

What the JD emphasized

  • Performance engineering and software performance optimizations
  • Low-level performance optimizations using CUDA, x86 assembly or intrinsics, or OpenCL

Other signals

  • Machine learning and deep learning algorithms, or high-performance computing (HPC) application development