Kernel Engineer

Cerebras · Semiconductors · India · Software

Develops high-performance software for AI and HPC workloads, focusing on implementing, optimizing, and scaling deep learning operations on Cerebras' custom processor architecture. This involves designing, tuning, and validating ML and HPC kernels, and building parallel/distributed algorithms to maximize compute utilization and training efficiency.

What you'd actually do

  1. Develop design specifications for new machine learning and linear algebra kernels, and map them to the Cerebras WSE system using parallel programming algorithms.
  2. Develop and debug a kernel library of highly optimized low-level assembly and C-like domain-specific language routines that implement algorithms targeting the Cerebras hardware system.
  3. Develop and debug high-performance kernel routines in low-level assembly and a custom C-like (CSL) language, implementing algorithms optimized for the Cerebras hardware system.
  4. Use mathematical models and analysis to measure software performance and inform design decisions.
  5. Develop and integrate unit and system testing methodologies to verify correct functionality and performance of kernel libraries.

Skills

Required

  • C++
  • Python
  • low-level systems programming
  • library/API development best practices
  • performance optimization
  • debugging skills across complex, layered software stacks

Nice to have

  • kernel development
  • performance optimization
  • low-level systems programming
  • parallel algorithms
  • distributed memory systems
  • accelerators such as GPUs, FPGAs, or other custom hardware
  • machine learning workloads
  • TensorFlow
  • PyTorch
  • HPC kernels and optimizing them on modern architectures

What the JD emphasized

  • high-performance software
  • fully leverage our custom, massively parallel processor architecture
  • design, performance tuning, and validation of foundational ML and HPC kernels
  • building a library of parallel and distributed algorithms that maximize compute utilization and push the boundaries of training efficiency for state-of-the-art AI models
  • critical to unlocking the full potential of our hardware and accelerating the pace of AI innovation
  • Develop design specifications for new machine learning and linear algebra kernels
  • Develop and debug kernel library of highly optimized low level assembly instruction and C-like domain specific language routines
  • Develop and debug high-performance kernel routines in low-level assembly and a custom C-like (CSL) language
  • implementing algorithms optimized for the Cerebras hardware system
  • measure the software performance and inform design decisions
  • verify correct functionality and performance of kernel libraries
  • Study emerging trends in Machine Learning applications and help evolve Kernel library architecture to address computational challenges of the state-of-the-art Neural Networks
  • Interact with chip and system architects to optimize instruction sets, microarchitecture, and IO of next generation systems

Other signals

  • optimizing and scaling deep learning operations
  • foundational ML and HPC kernels
  • push the boundaries of training efficiency
  • implementing, optimizing, and scaling deep learning operations