ML Systems Performance Engineer

Cerebras · Semiconductors · US and Canada Offices · Software

ML Systems Performance Engineer role focused on optimizing and debugging inference speed, throughput, and compute utilization on Cerebras' AI hardware. Responsibilities include building performance models, optimizing kernel microcode and compiler algorithms, debugging runtime performance, and developing performance visualization tools.

What you'd actually do

  1. Build performance models (kernel-level, end-to-end) to estimate the performance of state-of-the-art and customer ML models.
  2. Optimize and debug our kernel microcode and compiler algorithms to improve ML model inference speed, throughput, and compute utilization on the Cerebras WSE.
  3. Debug and understand runtime performance on the system and cluster.
  4. Develop tools and infrastructure to help visualize performance data collected from the Wafer Scale Engine and our compute cluster.

Skills

Required

  • Bachelor's / Master's / PhD in Electrical Engineering or Computer Science
  • Strong background in computer architecture
  • Exposure to and understanding of low-level deep learning / LLM math
  • Strong analytical and problem-solving mindset
  • 3+ years of experience in a relevant domain (Computer Architecture, CPU/GPU Performance, Kernel Optimization, HPC)
  • Experience working on CPU/GPU simulators
  • Exposure to performance profiling and debugging on any system pipeline
  • Comfort with C++ and Python

What the JD emphasized

  • end-to-end model inference speed and throughput
  • low-level kernel performance debugging and optimization
  • performance modeling and estimation
  • development of tooling for performance projection and diagnostics

Other signals

  • inference performance
  • kernel optimization
  • performance modeling
  • hardware-software intersection