ML Systems Performance Engineer

Cerebras · Semiconductors · Software

ML Systems Performance Engineer at Cerebras, focused on optimizing end-to-end model inference speed and throughput on the Cerebras Wafer Scale Engine (WSE). Responsibilities include kernel optimization, system performance analysis, and developing performance modeling and diagnostic tools.

What you'd actually do

  1. Build performance models (kernel-level, end-to-end) to estimate the performance of state-of-the-art and customer ML models.
  2. Optimize and debug our kernel microcode and compiler algorithms to elevate ML model inference speed, throughput, and compute utilization on the Cerebras WSE.
  3. Debug and understand runtime performance at the system and cluster level.
  4. Develop tools and infrastructure to help visualize performance data collected from the Wafer Scale Engine and our compute cluster.
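To give a flavor of the first responsibility, here is a minimal roofline-style performance model: it estimates each kernel's time as the maximum of its compute-bound and memory-bound times, then sums kernels for an end-to-end projection. This is an illustrative sketch only; the function names and the peak-FLOPs/bandwidth figures below are hypothetical, not Cerebras WSE specifications.

```python
# Minimal roofline-style kernel performance model (illustrative sketch;
# parameter values are hypothetical, not Cerebras WSE specs).

def roofline_time_s(flops, bytes_moved, peak_flops_s, peak_bw_bytes_s):
    """Kernel time estimate: max of compute-bound and memory-bound times."""
    compute_time = flops / peak_flops_s
    memory_time = bytes_moved / peak_bw_bytes_s
    return max(compute_time, memory_time)

def end_to_end_time_s(kernels, peak_flops_s, peak_bw_bytes_s):
    """Sum per-kernel estimates for a simple end-to-end projection."""
    return sum(roofline_time_s(f, b, peak_flops_s, peak_bw_bytes_s)
               for f, b in kernels)

# Example: a compute-heavy GEMM kernel plus a bandwidth-bound
# elementwise kernel, on a machine with hypothetical peaks of
# 1 PFLOP/s and 1 TB/s.
kernels = [(2e12, 4e9), (1e9, 8e9)]  # (FLOPs, bytes moved) per kernel
total = end_to_end_time_s(kernels, peak_flops_s=1e15, peak_bw_bytes_s=1e12)
```

Real models layered on top of this would account for on-chip dataflow, overlap of compute and communication, and measured (rather than peak) rates, but the max-of-bounds structure is the usual starting point.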

Skills

Required

  • Bachelor's / Master's / PhD in Electrical Engineering or Computer Science
  • Strong background in computer architecture
  • Exposure to and understanding of low-level deep learning / LLM math
  • Strong analytical and problem-solving mindset
  • 3+ years of experience in a relevant domain (Computer Architecture, CPU/GPU Performance, Kernel Optimization, HPC)
  • Experience working on CPU/GPU simulators
  • Exposure to performance profiling and debugging on any system pipeline
  • Comfort with C++ and Python

What the JD emphasized

  • 3+ years of experience in a relevant domain (Computer Architecture, CPU/GPU Performance, Kernel Optimization, HPC)
  • Strong background in computer architecture
  • Exposure to and understanding of low-level deep learning / LLM math

Other signals

  • Optimizing inference speed and throughput
  • Low-level kernel performance debugging and optimization
  • System-level performance analysis
  • Performance modeling and estimation
  • Development of tooling for performance projection and diagnostics