ML Systems Performance Engineer

Cerebras · Semiconductors · India · Software

This ML Systems Performance Engineer role focuses on optimizing inference speed and throughput on Cerebras' custom wafer-scale AI chip. Responsibilities include building performance models, optimizing kernel microcode and compiler algorithms, debugging runtime performance, and developing performance visualization tools. The role requires a strong background in computer architecture, an understanding of low-level deep learning / LLM math, experience working on CPU/GPU simulators, and exposure to performance profiling and debugging.

What you'd actually do

  1. Build performance models (kernel-level and end-to-end) to estimate the performance of state-of-the-art and customer ML models.
  2. Optimize and debug our kernel microcode and compiler algorithms to improve ML model inference speed, throughput, and compute utilization on the Cerebras Wafer Scale Engine (WSE).
  3. Debug and understand runtime performance on the system and cluster.
  4. Develop tools and infrastructure to help visualize performance data collected from the Wafer Scale Engine and our compute cluster.

Skills

Required

  • Bachelor's / Master's / PhD in Electrical Engineering or Computer Science
  • Strong background in computer architecture
  • Exposure to and understanding of low-level deep learning / LLM math
  • Strong analytical and problem-solving mindset
  • 3+ years of experience in a relevant domain (Computer Architecture, CPU/GPU Performance, Kernel Optimization, HPC)
  • Experience working on CPU/GPU simulators
  • Exposure to performance profiling and debug on any system pipeline
  • Comfort with C++ and Python

What the JD emphasized

  • low-level kernel performance debugging and optimization
  • system-level performance analysis
  • performance modeling and estimation
  • development of tooling for performance projection and diagnostics
  • low-level deep learning / LLM math
  • performance profiling and debug on any system pipeline

Other signals

  • Optimizing inference performance on custom AI hardware
  • Developing performance models for ML workloads
  • Low-level kernel optimization and system-level analysis