Performance Engineer

Cerebras · Semiconductors · Toronto, ON · Performance

The role focuses on optimizing the performance of Cerebras' Runtime software driver, which runs on x86 machines and supports the company's AI accelerator chip. Responsibilities include CPU and memory subsystem optimizations, developing efficient data movement algorithms, using advanced CPU features, performance profiling, and influencing future hardware and software designs. The role requires strong C/C++ skills and experience in performance engineering and system-level tuning.

What you'd actually do

  1. Optimize the CPU and memory subsystem behavior of our Runtime software driver to speed up key cloud and ML training/inference workloads on the modern x86 machines that form the backbone of our AI accelerator.
  2. Develop and enhance algorithms for efficient data movement, local data processing, job submission, and synchronization between various software and hardware components.
  3. Optimize our workloads using advanced CPU features such as AVX instructions and prefetch mechanisms, along with cache optimization techniques.
  4. Profile and characterize performance using tools such as AMD uProf, and reduce OS-level overheads.
  5. Influence the design of Cerebras' next-generation AI architectures and software stack by analyzing the integration of advanced CPU features and their impact on system performance and computational efficiency.

Skills

Required

  • C/C++
  • Python
  • memory subsystem optimizations
  • system-level performance tuning
  • compiler technologies (e.g., LLVM, MLIR)
  • PyTorch
  • ML frameworks

Nice to have

  • distributed systems

What the JD emphasized

  • optimizing AI applications
  • high-performance ML training and inference solutions
  • CPU and memory subsystem optimizations
  • performance profiling and characterization
  • distributed systems experience (highly desirable)

Other signals

  • CPU and memory subsystem optimizations for Runtime software driver