Full Stack LLM Engineer

Cerebras · Semiconductors · Toronto, ON · Software

The role focuses on bringing up and optimizing state-of-the-art ML models on Cerebras CSX systems. The work spans model architecture translation, compiler optimizations, runtime integration, and performance tuning. It requires strong debugging skills across the full AI toolchain, from Python modeling code down to low-level C/C++ optimization and compiler development (LLVM/MLIR).

What you'd actually do

  1. Contribute to the end-to-end bring-up of ML models on Cerebras CSX systems.
  2. Work across the stack: model architecture translation, graph lowering, compiler optimizations, runtime integration, and performance tuning.
  3. Debug performance and correctness issues spanning model code, compiler IRs, runtime behavior, and hardware utilization.
  4. Propose and prototype improvements to tools, APIs, or automation flows to accelerate future bring-ups.

Skills

Required

  • Bachelor’s, Master’s, or PhD in Computer Science, Engineering, or a related field.
  • Comfort navigating the full AI toolchain: Python modeling code, compiler IRs, performance profiling, etc.
  • Strong debugging skills across performance, numerical accuracy, and runtime integration.
  • Experience with deep learning frameworks (e.g., PyTorch, TensorFlow) and familiarity with model internals (e.g., attention, MoE, diffusion).
  • Proficiency in C/C++ programming and experience with low-level optimization.
  • Proven experience in compiler development, particularly with LLVM and/or MLIR.
  • Strong background in optimization techniques, particularly for NP-hard problems.

What the JD emphasized

  • full AI toolchain
  • low-level optimization
  • compiler development
  • performance tuning

Other signals

  • bring up state-of-the-art open-source models
  • customer-provided proprietary models
  • achieving unprecedented levels of performance, efficiency, and scalability for AI applications