Full Stack LLM Engineer

Cerebras · Semiconductors · Toronto, ON · Software

Cerebras is seeking a Full Stack LLM Engineer to join its Inference Core Model Bringup team. The role involves bringing up state-of-the-art open-source and proprietary models on Cerebras CSX systems and working across the entire software stack, from model translation and compiler optimizations to runtime integration and performance tuning. The engineer will debug performance and correctness issues and propose improvements to tools and automation. Experience with deep learning frameworks, model internals, C/C++, and compiler development (LLVM/MLIR) is required.

What you'd actually do

  1. Contribute to the end-to-end bring-up of ML models on Cerebras CSX systems.
  2. Work across the stack: model architecture translation, graph lowering, compiler optimizations, runtime integration, and performance tuning.
  3. Debug performance and correctness issues spanning model code, compiler IRs, runtime behavior, and hardware utilization.
  4. Propose and prototype improvements to tools, APIs, and automation flows to accelerate future bring-ups.

Skills

Required

  • Python modeling code
  • compiler IRs
  • performance profiling
  • deep learning frameworks (PyTorch, TensorFlow)
  • model internals (attention, MoE, diffusion)
  • C/C++ programming
  • low-level optimization
  • compiler development (LLVM, MLIR)
  • optimization techniques

Nice to have

  • Bachelor’s, Master’s, or PhD in Computer Science, Engineering, or a related field

What the JD emphasized

  • full AI toolchain
  • compiler development
  • low-level optimization
  • performance tuning

Other signals

  • Bring up state-of-the-art open-source models on Cerebras CSX systems