Lead Full Stack Machine Learning Engineer

Cerebras · Semiconductors · India · Software

Lead Full Stack Machine Learning Engineer at Cerebras Systems, focusing on bringing up state-of-the-art open-source models and frameworks on Cerebras CSX systems. Responsibilities include end-to-end framework bring-up for RL and inference serving, working across the full software stack from model translation to hardware utilization, and debugging performance issues. Requires strong C/C++ and optimization skills, experience with deep learning frameworks, and a Bachelor's/Master's/PhD with 10+ years of experience.

What you'd actually do

  1. Contribute to the end-to-end bring-up of frameworks for RL, inference serving, and ML models on Cerebras CSX systems.
  2. Work across the stack: model architecture translation, graph lowering, compiler optimizations, runtime integration, and performance tuning.
  3. Debug performance and correctness issues spanning model code, compiler IRs, runtime behavior, and hardware utilization.
  4. Propose and prototype improvements across tools, APIs, or automation flows to accelerate future bring-ups.

Skills

Required

  • Bachelor’s, Master’s, or PhD in Computer Science, Engineering, or a related field
  • 10+ years’ experience
  • Comfort navigating the full AI toolchain: Python modelling code, compiler IRs, performance profiling, etc.
  • Strong debugging skills across performance, numerical accuracy, and runtime integration.
  • Experience with deep learning frameworks (e.g., PyTorch, TensorFlow) and familiarity with model internals (e.g., attention, MoE, diffusion).
  • Proficiency in C/C++ programming and experience with low-level optimization.
  • Strong background in optimization techniques, particularly those involving NP-hard problems.

Nice to have

  • RLHF
  • inference serving
  • ML models
  • graph lowering
  • compiler optimizations
  • runtime integration
  • hardware utilization

What the JD emphasized

  • 10+ years’ experience
  • full AI toolchain
  • performance tuning
  • low-level optimization
  • optimization techniques

Other signals

  • Bring up state-of-the-art open-source models, frameworks and data engineering
  • End-to-end bring-up of frameworks for RL, inference serving, and ML models on Cerebras CSX systems
  • Work across the stack: model architecture translation, graph lowering, compiler optimizations, runtime integration, and performance tuning
  • Debug performance and correctness issues spanning model code, compiler IRs, runtime behavior, and hardware utilization
  • Propose and prototype improvements across tools, APIs, or automation flows to accelerate future bring-ups