Engineering Manager, Inference ML Runtime

Cerebras · Semiconductors · Software

Engineering Manager for Inference ML Runtime at Cerebras, leading a team that designs and scales systems for executing state-of-the-art AI models on Cerebras hardware. The role centers on ML, distributed systems, and high-performance runtime engineering, with the goal of delivering the fastest Generative AI inference solution.

What you'd actually do

  1. Own the architecture and evolution of the ML inference runtime and serving systems.
  2. Build, manage, and grow a team of ML systems and infrastructure engineers.
  3. Drive execution of complex, cross-functional initiatives across the organization.
  4. Scale Cerebras’ inference platform to handle large volumes of concurrent requests at very high speed.
  5. Partner with cloud, compiler, core runtime, hardware, and ML teams to optimize end-to-end performance.
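To make the concurrent-request responsibility above concrete: inference serving systems commonly group queued requests into batches so the accelerator stays busy. This is an illustrative sketch only, not Cerebras' actual runtime; `drain_batch` and the queue layout are hypothetical names for the pattern.

```python
import queue


def drain_batch(q: "queue.Queue", max_batch: int) -> list:
    """Form one inference batch from a request queue.

    Blocks for the first request, then greedily drains any
    already-queued requests without waiting, up to max_batch,
    so one slow arrival never stalls a full batch.
    """
    batch = [q.get()]
    while len(batch) < max_batch:
        try:
            batch.append(q.get_nowait())
        except queue.Empty:
            break
    return batch


# Usage: six queued prompts with a batch cap of 4
# yield one batch of four and one batch of two.
q = queue.Queue()
for i in range(6):
    q.put(f"prompt-{i}")

first = drain_batch(q, max_batch=4)
second = drain_batch(q, max_batch=4)
```

Production runtimes typically extend this with continuous batching (admitting new requests mid-generation), but the drain-without-waiting core is the same trade-off between latency and accelerator utilization.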

Skills

Required

  • 8+ years of experience in ML, distributed systems, and high-performance runtime engineering
  • 2+ years of engineering management experience
  • Strong programming skills in C++, Python
  • Experience building and scaling large-scale inference systems (LLMs or multimodal)
  • Experience working with cloud infrastructure and applying best practices for building scalable microservices and applications

Nice to have

  • Experience with GPU programming, CUDA, or other parallel computing frameworks
  • Familiarity with ML compilers, model optimization techniques, or distributed training frameworks

What the JD emphasized

  • fastest Generative AI inference solution in the world
  • ML inference runtime and serving systems
  • large-scale inference systems (LLMs or multimodal)

Other signals

  • leading a team responsible for designing and scaling systems for AI model execution