Senior ML Software Engineer - Integration & Quality

Cerebras · Semiconductors · US and Canada Offices · Software

Software engineering role focused on integrating and validating the ML software stack for Cerebras' AI platform, which includes large AI chips for training and inference. The role involves debugging complex issues, improving automation, and ensuring the reliability of the AI infrastructure, particularly for inference workloads.

What you'd actually do

  1. Integrate and validate software components across the Cerebras AI platform.
  2. Collaborate with engineers across ML runtime, compiler, kernel, and hardware teams to ensure reliable feature integration.
  3. Investigate and debug complex issues across distributed systems and large-scale ML workloads.
  4. Build automation tools and infrastructure to support integration testing, system validation, and debugging workflows.
  5. Develop and maintain testbeds used to validate system performance and reliability.

Skills

Required

  • Python
  • C++
  • Go
  • debugging complex systems
  • distributed software environments
  • systems-level development
  • infrastructure tooling
  • platform integration
  • automation tools
  • testing frameworks
  • internal developer tooling
  • problem-solving skills
  • communication skills
  • collaboration skills

Nice to have

  • machine learning infrastructure
  • ML model deployment
  • LLM workloads
  • multimodal model workloads
  • distributed systems
  • cloud infrastructure
  • large-scale compute clusters
  • performance debugging
  • profiling
  • system observability tools
  • microservices
  • containerized environments
  • cluster orchestration
  • hardware accelerators
  • compilers
  • ML frameworks

What the JD emphasized

  • large-scale ML workloads
  • inference platform
  • ML infrastructure
  • distributed systems
  • hardware/software co-design
  • debug complex issues
  • improving the reliability

Other signals

  • ML infrastructure
  • distributed systems
  • hardware/software co-design
  • large-scale ML workloads
  • inference platform