Semiconductors · Wafer-scale AI chip
| Title | Stage | AI score |
|---|---|---|
| Applied Machine Learning Research Scientist This role focuses on applying and scaling modern machine learning techniques, particularly LLM post-training (RLHF, GRPO), on Cerebras' wafer-scale AI chip. The scientist will build and maintain training pipelines and evaluation frameworks, optimize ML workflows across the pretraining, fine-tuning, and alignment stages, work with large datasets, and contribute to shared ML infrastructure. | Post-train, Data | 9 |
| Kernel Engineer The Kernel Engineer will develop high-performance software solutions for AI and HPC workloads, focusing on implementing, optimizing, and scaling deep learning operations on Cerebras' custom hardware. This involves designing, developing, and debugging low-level kernels and algorithms to maximize compute utilization and training efficiency, while also studying emerging ML trends and interacting with hardware architects. | Serve, Post-train | 9 |
| Staff Inference ML Runtime Engineer Role at Cerebras Systems focused on optimizing and scaling high-throughput, low-latency generative AI inference on the wafer-scale AI chip. The role involves designing and implementing ML features, APIs, and distributed runtime solutions, working with state-of-the-art generative AI models and multimodal data. | Serve | 9 |
| Senior Runtime Engineer Role focused on designing and developing high-performance distributed software for large-scale AI training and inference workloads on Cerebras' wafer-scale architecture. The role involves optimizing compute and data pipelines, ensuring scalability, and collaborating with ML and compiler teams. Requires strong C++ and distributed systems experience; familiarity with ML pipelines preferred. | Serve, Agent | 9 |
| Senior Performance Engineer, Inference Role focused on benchmarking Cerebras' AI inference performance against competitors and analyzing pricing models. Requires deep expertise in open-source inference stacks, GPU optimization, and LLM inference economics. | Serve | 8 |
| Engineering Manager, Inference ML Runtime Role leading a team that designs and scales systems for executing state-of-the-art AI models on Cerebras hardware. The role spans ML, distributed systems, and high-performance runtime engineering, with the goal of delivering the fastest generative AI inference solution. | Serve | 8 |
| New Grad - ML Stack Optimization Engineer Entry-level role focused on optimizing compiler technologies for AI chips using the LLVM and MLIR frameworks to improve the performance and efficiency of AI applications on Cerebras' wafer-scale architecture. | Serve | 8 |
| ML Systems Performance Engineer Role focused on optimizing end-to-end model inference speed and throughput on Cerebras' wafer-scale AI chip. Responsibilities include kernel optimization, system performance analysis, and developing performance modeling and diagnostic tools. | Serve | 8 |
| Performance & Reliability Engineer The Performance & Reliability Engineer will characterize and optimize the performance and reliability of advanced ML hardware/software systems, focusing on reducing power and thermal fluctuations. This role involves analyzing ML workloads, software kernels, and hardware architecture, developing software solutions for reliability and performance, and influencing next-generation AI architecture design. | Serve | 8 |
| Member of Technical Staff (Software Engineer) Software engineering role implementing and optimizing high-performance, low-latency inference services on Cerebras' wafer-scale AI chip, with a focus on Kubernetes deployment, resource management, and reliability. The role involves collaborating with ML engineers, debugging complex issues, and ensuring the scalability and fault tolerance of AI inference workloads. | Serve | 7 |
| Sr. Member of Technical Staff This role focuses on developing and maintaining cloud-based deployment workflows for AI inference software, using containerization and orchestration technologies such as Docker and Kubernetes. Responsibilities include ensuring system resiliency and high availability, optimizing performance for low-latency inference tasks, and debugging, monitoring, and documenting inference services, with a strong emphasis on infrastructure-as-code and CI/CD practices. | Serve | 7 |
| Advanced Technology: Compiler Engineer Cerebras is seeking a Compiler Engineer to work on its Tungsten language compiler, purpose-built for its wafer-scale AI hardware. The role involves designing and implementing compiler passes, co-designing language constructs, and developing code generation strategies for AI and scientific workloads. The engineer will collaborate with ASIC, kernel, and AI teams and contribute to the broader toolchain, including the runtime and debuggers. Experience with novel architectures and ML compiler frameworks is valuable. | Serve | 7 |
| Senior ML Software Engineer - Integration & Quality Role focused on integrating and validating the software stack for the Cerebras AI platform, ensuring reliable and efficient execution of large-scale ML workloads. This role involves debugging complex distributed systems, improving automation, and enhancing the reliability of AI infrastructure, working closely with runtime, compiler, kernel, and hardware teams. | Serve | 7 |
| Site Reliability Engineer - Ops & Automation Cerebras is seeking a Site Reliability Engineer to support its high-performance AI inference services powered by the Wafer-Scale Engine. The role involves operational execution, developing self-service CD pipelines, building automation tools, and enhancing observability for large-scale AI infrastructure. The position requires production Kubernetes experience and proficiency in Python or Go. | Serve | 7 |
| Staff Site Reliability Engineer – Automation and Platform Role focused on building and scaling a high-performance SRE function for Cerebras' AI inference services, powered by the Wafer-Scale Engine. The role leads engineering efforts to implement self-service delivery pipelines, shared observability tooling, and GitOps-driven CD for model releases and cluster management, with the goal of enabling core teams, product managers, and external customers to operate in a fully self-service model with strong reliability guarantees. The role also involves mentoring early-career SREs. | Serve | 7 |
| Principal Engineer, Inference Cloud Senior IC role responsible for the availability, latency, reliability, and multi-region scale of Cerebras' Inference Cloud Platform. The role involves defining long-term architecture, driving execution on critical paths, and contributing production code for large-scale distributed systems. | Serve | 7 |
| Staff Software Engineer, Inference Cloud Role focused on building and operating the Inference Cloud Platform, responsible for the availability, latency, reliability, and global scale of AI inference workloads. Requires deep expertise in distributed systems and high-QPS optimization, along with experience in ML inference infrastructure. | Serve | 7 |
| AI Infrastructure Operations Engineer The AI Infrastructure Operations Engineer will manage and operate Cerebras' advanced AI compute clusters, ensuring their health, performance, and availability. This role focuses on maximizing compute capacity, deploying container-based services, and providing 24/7 monitoring and support for large-scale machine learning infrastructure. | Serve | 7 |