What you'd actually do

Implement infrastructure to support a high-performance, low-latency inference service.

Deploy and configure Kubernetes services to ensure scalability and reliability of inference workloads.

Optimize resource allocation and auto-scaling policies to handle variable inference demand while minimizing operational costs.

Integrate inference services with containerized environments using Docker and Kubernetes for orchestration.

Ensure high availability and fault tolerance by implementing multi-region deployments and disaster recovery strategies.

Cerebras Systems builds the world's largest AI chip, 56 times larger than GPUs. This architecture allows Cerebras to deliver industry-leading training and inference speeds, over 10 times faster than GPU-based hyperscale cloud inference services.

This order-of-magnitude increase in speed is transforming the user experience of AI applications, unlocking real-time iteration and increasing intelligence via additional agentic computation.

Cerebras works with the leading model labs, global enterprises, and cutting-edge AI-native startups. OpenAI recently announced a multi-year partnership with Cerebras to deploy 750 megawatts of scale, transforming key workloads with ultra-high-speed inference.

About The Role

We are seeking a Software Engineer to develop and maintain high-performance, low-latency inference infrastructure. This role focuses on deploying and optimizing scalable inference services, collaborating with cross-functional teams, and ensuring reliable, production-ready machine learning infrastructure.

Responsibilities

Implement infrastructure to support a high-performance, low-latency inference service.
Deploy and configure Kubernetes services to ensure scalability and reliability of inference workloads.
Optimize resource allocation and auto-scaling policies to handle variable inference demand while minimizing operational costs.
Integrate inference services with containerized environments using Docker and Kubernetes for orchestration.
Ensure high availability and fault tolerance by implementing multi-region deployments and disaster recovery strategies.
Develop Python-based scripts and APIs to streamline data preprocessing, inference execution, and post-processing for real-time inference tasks.
Collaborate with machine learning engineers to validate inference accuracy and performance against functional and latency requirements.
Triage and resolve defects in the service by analyzing logs, metrics, and distributed traces.
Debug issues related to model deployment, container orchestration, or networking configurations, documenting steps to reproduce and root-cause defects.
Collaborate with cross-functional teams to address performance regressions, scalability issues, or integration failures in the inference pipeline.
Develop automated scripts to detect and mitigate common failure modes, improving system reliability.
Author detailed technical documentation for infrastructure configurations, inference workflows, and APIs, ensuring clarity for internal teams and external customers.
Work with product management and user experience teams to define requirements for inference service interfaces, including configuration, monitoring, and event logging.
Document and track defects, enhancements, and release notes using tools like Jira and Git, ensuring version control and traceability.
Participate in release planning and prioritization discussions to align infrastructure development with customer needs and business objectives.

Skills & Qualifications

Minimum Requirements

Master's degree (or foreign equivalent) in Computer Science or a related field.
One (1) year of experience as a Software Developer, Student/Intern (Software Developer), Member of Technical Staff (Software Engineer), Software Engineer, or a related occupation.
Employer accepts full-time or equivalent part-time experience gained before, during, or after graduate studies.

Required Skills:

Docker and Kubernetes;
Java or C++;
ActiveMQ and Kafka;
Python or Groovy;
JavaScript or TypeScript;
Linux;
SQL, OracleDB, and Redis; and
Git

Why Join Cerebras

People who are serious about software make their own hardware. At Cerebras, we have built a breakthrough architecture that is unlocking new opportunities for the AI industry. With dozens of model releases and rapid growth, we’ve reached an inflection point in our business. Members of our team tell us there are five main reasons they joined Cerebras:

Build a breakthrough AI platform beyond the constraints of the GPU.
Publish and open source their cutting-edge AI research.
Work on one of the fastest AI supercomputers in the world.
Enjoy job stability with startup vitality.
Enjoy a simple, non-corporate work culture that respects individual beliefs.

Find out more about what it's like to work at Cerebras **here**!

Apply today and become part of the forefront of groundbreaking advancements in AI!

Cerebras Systems is committed to creating an equal and diverse environment and is proud to be an equal opportunity employer. We celebrate different backgrounds, perspectives, and skills. We believe inclusive teams build better products and companies. We try every day to build a work environment that empowers people to do their best work through continuous learning, growth and support of those around them.



Member of Technical Staff (Software Engineer)
