Member of Technical Staff (Software Engineer)

Cerebras · Semiconductors · Headquarters +1 · Software

A Software Engineer role implementing and optimizing high-performance, low-latency inference services on Cerebras' wafer-scale AI chip, with a focus on Kubernetes deployment, resource management, and reliability. The role involves collaborating with ML engineers, debugging complex issues, and ensuring the scalability and fault tolerance of AI inference workloads.

What you'd actually do

  1. Implement infrastructure to support a high-performance, low-latency inference service.
  2. Deploy and configure Kubernetes services to ensure scalability and reliability of inference workloads.
  3. Optimize resource allocation and auto-scaling policies to handle variable inference demand while minimizing operational costs (see the sketch after this list).
  4. Integrate inference services with containerized environments using Docker and Kubernetes for orchestration.
  5. Ensure high availability and fault tolerance by implementing multi-region deployments and disaster recovery strategies.
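
A minimal sketch of what items 2 and 3 can look like in practice, assuming the official `kubernetes` Python client and a hypothetical `inference-server` Deployment in an `inference` namespace (names and thresholds are illustrative, not Cerebras' actual configuration):

```python
from kubernetes import client, config

# Load credentials from the local kubeconfig; inside a cluster,
# config.load_incluster_config() would be used instead.
config.load_kube_config()

autoscaling = client.AutoscalingV1Api()

# CPU-based HorizontalPodAutoscaler for the hypothetical inference Deployment.
hpa = client.V1HorizontalPodAutoscaler(
    metadata=client.V1ObjectMeta(name="inference-server", namespace="inference"),
    spec=client.V1HorizontalPodAutoscalerSpec(
        scale_target_ref=client.V1CrossVersionObjectReference(
            api_version="apps/v1",
            kind="Deployment",
            name="inference-server",
        ),
        min_replicas=2,   # keep a floor of warm replicas for latency
        max_replicas=20,  # cap replicas to bound operational cost
        target_cpu_utilization_percentage=60,
    ),
)

autoscaling.create_namespaced_horizontal_pod_autoscaler(
    namespace="inference", body=hpa
)
```

For a real inference service, scaling on a workload-specific signal (queue depth or request latency) via custom metrics is often a better fit than CPU utilization alone; the sketch only shows the mechanics.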

Skills

Required

  • Docker and Kubernetes
  • Java or C++
  • ActiveMQ and Kafka
  • Python or Groovy
  • JavaScript or TypeScript
  • Linux
  • SQL, OracleDB, and Redis
  • Git

What the JD emphasized

  • high-performance, low-latency inference service
  • scalability and reliability of inference workloads
  • variable inference demand
  • real-time inference tasks
  • inference accuracy and performance
  • latency requirements
  • distributed traces
  • model deployment
  • container orchestration
  • networking configurations
  • performance regressions
  • scalability issues
  • integration failures
  • system reliability
  • inference service interfaces
  • configuration, monitoring, and event logging

Other signals

  • Collaborate with machine learning engineers to validate inference accuracy and performance against functional and latency requirements.
  • Triage and resolve defects in the service by analyzing logs, metrics, and distributed traces.
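
As a rough illustration of the monitoring and triage angle (logs, metrics, and latency requirements), here is a minimal sketch assuming the `prometheus_client` library and a hypothetical `run_model` callable; metric names and the port are illustrative only:

```python
import logging
import time

from prometheus_client import Counter, Histogram, start_http_server

# Hypothetical metric names; a real service would follow its own conventions.
REQUEST_LATENCY = Histogram(
    "inference_request_latency_seconds",
    "End-to-end latency of inference requests",
)
REQUEST_ERRORS = Counter(
    "inference_request_errors_total",
    "Number of failed inference requests",
)

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("inference")


def handle_request(run_model, payload):
    """Wrap a model call with latency and error instrumentation."""
    start = time.perf_counter()
    try:
        return run_model(payload)
    except Exception:
        REQUEST_ERRORS.inc()
        log.exception("inference request failed")  # event log for triage
        raise
    finally:
        REQUEST_LATENCY.observe(time.perf_counter() - start)


if __name__ == "__main__":
    start_http_server(9100)  # expose /metrics for Prometheus scraping
    handle_request(lambda payload: {"ok": True}, {"prompt": "hello"})
```

Latency histograms like this are what dashboards, alerts, and regression triage are typically built on; distributed traces (e.g. via OpenTelemetry) would complement them for cross-service debugging.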