Staff Software Engineer, Inference Platform

Cerebras · Semiconductors · Headquarters +1 · Software

Staff Software Engineer focused on building and operating a large-scale, distributed inference platform on Cerebras' AI hardware. The role involves designing, developing, and maintaining the orchestration layer connecting cloud components with ML services, with a strong emphasis on reliability, performance, and security.

What you'd actually do

  1. Design, develop, test, and maintain production software, with responsibilities spanning testing, continuous development, observability, security, networking, debugging, and productionization.
  2. Raise the effectiveness of senior engineers through design feedback, pairing, and clear technical standards.
  3. Platform Direction. Help shape the technical direction for the Inference Platform, Kubernetes custom resource definitions, failure domains, service boundaries, and system evolution over time. Own the roadmap for major technical areas.
  4. Reliability & Performance. Architect active-active systems with rapid failover, graceful degradation, and clear SLOs. Drive system-level improvements in latency, throughput, capacity efficiency, and resilience under unpredictable demand.
  5. Execution on Critical Paths. Write and review production code in the most important parts of the platform. Make high-consequence architectural decisions within your area and set the technical bar through design reviews, code reviews, and sound engineering judgment.

Skills

Required

  • 8+ years of experience in software engineering
  • building and operating large-scale distributed systems or cloud infrastructure
  • Kubernetes
  • highly available, latency-sensitive systems at scale
  • security (certificates, TLS, mTLS)
  • optimizing latency, throughput, and efficiency in high-QPS systems
  • Go or C++
  • designing observability and reliability practices
  • influencing senior engineers and cross-functional partners

Nice to have

  • Experience with ML inference infrastructure, model serving systems, or GPU-accelerated workloads
  • TTFT and tail-latency reduction

What the JD emphasized

  • production software
  • production code
  • production leadership
  • large-scale distributed systems
  • highly available, latency-sensitive systems at scale
  • high-QPS systems
  • ML inference infrastructure

Other signals

  • inference platform
  • distributed systems
  • Kubernetes