Principal Engineer, Inference Cloud

Cerebras · Semiconductors · Headquarters +1 · Software

This is a Principal Engineer role on Cerebras' Inference Cloud Platform, focused on availability, latency, reliability, and multi-region scale for the company's AI chip. The senior IC role involves defining long-term architecture, driving execution on critical paths, and contributing production code for large-scale distributed systems supporting AI workloads.

What you'd actually do

  1. Identify the most important technical problems for the platform, often before there's a clear ask. Make explicit tradeoff decisions about what the platform will and won't support, with reasoning that holds up under scrutiny from senior engineering leadership.
  2. Set the long-term technical direction for the Inference Cloud Platform, including multi-region topology, failure domains, service boundaries, and system evolution over time.
  3. Architect active-active systems with rapid failover and graceful degradation (circuit breaking, backpressure, load shedding) with clear SLOs. Drive improvements in latency, throughput, capacity efficiency, and resilience under unpredictable demand.
  4. Contribute production code in critical paths, review designs and implementations, and make architectural decisions including build-vs-buy tradeoffs with long-term operational consequences.
  5. Lead on the hardest production issues and cross-system bottlenecks. Drive observability, incident response, capacity planning, and post-incident improvement with a high standard for operational rigor.

Skills

Required

  • 10+ years of experience in software engineering
  • Substantial individual contributor experience building and operating large-scale distributed systems or cloud infrastructure
  • Deep expertise in distributed systems architecture in cloud environments, including networking, compute orchestration, container platforms, and multi-region production services
  • Strong track record of making sound architectural decisions for highly available, latency-sensitive systems at scale, demonstrated through systems you built directly
  • Experience optimizing latency, throughput, and efficiency in high-QPS systems
  • Strong proficiency in backend or systems languages such as Go, C++, or Python
  • Experience designing observability and reliability practices, including metrics, logging, tracing, alerting, incident response, and SLI/SLO/SLA-driven operations
  • Ability to influence senior engineers, technical leads, and cross-functional partners through technical credibility, communication, and judgment

Nice to have

  • Experience reducing time to first token (TTFT) and tail latency
  • Experience with ML inference infrastructure, model serving systems, or GPU-accelerated workloads

What the JD emphasized

  • production code
  • large-scale distributed systems
  • multi-region production services
  • highly available, latency-sensitive systems at scale
  • high-QPS systems
  • ML inference infrastructure
  • model serving systems
  • GPU-accelerated workloads

Other signals

  • AI chip
  • AI applications
  • agentic computation
  • model labs
  • AI-native startups
  • Inference Cloud Platform
  • multi-region scale