Network Architect

Cerebras · Semiconductors · Headquarters +1 · Software

The Network Architect will design and architect front-end datacenter and interconnect fabrics for AI clusters, optimizing for high resource utilization, low latency, and high-throughput communication. This role involves building proof-of-concept implementations, automating deployment and configuration, and establishing SRE-grade telemetry and observability for network reliability. Responsibilities include leading network debugging in distributed systems, collaborating with vendors, and representing the company in industry forums.

What you'd actually do

  1. Design and architect front-end network fabrics for AI/ML and HPC clusters, optimizing for high resource utilization, low latency, and high-throughput communication.
  2. Build proof-of-concept implementations of new network designs and features, and drive them from prototype through production rollout.
  3. Identify and resolve performance and efficiency bottlenecks across the host, NIC, and fabric.
  4. Automate the deployment, configuration, and validation of network infrastructure using Python, including topology provisioning, fabric bring-up, config generation, and regression testing. Strong programming skills are essential; this role builds tools, not just runbooks.
  5. Stand up and operate SRE-grade telemetry and observability for the cluster network: streaming telemetry (gNMI, OpenConfig, sFlow/IPFIX), metrics pipelines, alerting, and incident workflows. Define the SLIs/SLOs that govern network reliability and drive blameless post-incident analysis.

Skills

Required

  • Python
  • Go
  • VXLAN
  • EVPN
  • RoCEv2
  • BGP
  • DCQCN
  • PFC
  • ECN
  • streaming telemetry
  • Ansible
  • Jinja2
  • gNMI
  • CI/CD pipelines
  • Prometheus
  • InfluxDB
  • Grafana
  • log aggregation
  • incident-management
  • network visibility
  • management
  • packet-capture/analysis tools

Nice to have

  • hyperscalers
  • cloud service providers
  • AI/ML or HPC cluster networking
  • lossless Ethernet design
  • rail-optimized topologies
  • collective-communication traffic patterns
  • open-source networking projects
  • standards bodies
  • industry conferences

What the JD emphasized

  • Automate the deployment, configuration, and validation of network infrastructure
  • Stand up and operate SRE-grade telemetry and observability
  • Lead network debugging in large distributed-systems environments