Software Development Engineer in Test (Cloud)

Cerebras · Semiconductors · India · Software

A Software Development Engineer in Test (Cloud) role at Cerebras, focused on quality ownership and on building scalable test infrastructure for the Cerebras AI Inference Cloud platform, which runs on the company's large-scale AI chip for training and inference.

What you'd actually do

  1. Drive weekly cloud release qualification end to end. Read every PR in the release branch first-hand; understand what changed; decide where the risk is; and design the qualification that exercises the actual risk. Be the final voice before a release ships.
  2. Build and evolve the test infrastructure (functional, integration, performance, and fault testing) for the Inference Cloud platform. Plan for 20x growth in coverage, environments, and traffic. Today's setup will not survive tomorrow's load, so design for the next horizon.
  3. Reason through the full stack: client SDK, API, gateway, inference software, driver, and hardware. Know enough to debug from any layer and to test the right thing.
  4. Read and review developer PRs with genuine understanding of what each change does and what its blast radius is. Test the change's actual impact, not its surface area.
  5. Increase automation coverage continuously. Fix flaky tests rather than tolerate them. Use AI tooling effectively to accelerate test creation, debugging, and analysis.

Skills

Required

  • 5+ years of experience in quality engineering, test engineering, or a closely related role
  • Substantial individual-contributor experience on large-scale distributed systems or cloud infrastructure
  • Deep cloud platform experience, preferably AWS: networking, compute orchestration, container platforms, and multi-region production services
  • Track record of building scalable test infrastructure: frameworks, harnesses, environments, and automation that scale with the system under test rather than fighting it
  • Strong systems debugging and reasoning
  • Strong proficiency in at least one backend language (Python, Go, or C++)
  • Excellent written and async communication
  • Self-direction under ambiguity

Nice to have

  • Experience with cloud infrastructure, model serving systems, or GPU-accelerated workloads is a strong plus
  • Experience using AI tooling (LLMs, coding assistants, agents) to accelerate test development, triage, or analysis is a plus

What the JD emphasized

  • build scalable test infrastructure
  • Cloud infrastructure
  • model serving systems
  • GPU accelerated workloads

Other signals

  • AI chip
  • AI compute power
  • training and inference speeds
  • large-scale ML applications
  • Generative AI inference solution
  • agentic computation
  • Inference Cloud platform