Staff+ Software Engineer, Inference Runtime

Anthropic · AI Frontier · San Francisco, CA · Software Engineering - Infrastructure

Staff+ Software Engineer on Anthropic's Inference Runtime team, focused on the accelerator-agnostic core of the company's AI inference serving stack. The role involves setting technical direction, owning the architecture and roadmap, hands-on coding in Rust and Python, optimizing accelerator usage, and building validation systems. It requires a deep systems engineering or ML infrastructure background, with experience in performance optimization and large-scale distributed systems.

What you'd actually do

  1. Set technical direction for the team, owning the architecture and roadmap for the shared runtime of the inference serving stack
  2. Own and evolve the accelerator-agnostic runtime itself – its interfaces, internal boundaries, and build structure – including hands-on work in a performance-sensitive Rust and Python codebase
  3. Keep the platform's expansion cost low by ensuring new models and deployment targets pay only for their own specialization, and edge cases stitch back into the core easily
  4. Drive efficient accelerator usage – utilization, scheduling, memory management – across GPU, TPU, and Trainium
  5. Build the runtime's validation surface around partitioned builds, change-scoped testing, and canary/shadow/rollback as first-class mechanisms

Skills

Required

  • Deep background in systems engineering or ML infrastructure
  • Ability to go hands-on with performance profiling, latency and throughput optimization, and systems debugging at scale
  • Real depth in at least one accelerator ecosystem (CUDA/GPU, TPU, or Trainium/AWS Neuron), with a genuine appetite to keep the runtime agnostic across all of them
  • Significant software engineering experience
  • Strong background in high-performance, large-scale distributed systems serving millions of users
  • A track record of defining and using engineering metrics to drive improvement
  • Experience driving technical alignment across organizational boundaries
  • Ability to advocate for your team's needs while contributing to shared infrastructure
  • Strong written and verbal communication
  • Ability to influence technical direction without formal authority

Nice to have

  • 8+ years of software engineering experience
  • Significant time as the technical lead or anchor on a platform, inference runtime, or ML infrastructure team
  • Experience with ML compiler toolchains (XLA, Triton, NeuronX) or accelerator driver/firmware management at scale
  • Background operating production as a validation surface at scale: shadow traffic, canary populations, automated baseline comparison, fast rollback
  • Experience with deterministic or simulation-based testing for hardware-dependent systems
  • Experience with CI/CD systems at scale, particularly for workloads involving accelerator hardware
  • Familiarity with Kubernetes-based development and job scheduling environments
  • Prior tech lead experience on a developer productivity or platform engineering team at a fast-growing AI/ML company

What the JD emphasized

  • technical lead
  • broad technical ownership
  • technical roadmap
  • technical anchor of a platform with many internal consumers
  • performance-sensitive Rust and Python codebase
  • efficient accelerator usage
  • validation surface
  • technical counterpart to Anthropic's technical infrastructure org
  • compilers, build systems, and toolchains
  • performance profiling, latency and throughput optimization, and systems debugging at scale
  • track record of defining and using engineering metrics to drive improvement
  • set SLOs on platform surfaces
  • driven escape rates, release times, latency, or throughput in a measurable direction
  • technical alignment across organizational boundaries
  • technical lead or anchor on a platform, inference runtime, or ML infrastructure team
  • ML compiler toolchains
  • production as a validation surface at scale
  • deterministic or simulation-based testing for hardware-dependent systems
  • CI/CD systems at scale

Other signals

  • inference runtime
  • accelerator-agnostic core
  • performance, correctness, and abstractions
  • set technical direction
  • architecture and roadmap
  • hands-on work
  • central Infrastructure org
  • ML infrastructure
  • performance profiling, latency and throughput optimization
  • systems debugging at scale
  • accelerator ecosystem (CUDA/GPU, TPU, or Trainium/AWS Neuron)
  • high-performance, large-scale distributed systems serving millions of users
  • engineering metrics to drive improvement
  • SLOs on platform surfaces
  • driving escape rates, release times, latency, or throughput
  • developer productivity or platform engineering team
  • fast-growing AI/ML company