Engineering Manager, Inference Routing and Performance

Anthropic · AI Frontier · San Francisco, CA · AI Research & Engineering

Engineering Manager for Anthropic's Inference Routing and Performance team, responsible for the cluster-level routing and coordination plane for the company's inference fleet. The role focuses on optimizing the throughput and efficiency of AI model serving through custom algorithms, quantitative modeling, and deep systems understanding.

What you'd actually do

  1. Own the technical roadmap for cluster-level inference efficiency — routing decisions, cache placement and eviction, cross-replica coordination, and the protocols that keep routing and inference engines in sync
  2. Partner with the inference engine, kernels, and performance teams to identify fleet-level throughput and latency wins, then turn those into shipped improvements with measurable results
  3. Build the team's habit of quantitative performance modeling: claim a win only when you can measure it, and know before you ship what the expected effect is
  4. Set technical strategy for how routing evolves across heterogeneous hardware (GPUs, TPUs, Trainium) and across all our serving surfaces
  5. Run the team's operational backbone — on-call rotation, incident response, postmortem review, deploy safety — so the team can ship aggressively without the system becoming fragile

Skills

Required

  • Engineering management experience
  • Leading teams on critical-path production infrastructure at scale
  • Deep systems background (load balancing, scheduling, cache-coherent distributed state, high-performance networking)
  • Architectural decision-making
  • Evaluating technical candidates
  • Quantitative performance modeling
  • Production infrastructure operations (on-call, incident response, capacity events, deploy discipline)
  • Results-orientation and bias toward impact

Nice to have

  • Experience with heterogeneous hardware (GPUs, TPUs, Trainium)
  • Understanding of kernel and framework level optimizations

What the JD emphasized

  • shipping system-level performance improvements
  • running the team operationally
  • deep systems background
  • shipped performance improvements in large-scale systems
  • run production infrastructure with real operational stakes

Other signals

  • inference routing
  • performance optimization
  • distributed systems
  • large-scale infrastructure