Engineering Manager, Inference Scalability and Capability

Anthropic · AI Frontier · AI Research & Engineering

Engineering Manager for the Inference Scalability and Capability team, responsible for building and maintaining the critical systems that serve LLMs: scaling inference, ensuring reliability, optimizing compute, and developing new inference capabilities. The role involves managing a team of engineers, driving operational excellence, shipping advanced inference features, and partnering with research, infrastructure, and product teams.

What you'd actually do

  1. Build and lead a high-performing team of engineers through technical mentorship, strategic hiring, and creating an environment that fosters innovation
  2. Drive operational excellence of inference systems (deployments, auto-scaling, request routing, monitoring) across cloud providers
  3. Facilitate development of advanced inference features such as prompt caching, constrained sampling, and fine-tuning (see the prompt-caching sketch after this list)
  4. Partner deeply with research teams to productionize new models, infrastructure teams to optimize hardware utilization, and product teams to deliver customer-facing features
  5. Create clear technical roadmaps and execution strategies in a fast-moving environment while managing competing priorities
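
Prompt caching (item 3) is one of the named advanced inference features. As a minimal, hypothetical sketch of the core idea, not Anthropic's implementation: serving systems typically cache attention KV state keyed by a hash of the prompt prefix, so repeated prefixes skip recomputation. The string-valued cache below stands in for that state.

```python
import hashlib
from collections import OrderedDict

class PrefixCache:
    """LRU cache keyed by a hash of the prompt prefix.

    Illustrative only: real servers cache attention KV state,
    not strings; a string stands in for that state here.
    """

    def __init__(self, capacity: int = 1024):
        self.capacity = capacity
        self._store: OrderedDict[str, str] = OrderedDict()

    @staticmethod
    def _key(prefix: str) -> str:
        return hashlib.sha256(prefix.encode("utf-8")).hexdigest()

    def get(self, prefix: str) -> str | None:
        key = self._key(prefix)
        if key not in self._store:
            return None  # cache miss: caller recomputes the prefix state
        self._store.move_to_end(key)  # mark as most recently used
        return self._store[key]

    def put(self, prefix: str, state: str) -> None:
        key = self._key(prefix)
        self._store[key] = state
        self._store.move_to_end(key)
        if len(self._store) > self.capacity:
            self._store.popitem(last=False)  # evict least recently used
```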

Skills

Required

  • 5+ years of experience leading large-scale distributed systems teams
  • Track record of building high-trust environments
  • Experience recruiting, scaling, and retaining engineering talent
  • Outstanding communication and leadership skills
  • Deep commitment to advancing AI capabilities responsibly
  • Strong technical background enabling architectural decisions and guiding technical direction

Nice to have

  • Implementing and deploying machine learning systems at scale
  • LLM inference optimization, including batching and caching strategies (see the continuous-batching sketch after this list)
  • Cloud-native architectures, containerization, and deployment across multiple cloud providers
  • High-performance computing environments and hardware acceleration (GPU, TPU, AWS Trainium)
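
Batching strategy, listed above, is illustrated here with a minimal sketch of continuous (in-flight) batching: finished requests leave the running batch at each decode step and queued requests join immediately, rather than waiting for the whole batch to drain. The Request shape and token counts are illustrative stand-ins, not any real engine's API.

```python
from collections import deque
from dataclasses import dataclass

@dataclass
class Request:
    rid: int
    remaining_tokens: int  # decode steps left until completion

def continuous_batching(queue: deque[Request], max_batch: int = 8) -> None:
    active: list[Request] = []
    while queue or active:
        # Admit waiting requests into any free batch slots.
        while queue and len(active) < max_batch:
            active.append(queue.popleft())
        # One decode step per active request (stand-in for a single
        # fused forward pass over the batch in a real engine).
        for req in active:
            req.remaining_tokens -= 1
        # Retire finished requests so their slots free up immediately,
        # instead of idling until the slowest request completes.
        active = [r for r in active if r.remaining_tokens > 0]

if __name__ == "__main__":
    continuous_batching(deque(Request(i, 3 + i) for i in range(12)))
```

Compared with static batching, this keeps batch slots full under mixed output lengths, which is where most of the throughput win comes from.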

What the JD emphasized

  • productionize new models
  • LLM inference optimization
  • advanced inference features

Other signals

  • scaling inference systems
  • optimizing compute resource efficiency
  • developing new inference capabilities