Staff + Sr. Software Engineer, Inference

Anthropic · AI Frontier · New York, NY +2 · Software Engineering - Infrastructure

The Inference team at Anthropic builds and maintains the systems that serve Claude to millions of users. The team owns the entire serving stack, from request routing to fleet-wide orchestration across diverse AI accelerators, with a dual mandate: maximize compute efficiency and enable research breakthroughs. The role calls for significant software engineering experience, particularly with distributed systems, along with familiarity with LLM inference optimization.

What you'd actually do

  1. Designing intelligent routing algorithms that optimize request distribution across thousands of accelerators (see the routing sketch after this list)
  2. Autoscaling our compute fleet to dynamically match supply with demand across production, research, and experimental workloads
  3. Building production-grade deployment pipelines for releasing new models to millions of users
  4. Integrating new AI accelerator platforms to maintain our hardware-agnostic competitive advantage
  5. Contributing to new inference features (e.g., structured sampling, prompt caching)
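
For a concrete picture of item 1, below is a minimal sketch of one classic routing policy, least outstanding requests, over a pool of replicas. Every name in it (Replica, LeastLoadedRouter, the gpu-N labels) is hypothetical rather than Anthropic's implementation, and a production router would also weigh queue depth, KV-cache locality, and which models each accelerator is serving.

    import heapq
    from dataclasses import dataclass, field

    @dataclass(order=True)
    class Replica:
        # Ordered by in-flight request count so the heap root is always
        # the least-loaded replica; `name` is excluded from comparisons.
        outstanding: int
        name: str = field(compare=False)

    class LeastLoadedRouter:
        """Toy least-outstanding-requests router over a fixed replica pool."""

        def __init__(self, replica_names):
            self._heap = [Replica(0, n) for n in replica_names]
            heapq.heapify(self._heap)

        def route(self):
            # Pop the least-loaded replica, charge it one in-flight
            # request, and push it back to restore the heap invariant.
            r = heapq.heappop(self._heap)
            r.outstanding += 1
            heapq.heappush(self._heap, r)
            return r.name

        def complete(self, name):
            # Credit a finished request. A linear scan plus re-heapify is
            # O(n): fine for a sketch, too slow for thousands of replicas.
            for r in self._heap:
                if r.name == name:
                    r.outstanding -= 1
                    break
            heapq.heapify(self._heap)

    router = LeastLoadedRouter(["gpu-0", "gpu-1", "gpu-2"])
    print([router.route() for _ in range(4)])  # first three requests land on three distinct replicas

Least-outstanding-requests is only one policy; the point of the sketch is that per-request routing state has to stay cheap to read and update.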

Skills

Required

  • Significant software engineering experience
  • Distributed systems
  • Python
  • Rust

Nice to have

  • High-performance, large-scale distributed systems
  • Implementing and deploying machine learning systems at scale
  • Load balancing, request routing, or traffic management systems
  • LLM inference optimization, batching, and caching strategies (see the caching sketch after this list)
  • Kubernetes and cloud infrastructure (AWS, GCP, Azure)
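
To ground the batching-and-caching bullet above, here is a toy sketch of block-level prompt-prefix caching: requests that share a long prefix (for example, a common system prompt) can skip recomputing that prefix's prefill work. The class and its API are hypothetical; real implementations cache attention key/value tensors per token block, not hashes of token-id lists.

    import hashlib

    class PrefixCache:
        """Toy block-level prompt-prefix cache (hypothetical API)."""

        def __init__(self, block_size=4):
            self.block_size = block_size
            self._blocks = set()  # hashes of whole-block token prefixes

        def _key(self, tokens):
            return hashlib.sha256(repr(tokens).encode()).hexdigest()

        def insert(self, tokens):
            # Record every whole-block prefix of this request so later
            # requests sharing a prefix can skip that much prefill work.
            for cut in range(self.block_size, len(tokens) + 1, self.block_size):
                self._blocks.add(self._key(tokens[:cut]))

        def reusable_prefix(self, tokens):
            # Length of the longest whole-block prefix already cached;
            # tokens before this point need no recomputation at prefill.
            cut = (len(tokens) // self.block_size) * self.block_size
            while cut > 0:
                if self._key(tokens[:cut]) in self._blocks:
                    return cut
                cut -= self.block_size
            return 0

    cache = PrefixCache(block_size=4)
    system_prompt = list(range(12))           # 12 shared system-prompt tokens
    cache.insert(system_prompt + [101, 102])  # first request warms the cache
    print(cache.reusable_prefix(system_prompt + [201, 202]))  # -> 12

Caching whole blocks rather than arbitrary prefixes keeps lookups bounded and maps naturally onto block-allocated KV-cache memory.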

What the JD emphasized

  • critical systems that serve Claude to millions of users worldwide
  • maximizing compute efficiency
  • explosive customer growth
  • enabling breakthrough research
  • high-performance inference infrastructure
  • complex, distributed systems challenges
  • multiple accelerator families and emerging AI hardware
  • multiple cloud platforms
  • significant software engineering experience, particularly with distributed systems
  • implementing and deploying machine learning systems at scale
  • LLM inference optimization, batching, and caching strategies
