Staff Software Engineer, Inference

Anthropic Anthropic · AI Frontier · London, United Kingdom · Software Engineering - Infrastructure

Staff Software Engineer on the Inference team responsible for building and maintaining systems that serve Claude to millions of users. Focuses on maximizing compute efficiency and enabling research through high-performance inference infrastructure, involving distributed systems, request routing, and LLM inference optimization.

What you'd actually do

  1. work end to end, identifying and addressing key infrastructure blockers to serve Claude to millions of users while enabling breakthrough AI research
  2. maximizing compute efficiency to serve our explosive customer growth
  3. enabling breakthrough research by giving our scientists the high-performance inference infrastructure they need to develop next-generation models
  4. tackle complex, distributed systems challenges across multiple accelerator families and emerging AI hardware running in multiple cloud platforms
  5. Designing intelligent routing algorithms that optimize request distribution across thousands of accelerators

Skills

Required

  • significant software engineering experience, particularly with distributed systems
  • performance optimization
  • distributed systems
  • large-scale service orchestration
  • intelligent request routing
  • Python or Rust

Nice to have

  • Familiarity with LLM inference optimization, batching strategies, and multi-accelerator deployments
  • High-performance, large-scale distributed systems
  • Implementing and deploying machine learning systems at scale
  • Load balancing, request routing, or traffic management systems
  • LLM inference optimization, batching, and caching strategies
  • Kubernetes and cloud infrastructure (AWS, GCP)
  • Python or Rust

What the JD emphasized

  • critical systems that serve Claude to millions of users worldwide
  • industry's largest compute-agnostic inference deployments
  • fleet-wide orchestration across diverse AI accelerators
  • maximizing compute efficiency
  • explosive customer growth
  • enabling breakthrough research
  • high-performance inference infrastructure
  • next-generation models
  • complex, distributed systems challenges
  • multiple accelerator families
  • emerging AI hardware
  • multiple cloud platforms
  • performance optimization
  • distributed systems
  • large-scale service orchestration
  • intelligent request routing
  • LLM inference optimization
  • batching strategies
  • multi-accelerator deployments
  • High-performance, large-scale distributed systems
  • Implementing and deploying machine learning systems at scale
  • Load balancing, request routing, or traffic management systems
  • LLM inference optimization, batching, and caching strategies
  • Autoscaling our compute fleet
  • dynamically match supply with demand
  • production, research, and experimental workloads
  • production-grade deployment pipelines
  • releasing new models to millions of users
  • Integrating new AI accelerator platforms
  • hardware-agnostic competitive advantage
  • new inference features
  • structured sampling
  • prompt caching
  • Supporting inference for new model architectures
  • Analyzing observability data
  • tune performance
  • real-world production workloads
  • Managing multi-region deployments
  • geographic routing
  • global customers

Other signals

  • serving Claude to millions of users
  • maximizing compute efficiency
  • enabling breakthrough research
  • large-scale service orchestration
  • LLM inference optimization