Staff Software Engineer, Inference

Anthropic · AI Frontier · AI Research & Engineering

Staff Software Engineer on the Inference team responsible for building and maintaining systems that serve Claude to millions of users. Focuses on maximizing compute efficiency and enabling research through high-performance inference infrastructure, tackling distributed systems challenges across diverse AI accelerators and cloud platforms.

What you'd actually do

  1. Building and maintaining the critical systems that serve Claude to millions of users worldwide
  2. Maximizing compute efficiency to keep pace with explosive customer growth (a minimal batching sketch follows this list)
  3. Enabling breakthrough research by giving scientists the high-performance inference infrastructure they need to develop next-generation models
  4. Identifying and addressing the key infrastructure blockers to both production serving and research
  5. Working end to end across the stack
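
To make the compute-efficiency bullet concrete, here is a minimal, self-contained sketch of continuous batching, the scheduling idea behind most modern LLM serving throughput: new requests join the running batch as soon as slots free up, instead of waiting for the whole batch to drain. All names here (Request, ContinuousBatcher, the step loop) are illustrative assumptions, not Anthropic's actual system.

```python
# Minimal sketch of continuous batching, assuming a token-level scheduler.
# Request / ContinuousBatcher are illustrative names, not Anthropic's stack.
from collections import deque
from dataclasses import dataclass


@dataclass
class Request:
    id: int
    tokens_needed: int  # decode steps this request still requires
    tokens_done: int = 0


class ContinuousBatcher:
    """Admit queued requests into the running batch as soon as slots free
    up, rather than waiting for the whole batch to drain (static batching)."""

    def __init__(self, max_batch_size: int):
        self.max_batch_size = max_batch_size
        self.queue: deque[Request] = deque()
        self.running: list[Request] = []

    def submit(self, req: Request) -> None:
        self.queue.append(req)

    def step(self) -> list[int]:
        # Backfill free slots from the queue before the next decode step.
        while self.queue and len(self.running) < self.max_batch_size:
            self.running.append(self.queue.popleft())
        # One decode step advances every running request by one token.
        for req in self.running:
            req.tokens_done += 1
        finished = [r for r in self.running if r.tokens_done >= r.tokens_needed]
        self.running = [r for r in self.running if r not in finished]
        return [r.id for r in finished]


if __name__ == "__main__":
    batcher = ContinuousBatcher(max_batch_size=4)
    for i, n in enumerate([3, 10, 2, 8, 1, 5]):
        batcher.submit(Request(id=i, tokens_needed=n))
    step = 0
    while batcher.running or batcher.queue:
        done = batcher.step()
        step += 1
        if done:
            print(f"step {step}: finished {done}")
```

Compared with static batching, this keeps accelerator slots occupied even when request lengths vary wildly, which is where most of the compute-efficiency win comes from.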

Skills

Required

  • Significant software engineering experience, particularly with distributed systems
  • Performance optimization
  • Large-scale service orchestration
  • Intelligent request routing
  • Python or Rust

Nice to have

  • LLM inference optimization, batching, and caching strategies
  • Multi-accelerator deployments
  • Implementing and deploying machine learning systems at scale
  • Load balancing, request routing, or traffic management systems (see the routing sketch after this list)
  • Kubernetes and cloud infrastructure (AWS, GCP)
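
Since both skill lists name request routing and load balancing, here is a hedged sketch of one common approach: score each replica by outstanding load, with a bonus for prefix-cache affinity so that repeated prompt prefixes land on a warm KV cache. The replica names, the scoring weight, and the Router class itself are assumptions for illustration only.

```python
# Sketch of "intelligent request routing": combine least-outstanding-load
# with prefix-cache affinity. Weights and names are illustrative assumptions.
import hashlib


class Router:
    def __init__(self, replicas: list[str], affinity_bonus: float = 0.5):
        self.replicas = replicas
        self.outstanding = {r: 0 for r in replicas}
        self.affinity_bonus = affinity_bonus

    def _affinity(self, prompt_prefix: str) -> str:
        # Stable hash of the prompt prefix -> preferred replica, so repeated
        # prefixes (e.g., a shared system prompt) hit a warm cache.
        digest = hashlib.sha256(prompt_prefix.encode()).digest()
        return self.replicas[int.from_bytes(digest[:4], "big") % len(self.replicas)]

    def route(self, prompt_prefix: str) -> str:
        preferred = self._affinity(prompt_prefix)

        def score(r: str) -> float:
            # Lower is better: current load, minus a bonus for cache affinity.
            return self.outstanding[r] - (self.affinity_bonus if r == preferred else 0.0)

        choice = min(self.replicas, key=score)
        self.outstanding[choice] += 1
        return choice

    def complete(self, replica: str) -> None:
        self.outstanding[replica] -= 1


if __name__ == "__main__":
    router = Router(["replica-a", "replica-b", "replica-c"])
    for _ in range(5):
        print(router.route("You are a helpful assistant."))
```

A production router would also weigh queue depth, KV-cache occupancy, and accelerator family, but the load-versus-affinity trade-off above is the kernel of the routing and load-balancing skills the lists name.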

What the JD emphasized

  • critical systems that serve Claude to millions of users worldwide
  • maximizing compute efficiency
  • enabling breakthrough research through high-performance inference infrastructure
  • complex distributed-systems challenges spanning multiple accelerator families, emerging AI hardware, and multiple cloud platforms
  • performance optimization, large-scale service orchestration, and intelligent request routing
  • LLM inference optimization, batching, and caching strategies
  • high-performance, large-scale distributed systems and multi-accelerator deployments
  • implementing and deploying machine learning systems at scale
  • load balancing, request routing, and traffic management systems
  • Kubernetes and cloud infrastructure (AWS, GCP)
  • significant software engineering experience, particularly with distributed systems
  • technical excellence directly drives both business results and research breakthroughs
  • designing intelligent routing algorithms that optimize request distribution across thousands of accelerators
  • autoscaling the compute fleet to dynamically match supply with demand across production, research, and experimental workloads
  • building production-grade deployment pipelines for releasing new models to millions of users
  • integrating new AI accelerator platforms to maintain a hardware-agnostic competitive advantage
  • contributing to new inference features such as structured sampling and prompt caching (a caching sketch follows this list)
  • supporting inference for new model architectures
  • analyzing observability data to tune performance against real-world production workloads
  • managing multi-region deployments and geographic routing for global customers
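
Because the JD calls out prompt caching as a concrete inference feature, here is a minimal sketch of the idea: reuse computation for the longest previously seen token prefix and pay recompute only for the unmatched suffix. The PromptCache class is a hypothetical stand-in, and kv_state is a placeholder for a real attention KV cache; nothing here reflects Anthropic's implementation.

```python
# Sketch of prompt caching keyed on token prefixes. PromptCache and kv_state
# are illustrative assumptions, not a real serving API.
class PromptCache:
    def __init__(self):
        # Maps an exact token-prefix tuple to its cached (placeholder) state.
        self._store: dict[tuple[int, ...], object] = {}

    def lookup(self, tokens: list[int]):
        """Return (matched_len, state) for the longest cached prefix."""
        for end in range(len(tokens), 0, -1):
            state = self._store.get(tuple(tokens[:end]))
            if state is not None:
                return end, state
        return 0, None

    def insert(self, tokens: list[int], kv_state: object) -> None:
        self._store[tuple(tokens)] = kv_state


if __name__ == "__main__":
    cache = PromptCache()
    system_prompt = [101, 7, 42, 9]           # pretend these are token ids
    cache.insert(system_prompt, kv_state="warm-kv")
    matched, state = cache.lookup(system_prompt + [55, 3])
    print(f"reused {matched} tokens from cache: {state}")
```

Real servers typically key cached KV blocks by hashed token chunks or a radix tree rather than scanning every prefix length, but the contract is the same: only the unmatched suffix is recomputed.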
