Software Engineer, Inference Scalability and Capability

Anthropic · AI Frontier · AI Research & Engineering

Software Engineer focused on building and scaling inference systems for LLMs, optimizing compute efficiency, and developing new inference capabilities. This role involves complex distributed systems challenges across the inference stack, including request routing and prompt caching.

What you'd actually do

  1. Optimizing inference request routing to maximize compute efficiency
  2. Autoscaling our compute fleet to match capacity with inference demand
  3. Contributing to new inference features (e.g., structured sampling, fine-tuning)
  4. Supporting inference for new model architectures
  5. Ensuring smooth and regular deployment of inference services
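
The first responsibility, cache-aware request routing, can be illustrated with a minimal sketch: send each prompt to the replica whose cached prefixes overlap it most, falling back to the least-loaded replica. All names and the scoring policy here are illustrative assumptions, not Anthropic's actual routing implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """One inference server holding a set of cached prompt prefixes."""
    name: str
    cached_prefixes: set[str] = field(default_factory=set)
    in_flight: int = 0  # current number of requests on this replica


def shared_prefix_len(a: str, b: str) -> int:
    """Length of the common character prefix of two prompts."""
    n = min(len(a), len(b))
    i = 0
    while i < n and a[i] == b[i]:
        i += 1
    return i


def route(prompt: str, replicas: list[Replica]) -> Replica:
    """Pick the replica with the longest cache overlap; break ties by load."""
    def score(r: Replica) -> tuple[int, int]:
        best = max(
            (shared_prefix_len(prompt, p) for p in r.cached_prefixes),
            default=0,
        )
        return (best, -r.in_flight)  # prefer long overlap, then low load

    chosen = max(replicas, key=score)
    chosen.in_flight += 1
    chosen.cached_prefixes.add(prompt)  # this prompt's prefix is now warm
    return chosen
```

A real router would score on token-level KV-cache blocks rather than raw characters, but the trade-off is the same: cache reuse versus load balance.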

Skills

Required

  • significant software engineering experience
  • high-performance, large-scale distributed systems
  • implementing and deploying machine learning systems at scale
  • Python

Nice to have

  • LLM optimization: batching and caching strategies
  • Kubernetes
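
The "batching strategies" bullet can be sketched as a simple dynamic-batching policy: drain the request queue into batches bounded by a maximum batch size and a token budget. The function name, request shape, and limits are illustrative assumptions, not a specific serving system's API.

```python
from collections import deque


def make_batches(queue: deque, max_batch: int, max_tokens: int) -> list[list[dict]]:
    """Greedily group queued requests into batches.

    Each batch holds at most `max_batch` requests and roughly
    `max_tokens` total tokens; an oversized single request is
    still admitted alone so the queue always drains.
    """
    batches = []
    while queue:
        batch, tokens = [], 0
        while queue and len(batch) < max_batch:
            req = queue[0]
            # Close the batch if adding this request would bust the budget
            # (unless the batch is empty, to avoid an infinite loop).
            if batch and tokens + req["tokens"] > max_tokens:
                break
            batch.append(queue.popleft())
            tokens += req["tokens"]
        batches.append(batch)
    return batches
```

Production systems typically use continuous (per-step) batching rather than this static grouping, but the capacity constraints being balanced are the same.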

What the JD emphasized

  • high-performance, large-scale distributed systems
  • implementing and deploying machine learning systems at scale
  • LLM optimization: batching and caching strategies

Other signals

  • building and maintaining critical systems that serve LLMs
  • scaling inference systems
  • optimizing compute resource efficiency
  • developing new inference capabilities
  • distributed systems challenges across the inference stack
  • optimal request routing
  • efficient prompt caching