Staff + Sr. Software Engineer, Cloud Inference Launch Engineering

Anthropic · AI Frontier · New York, NY +2 · Software Engineering - Infrastructure

This Staff + Sr. Software Engineer role focuses on scaling and optimizing Claude's inference on cloud platforms (AWS, GCP, Azure). It involves owning the end-to-end product of Claude on each cloud, including API integration, request routing, inference execution, capacity management, and day-to-day operations. Key responsibilities include validating inference server and load balancer changes, ensuring correctness, performance, and reliability across platforms, and driving down cycle times for model launches and feature integrations. The role requires strong software engineering experience with distributed systems and cloud platforms, and centers on building automation and test infrastructure for inference services.

What you'd actually do

  1. Be on the critical path for frontier model launches, bringing up inference for new model architectures and shipping them to cloud platforms in lockstep with our first-party platform
  2. Work with the core inference team to bring new inference features (e.g. structured sampling, prompt caching, and more) to cloud platforms, owning the platform-specific integration that gets them to production
  3. Identify and dive deep on the gaps that make inference behave differently between our first-party platform and cloud service providers (CSPs), such as config drift, observability, deployment patterns, and hard cross-platform bugs, and fix them at the source rather than building platform-specific workarounds
  4. Design, build, and own the CI/CD infrastructure for the inference server and load balancer across cloud platforms, with shadow traffic, performance baselines (throughput and latency), and correctness checks that catch regressions before production
  5. Drive down merge-to-production cycle time by making validation faster, more parallel, and cost-effective enough to run on the same constrained accelerator pool that serves customers, without trading away reliability
  6. Analyze observability data across providers to identify performance bottlenecks, cost anomalies, and regressions, and drive remediation based on real-world production workloads
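
To make item 4 concrete, here is a minimal Python sketch of the kind of performance-baseline gate a CI pipeline might run before a change reaches production. It is an illustration only, not a description of Anthropic's actual tooling: the `Baseline` type, the `check_regression` function, the 5% tolerance, and the metric names are all hypothetical.

```python
from dataclasses import dataclass
from statistics import quantiles


@dataclass(frozen=True)
class Baseline:
    """Hypothetical stored baseline for one model on one cloud platform."""
    p99_latency_ms: float
    throughput_tokens_per_s: float


def p99(samples: list[float]) -> float:
    """99th percentile of the samples (requires at least two data points)."""
    return quantiles(samples, n=100, method="inclusive")[98]


def check_regression(
    baseline: Baseline,
    latencies_ms: list[float],
    throughput_tokens_per_s: float,
    tolerance: float = 0.05,  # assumed threshold, chosen for illustration
) -> list[str]:
    """Return human-readable failures; an empty list means the gate passes."""
    failures = []
    candidate_p99 = p99(latencies_ms)
    # Latency regresses when the candidate is slower than baseline + tolerance.
    if candidate_p99 > baseline.p99_latency_ms * (1 + tolerance):
        failures.append(
            f"p99 latency {candidate_p99:.1f} ms exceeds baseline "
            f"{baseline.p99_latency_ms:.1f} ms by more than {tolerance:.0%}"
        )
    # Throughput regresses when the candidate falls below baseline - tolerance.
    if throughput_tokens_per_s < baseline.throughput_tokens_per_s * (1 - tolerance):
        failures.append(
            f"throughput {throughput_tokens_per_s:.1f} tok/s is below baseline "
            f"{baseline.throughput_tokens_per_s:.1f} tok/s by more than {tolerance:.0%}"
        )
    return failures
```

In a setup like this, the latency samples would come from replayed or shadow traffic against the candidate build, and a non-empty result would block the rollout.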

Skills

Required

  • Significant software engineering experience
  • Strong background in high-performance, large-scale distributed systems serving millions of users
  • Track record of building automation or test infrastructure that measurably improved release velocity or reliability
  • Experience building or operating services on at least one major cloud platform (AWS, GCP, or Azure)
  • Exposure to Kubernetes, Infrastructure as Code, or container orchestration

Nice to have

  • LLM inference optimization, batching, and caching strategies
  • Capacity-constrained scheduling or shared-resource test infrastructure
  • Solid understanding of multi-region deployments, request routing, load balancing, and global traffic management
  • Working with CSP partner teams to scale infrastructure across multiple platforms, navigating differences in networking, security, privacy, and managed services
  • Proficiency in Python or Rust

What the JD emphasized

  • critical path
  • correctness, performance, and reliability
  • fast and cheap enough
  • consistent enough
  • scarcest resource
  • correctness checks
  • catch regressions before production
  • merge-to-production cycle time
  • cost-effective
  • constrained accelerator pool
  • performance bottlenecks
  • cost anomalies
  • regressions

Other signals

  • inference
  • serving
  • cloud platforms
  • CI/CD
  • performance
  • reliability