Engineering Manager, Cloud Safety

Anthropic · AI Frontier · San Francisco, CA · Software Engineering - Infrastructure

Engineering Manager to lead the Cloud Safety team, responsible for scaling and optimizing Claude's serving infrastructure across Cloud Service Providers (CSPs). The role involves owning end-to-end safety, including API, inference, classifiers, fraud detection, data management, and operations, to ensure safe usage and enable the launch of new models and features at scale.

What you'd actually do

  1. Set technical strategy and oversee development of safety features across all Claude CSP surfaces
  2. Collaborate across teams and companies to deeply understand product, infrastructure, operations and capacity needs, identifying potential solutions to support frontier LLM serving
  3. Work closely with cross-functional stakeholders across companies to align on goals and drive outcomes
  4. Create clarity for the team and stakeholders in an ambiguous and evolving environment
  5. Take an inclusive approach to hiring and coaching top technical talent, and support a high-performing team

Skills

Required

  • 8+ years of experience in high-scale, high-reliability software development, particularly infrastructure or capacity management
  • 3+ years of engineering management experience
  • Experience recruiting, scaling, and retaining engineering talent in a high-growth environment
  • Experience scaling products, resources and operations to accommodate rapid growth
  • Deep interest in advanced AI systems and commitment to their safe development
  • Skill at building strong relationships and strategy with stakeholders across engineering, product, finance, and policy
  • Experience working with external partners to align goals and deliver impact
  • Comfort working in a fast-paced, early environment and adapting priorities
  • Excellent written and verbal communication skills
  • Demonstrated success building a culture of belonging and engineering excellence
  • Motivation to develop AI responsibly and safely

Nice to have

  • Experience with identity and access controls — KYC/KYB, verification flows, tiered access
  • Experience with detecting and mitigating platform abuse at scale — account fraud, bot traffic, model extraction
  • Experience with trust & safety, security, and/or legal/compliance
  • Experience or interest in machine learning infrastructure like GPUs, TPUs, or Trainium, as well as supporting networking infrastructure like NCCL
  • Experience with deployment and capacity management automation

What the JD emphasized

  • scale and optimize Claude to serve the massive audiences of developers and enterprise companies using Claude via Cloud Service Providers (CSPs)
  • own safety end-to-end for these services, including API, inference, classifiers, fraud detection, data management and operations
  • ensure Claude is used safely on all CSPs
  • increase the scale at which Anthropic operates and accelerate our ability to reliably launch new frontier models and innovative features to customers across all platforms
  • Have 8+ years of experience in high-scale, high-reliability software development, particularly infrastructure or capacity management
  • Have 3+ years of engineering management experience
  • Experience recruiting, scaling, and retaining engineering talent in a high-growth environment
  • Have experience scaling products, resources and operations to accommodate rapid growth
  • Are deeply interested in the potential transformative effects of advanced AI systems and are committed to ensuring their safe development
  • Excel at building strong relationships and strategy with stakeholders across engineering, product, finance, and policy
  • Have experience working with external partners to align goals and deliver impact
  • Enjoy working in a fast-paced, early environment; comfortable with adapting priorities as driven by the rapidly evolving AI space
  • Demonstrated success building a culture of belonging and engineering excellence
  • Are motivated by developing AI responsibly and safely
  • Experience with detecting and mitigating platform abuse at scale — account fraud, bot traffic, model extraction
  • Experience with trust & safety, security, and/or legal/compliance
  • Experience or interest in machine learning infrastructure like GPUs, TPUs, or Trainium, as well as supporting networking infrastructure like NCCL
  • Experience with deployment and capacity management automation

Other signals

  • Scale Claude to serve massive audiences of developers and enterprise companies
  • Own safety end-to-end for these services, including API, inference, classifiers, fraud detection, data management and operations
  • Ensure Claude is used safely on all CSPs
  • Increase the scale at which Anthropic operates and accelerate our ability to reliably launch new frontier models and innovative features