Software Engineer, ML Performance and Scaling

Anthropic · AI Frontier · AI Research & Engineering

Software engineering role focused on optimizing the throughput and robustness of large-scale distributed ML systems. It calls for expertise in performance engineering and a willingness to learn ML.

What you'd actually do

  1. Identify novel systems problems
  2. Develop systems that optimize the throughput and robustness of our largest distributed systems

Skills

Required

  • software engineering
  • machine learning
  • large-scale distributed systems
  • performance optimization
  • systems programming

Nice to have

  • GPU/accelerator programming
  • ML framework internals
  • OS internals
  • language modeling with transformers

What the JD emphasized

  • significant software engineering or machine learning experience, particularly at supercomputing scale
  • high-performance, large-scale ML systems

Other signals

  • optimize throughput and robustness of largest distributed systems
  • identify novel systems problems
  • ML performance and scaling