Performance Engineer

Anthropic · AI Frontier · New York, NY +2 · Remote · AI Research & Engineering

This role focuses on optimizing the performance, throughput, and robustness of large-scale distributed machine learning systems. The engineer will identify and solve novel systems problems, implement low-latency sampling, adapt models for low-precision inference, optimize serving efficiency, and design fault-tolerant distributed systems. While not directly building ML models, the role is critical for enabling ML algorithms to run efficiently at scale.

What you'd actually do

  1. Identify novel systems problems, then develop systems that optimize the throughput and robustness of our largest distributed systems
  2. Implement low-latency, high-throughput sampling for large language models
  3. Implement GPU kernels to adapt our models to low-precision inference
  4. Write a custom load-balancing algorithm to optimize serving efficiency
  5. Design and implement a fault-tolerant distributed system running with a complex network topology

Skills

Required

  • Significant software engineering or machine learning experience
  • Large-scale systems problems
  • Low-latency, high-throughput sampling
  • Low-precision inference
  • Load-balancing algorithms
  • Fault-tolerant distributed systems

Nice to have

  • High performance, large-scale ML systems
  • GPU/Accelerator programming
  • ML framework internals
  • OS internals
  • Language modeling with transformers

What the JD emphasized

  • solving large-scale systems problems
  • supercomputing scale
  • High performance, large-scale ML systems
  • GPU/Accelerator programming
  • ML framework internals
  • OS internals
  • Language modeling with transformers

Other signals

  • optimize throughput and robustness of distributed systems
  • ML algorithms at scale
  • low-latency, high-throughput sampling
  • low-precision inference
  • optimize serving efficiency
  • fault-tolerant distributed system