Software Engineer, Sandboxing (systems)

Anthropic · AI Frontier · San Francisco, CA · Software Engineering - Infrastructure

This Software Engineer role focuses on optimizing virtualization and VM workloads for the infrastructure that trains and serves AI models. It involves Linux kernel development, system programming, and performance optimization of virtualized environments to improve compute efficiency and reliability.

What you'd actually do

  1. Optimize our virtualization stack, improving performance, reliability, and efficiency of our VM environments
  2. Design and implement kernel modules, drivers, and system-level components to enhance our compute infrastructure
  3. Investigate and resolve performance bottlenecks in virtualized environments
  4. Collaborate with cloud engineering teams to optimize interactions between our workloads and underlying hardware
  5. Develop tooling for monitoring and improving virtualization performance

Skills

Required

  • Linux kernel development
  • system programming
  • virtualization technologies (KVM, Xen, QEMU, etc.)
  • system performance optimization
  • C/C++ programming
  • Linux resource management
  • Linux scheduling
  • Linux memory management
  • profiling
  • debugging system-level performance issues

Nice to have

  • Rust programming
  • modern CPU architectures
  • memory systems
  • GPU virtualization
  • cloud infrastructure at scale (AWS, GCP)
  • container technologies
  • eBPF programming
  • kernel tracing tools
  • OS-level security hardening
  • isolation techniques
  • custom scheduling algorithms
  • performance optimization for ML/AI specific workloads
  • network stack optimization
  • high-performance networking
  • TPUs
  • custom ASICs
  • ML accelerators

What the JD emphasized

  • Linux OS and System Programming Subject Matter Expert
  • low-level system programming
  • kernel optimization
  • virtualization technologies
  • Linux kernel development
  • system programming
  • system performance
  • CPU architectures
  • memory systems
  • C/C++ programming skills
  • systems languages like Rust
  • Linux resource management
  • scheduling
  • memory management
  • profiling
  • debugging system-level performance issues
  • ML/AI specific workloads