Member of Technical Staff - Sandbox Service

xAI · AI Frontier · London, United Kingdom · Infrastructure

The Sandbox service team at xAI builds and maintains a secure, scalable system that gives AI models safe, controlled access to computational environments. This infrastructure powers critical workloads across training and product, enabling models to run code, build software, interact with tools, and control applications. The role involves provisioning containers and virtual machines on large-scale clusters and granting models interactive control over these remote environments. It spans the full stack, from job orchestration and resource scheduling at the cluster level to filesystem performance tuning on nodes. The service supports real-time code execution for products like Grok, as well as reinforcement learning in training, where models interactively explore tools.

What you'd actually do

  1. build and maintain a secure, scalable system that gives our models safe, controlled access to computational environments
  2. provision containers and virtual machines on large-scale clusters, granting models interactive control over these remote environments
  3. orchestrate massive jobs and resource scheduling at the cluster level
  4. fine-tune filesystem performance on nodes
  5. enable Grok to safely run and test code in real time for user queries, and support reinforcement learning in training

Skills

Required

  • Rust
  • C++
  • Go
  • Python
  • Linux systems
  • Windows systems
  • virtualisation technologies
  • containerisation technologies
  • cgroups
  • KVM
  • gVisor
  • QEMU
  • networking stack

Nice to have

  • both Linux and Windows systems

What the JD emphasized

  • secure
  • safe
  • controlled access
  • run code
  • interact with tools
  • control applications
  • interactive control
  • safely run and test code
  • interactively explore tools

Other signals

  • enabling models to run code
  • interact with tools
  • control applications
  • provision containers and virtual machines
  • orchestrating massive jobs
  • resource scheduling
  • fine-tuning filesystem performance
  • safely run and test code in real time
  • supports reinforcement learning in training
  • interactively explore tools