OS / K8s Systems Engineer

Baseten · Data AI · San Francisco, CA · EPD

Baseten is seeking an OS / K8s Systems Engineer to build and automate the infrastructure that turns raw GPU hardware into production-ready compute for AI companies. The role focuses on the software layer that makes infrastructure reproducible, scalable, and reliable across data centers: OS images, provisioning pipelines, and cluster orchestration.

What you'd actually do

  1. Own the end-to-end automation of cluster bring-up and lifecycle management.
  2. Build and maintain OS images, provisioning systems, and configuration pipelines.
  3. Deploy and operate cluster orchestration platforms (Kubernetes, Slurm, or similar).
  4. Design systems for reproducibility across sites and hardware generations.
  5. Automate upgrades, rollouts, and failure recovery.

Skills

Required

  • Experience building and operating automated infrastructure systems.
  • Strong programming skills (e.g., Python, Go, or similar).
  • Deep familiarity with Linux systems, including boot processes, drivers, and performance.
  • Experience with provisioning systems (PXE, imaging, configuration management).
  • Experience with Kubernetes.
  • Strong debugging skills across system layers (hardware → OS → network).

Nice to have

  • Experience working with GPU or high-performance workloads.

What the JD emphasized

  • Building systems, not operating them.