Senior Software Engineer - Accelerated Kubernetes Runtime Team

NVIDIA · Semiconductors · WA +1 · Remote

NVIDIA is seeking a Senior Software Engineer to join its Accelerated Kubernetes Runtime team. The role involves designing and building automation systems for managing GPU-accelerated Kubernetes runtime distributions, with a focus on seamless installation, upgrade, and management of cluster runtime packages for AI accelerators. The engineer will develop controller systems that optimize runtime components for the latest GPU architectures, ensuring reliable and performant infrastructure for AI researchers and developers.

What you'd actually do

  1. Design and implement runtime features that orchestrate the lifecycle of runtime components across thousands of Kubernetes clusters without manual intervention
  2. Build and maintain the systems that configure, package, validate, and distribute accelerated compute components
  3. Develop Kubernetes controllers, CRDs, and operators that automate runtime installation, upgrade, and rollback operations with API-driven workflows

Skills

Required

  • Bachelor's degree in Computer Science, or equivalent experience
  • 8+ years of professional experience
  • 3+ years of experience with Kubernetes development
  • Kubernetes controllers
  • Kubernetes operators
  • CustomResourceDefinitions
  • Go
  • scalable Go services
  • complex distributed systems
  • Helm
  • Kustomize
  • Kubernetes manifest packaging and templating
  • automation systems

Nice to have

  • Experience working with NVIDIA Kubernetes components such as the GPU Operator, device plugins, or other HPC components in large-scale production environments
  • Deep familiarity with OCI registries, artifact signing, SBOM generation, and supply chain security practices
  • Experience building multi-tenant platform services with a focus on API design, versioning, and backward compatibility
  • Track record of migrating legacy systems to modern, automated platforms while maintaining zero-downtime operations
  • Contributions to upstream Kubernetes/CNCF projects
  • Experience extending Kubernetes API machinery
  • Deep understanding of Kubernetes architecture including API machinery, admission controllers, and resource lifecycle management

What the JD emphasized

  • 8+ years of professional experience, with at least 3 years of experience with Kubernetes development
  • Experience building production Kubernetes systems with significant expertise in controllers, operators, and CustomResourceDefinitions
  • Strong proficiency in Go and experience building scalable Go services that manage complex distributed systems
  • Demonstrated ability to design and implement automation systems that replace manual processes with reliable, self-service tooling