Distributed Systems Engineer (L5 + L6), Compute Runtime

Netflix · Big Tech · United States · Remote · Engineering

Netflix is seeking a Distributed Systems Engineer to develop and maintain the software that runs on its compute fleet, focusing on the Kubernetes data plane, the container runtime, and the underlying OS. The role involves architecting solutions, contributing to open-source projects, and debugging performance issues in large-scale cloud infrastructure that supports a range of workloads, including AI/ML.

What you'd actually do

  1. Build and maintain the software that runs our Kubernetes container orchestration platform
  2. Architect and design innovative solutions to support new workloads and features, and improve the reliability and performance of existing workloads
  3. Develop and maintain Kubernetes and containerd customizations and plugins
  4. Contribute to the upstream containerd and Kubernetes projects
  5. Debug performance and operational problems observed with container workloads

Skills

Required

  • Kubernetes
  • containerd
  • runc
  • NRI plugins
  • Linux debugging
  • distributed systems design
  • Go
  • Java
  • C/C++
  • networking concepts
  • TCP
  • IPv4
  • sockets
  • host and service networking in a containerized environment

Nice to have

  • Open source contributions
  • Linux kernel development
  • AI/ML workload compute infrastructure management

What the JD emphasized

  • Minimum of 5 years of experience evolving compute infrastructure for a large organization
  • Experience supporting containers and related runtimes as a service
  • Experience debugging system performance issues in a Linux environment
  • Experience designing large-scale distributed systems, preferably a compute orchestration system like Kubernetes
  • Proficiency in Go, Java, or C/C++
  • Understanding of networking concepts