Software Engineer II, Google Cloud Platform, Infrastructure

Google · Big Tech · Kraków, Poland +1

Software Engineer II for Google Cloud Platform, focusing on the Kueue project, a CNCF open-source scheduler for AI workloads on Kubernetes. The role involves designing and building sophisticated scheduling solutions for accelerators (TPUs, GPUs) and large-scale CPU workloads, with a focus on algorithm design, system performance, and Kubernetes internals. The team also pioneers the use of Agentic AI for code development and issue diagnosis.

What you'd actually do

  1. Develop and enhance the Kueue open-source project, focusing on core scheduling algorithms, queueing mechanisms, and overall performance.
  2. Pioneer the use of Agentic AI to assist in Kueue code development and to diagnose complex scheduling issues.
  3. Implement support for accelerators, ensuring efficient and high-performance scheduling for hardware like TPU7X, TPU8, and NVIDIA GB200/GB300, as well as large-scale CPU workloads.
  4. Innovate on advanced scheduling concepts such as topology-aware scheduling to optimize network locality and elastic workload support for dynamic scaling.
  5. Integrate Kueue with popular AI/ML frameworks like Pathways and Ray.
  6. Engage actively with the Kubernetes and CNCF open-source communities to drive the direction of AI workload scheduling.

Skills

Required

  • software development
  • Python
  • C
  • C++
  • Java
  • JavaScript
  • large-scale infrastructure
  • distributed systems
  • data structures
  • algorithms

Nice to have

  • Go
  • concurrent programming

What the JD emphasized

  • Kueue
  • Kubernetes
  • AI/ML
  • scheduling
  • accelerators
  • TPUs
  • GPUs
  • open-source

Other signals

  • Kubernetes
  • AI/ML scheduling
  • open-source project