Senior Software Engineer, ML Networking

Google · Big Tech · Bengaluru, Karnataka, India

This role focuses on building and implementing ML networking infrastructure for Google Cloud, specifically enhancing GPU-to-GPU communication and enabling ML workloads. While it supports ML, the core craft is in networking infrastructure and engineering, not direct AI/ML model development.

What you'd actually do

  1. Understand the capabilities provided by the ConnectX (CX) series of network interface cards (NICs).
  2. Design features that integrate graphics processing unit (GPU)-to-GPU communication capabilities into the Google Cloud infrastructure.
  3. Code and implement the features that enable GPU-to-GPU communication on virtual machine (VM) families.
  4. Deliver virtualized machine learning (ML) networking infrastructure that enables ML workloads to run on Google Cloud Platform (GCP).
  5. Enable GPU remote direct memory access (RDMA) networking for VMs and bare metal by exposing the NICs.

Skills

Required

  • 5 years of experience with one or more general-purpose programming languages, including but not limited to Java, C/C++, Python, or Go.

Nice to have

  • software architecture
  • software engineering
  • networking protocols
  • network virtualization
  • networking