Software Engineer III, AI/ML Infrastructure, Tiles

Google · Big Tech · Sunnyvale, CA +1

This Software Engineer III role focuses on AI/ML infrastructure within Google's Top-of-Rack (ToR) Infrastructure team. The team connects servers for Google's global services, Cloud, and AI/ML platforms to the network, and manages ToR switch infrastructure and life-cycle engineering. Responsibilities include developing software services that orchestrate the network for next-generation hardware and AI/ML platforms, contributing to network models, building automation for capacity provisioning, improving ToR infrastructure reliability, and partnering with Cloud teams on AI/ML infrastructure solutions.

What you'd actually do

  1. Develop and maintain software services that orchestrate the network for Google's next-generation hardware and AI/ML platforms.
  2. Contribute to the design and implementation of scalable network models for machines, racks, and ToR-to-fabric topologies using technologies like UNM and Model X.
  3. Build automation to streamline network capacity provisioning and enhance resource efficiency across Google's datacenters.
  4. Enhance the reliability and operational excellence of the ToR infrastructure, ensuring seamless network performance for all Google services.
  5. Partner with teams across Cloud to deliver end-to-end networking solutions for NPIs that enable AI/ML infrastructure. Drive or contribute significantly to projects focused on introducing new rack switch hardware, improving the flexibility, efficiency, speed, and reliability of deployments, and boosting network availability.

Skills

Required

  • software development
  • C++

Nice to have

  • C
  • Go
  • Java
  • networking
  • distributed systems