Software Engineer, TPU Host Networking

Google · Big Tech · Sunnyvale, CA +1

Software Engineer role focused on the TPU Host Networking stack, enabling large-scale training and low-latency inference for ML workloads. Responsibilities include designing, developing, testing, and debugging the networking stack from hardware to ML frameworks, and performing full-stack cross-layer optimization.

What you'd actually do

  1. Write product or system development code.
  2. Design, develop, test, and deploy the TPU networking stack.
  3. Perform full-stack cross-layer optimization of TPU networking performance for a variety of ML workloads.
  4. Analyze and debug TPU networking performance issues in production.
  5. Develop and enhance telemetry to provide deep visibility into network behavior and accelerate troubleshooting.

Skills

Required

  • software development
  • developing large-scale infrastructure
  • distributed systems
  • networks
  • compute technologies
  • storage
  • hardware architecture
  • networking protocols
  • troubleshooting
  • C++

Nice to have

  • data structures
  • algorithms
  • performance optimization
  • networking protocols
  • network infrastructure
  • machine learning infrastructure