Senior Software Engineer, TPU, AI Infrastructure

Google · Big Tech · Taipei, Taiwan

Senior Software Engineer role focused on developing firmware and software for Google's custom AI accelerators (TPUs). The role involves designing and building low-level C++ code for embedded micro-controllers on ASICs, co-designing hardware/software interfaces, developing tools for ASIC bring-up and debugging, building simulators, and architecting telemetry systems for monitoring TPUs. This position is crucial for enabling the development and scaling of AI models and infrastructure at Google.

What you'd actually do

  1. Design and build firmware running on embedded micro-controllers with limited memory footprints on the accelerator Application-Specific Integrated Circuits (ASICs).
  2. Co-design the hardware/software interface with the hardware design and development teams.
  3. Design and develop tools to update and debug ASIC firmware, and enable chip bring-up and hardware debugging.
  4. Build functional or cycle-level simulators that model the custom accelerator ASICs bit-accurately; build tools and infrastructure that support ASIC design verification, tapeout, and bring-up; and develop embedded CPU simulators as part of the full-system simulator.
  5. Architect and design debuggability mechanisms and telemetry collection systems to monitor Tensor Processing Units (TPUs), enhancing customer satisfaction and enabling rapid response, diagnosis, and mitigation of production failures.

Skills

Required

  • software development in C++
  • embedded operating systems
  • software design and architecture
  • testing, maintaining, or launching software products

Nice to have

  • embedded software development in C/C++
  • machine learning (ML)
  • security
  • confidential computing
  • High Bandwidth Memory (HBM)
  • Peripheral Component Interconnect Express (PCIe)
  • Advanced RISC Machines (ARM)
  • hardware/software co-design at the chip-level
  • architecting scalable software
  • multi-threaded designs

What the JD emphasized

  • custom accelerators (ASICs)
  • TPUs
  • firmware
  • hardware/software interface
  • chip bring-up
  • debug
  • simulators
  • telemetry collection systems

Other signals

  • Developing C++ code for custom accelerators (ASICs) like TPUs
  • Defining APIs for the software stack that interacts with ASICs
  • Managing hardware-centric features like interrupts and DMAs
  • Debugging and bringing up new ASICs
  • Working with simulation models for pre-silicon software development
  • Collaborating with chip design, system software, ML supercomputer, compiler, and system test teams
  • Empowering AI model development and delivering computing power