Software Engineer, TPU Compiler Development Infrastructure

Google · Big Tech · Sunnyvale, CA +1

Software Engineer focused on improving the infrastructure and developer productivity for the XLA TPU compiler team. The role involves reducing build/test times, modernizing build systems, and designing scalable architectures to support new hardware and compiler features. While not requiring deep ML expertise, the role supports ML hardware development.

What you'd actually do

  1. Reduce time-to-submit for changelists (CLs) and minimize CL rollbacks across the whole XLA TPU team. Drive infrastructure improvements that remove friction from the daily development work of the XLA TPU compiler team.
  2. Develop tools that support compiler engineers as they work through the stages of a new TPU introduction (e.g., testing when hardware is not yet available or is very limited).
  3. Modernize and simplify build/test fixtures (e.g., xla_test) to make them more reliable and easier for the team to use.
  4. Design and implement system architectures that cleanly handle an ever-increasing number of TPU generations and compiler features, ensuring the codebase doesn't become a "spaghetti" of special cases.
  5. Identify and resolve accelerator utilization bottlenecks, and improve accelerator test coverage without slowing down CL submission.

Skills

Required

  • coding in C++
  • coding in Python
  • Google infrastructure such as Blaze, TAP, or Guitar

Nice to have

  • compilers
  • hardware
  • deep ML expertise
  • infrastructure surrounding low-level ML hardware programming

What the JD emphasized

  • minimize Changelist rollback
  • reduce CL time to submit