SoC Architect, XProf

Google · Big Tech · Sunnyvale, CA +1

This role focuses on optimizing the performance of machine learning software and hardware stacks, particularly for TPUs, by providing performance debugging and analysis for ML workloads and custom kernels. The engineer will contribute to the end-to-end stack and analysis tools to support new ML paradigms and partner with teams to deliver chip profiling requirements.

What you'd actually do

  1. Learn and build an intuitive understanding of existing data collection, analysis, and visualization workflows.
  2. Support new and exciting ML paradigms (such as horizontal scaling for upcoming TPU chips) by making contributions across the end-to-end stack and analysis tools.
  3. Partner with Product Area leads to understand model optimization use cases, drive cross-functional efforts to deliver on chip profiling requirements, and propose new hardware features.
  4. Collaborate across Hardware, Driver, Runtime, and Performance Analysis teams and many other stakeholders.

Skills

Required

  • Bachelor's degree or equivalent practical experience
  • 2 years of experience with Single-Level Cell (SLC), SOC performance, and SOC architecture
  • 2 years of coding experience in one or more of the following languages: C, C++, Java, or Python
  • 2 years of experience in the machine learning field
  • Experience in experimental design, analysis, and performance tools
  • Experience in performance debugging of single-node systems

Nice to have

  • Experience with ML frameworks such as TensorFlow, JAX, and PyTorch, or ML compilers such as Accelerated Linear Algebra (XLA)
  • Experience in releasing and supporting open-source projects
  • Proven track record in open-source software development

What the JD emphasized

  • SOC performance
  • SOC architecture
  • machine learning field
  • performance debugging of single-node systems

Other signals

  • ML software/hardware stacks
  • performance debugging
  • ML graph summaries
  • TPU chips
  • model optimization