Staff Software Engineer, Accelerator Platform Software

Google · Big Tech · Sunnyvale, CA +1

Staff Software Engineer on the Accelerator Platforms and Laboratory Team at Google Cloud, focusing on foundational hardware and low-level software for accelerator-to-host connectivity. The role involves onboarding co-accelerators, designing and implementing system software (firmware, kernel drivers), developing tests and telemetry, debugging complex system-level challenges, and providing technical leadership. The team is part of the AI and Infrastructure group, supporting AI model development and computing power for Google's services.

What you'd actually do

  1. Onboard emerging co-accelerators into Google's ML accelerator families to enable new use cases with improved performance and efficiency.
  2. Collaborate with internal teams to design and implement new features in system software, including kernel drivers and firmware or daemons running on baseboard management controllers (BMCs) or hosts.
  3. Design and develop tests, tools, telemetry, and dashboards to generate insights to monitor and debug potential issues.
  4. Analyze, debug, and resolve complex system-level challenges in the kernel, virtualization (input-output memory management unit, IOMMU), and input/output (PCI Express/Compute Express Link, PCIe/CXL) stacks.
  5. Provide technical leadership to help formulate and drive software development plans and identify dependencies in cross-functional teams.

Skills

Required

  • software development
  • testing
  • launching software products
  • embedded operating systems
  • software design
  • software architecture
  • C
  • C++
  • Linux kernel
  • virtualization
  • computer architecture

Nice to have

  • data structures
  • algorithms
  • technical leadership
  • project teams
  • technical direction
  • complex, matrixed organization
  • cross-functional projects
  • PCIe
  • IOMMU
  • VFIO/iommufd
  • dma-buf
  • hardware-accelerated compute pipelines
  • GPUs
  • ML accelerators
  • DSPs
  • system integration
  • boot flow
  • firmware
  • telemetry

What the JD emphasized

  • hardware
  • low-level software
  • kernel
  • firmware
  • virtualization
  • input-output memory management unit (IOMMU)
  • PCI Express/Compute Express Link (PCIe/CXL)