Staff Software Engineer, Fault Management

Google · Big Tech · Sunnyvale, CA +1

This role focuses on developing software and firmware solutions to enhance the reliability of servers and their components, including CPUs, memory, and I/O links. The engineer will collaborate with multiple teams to influence the design of Google's data center systems, drive development from requirements to testing, and manage resources to achieve reliability goals. Experience with large-scale infrastructure, distributed systems, and hardware architecture is required.

What you'd actually do

  1. Build software and firmware solutions that strengthen the reliability of servers and their components, spanning x86 CPUs, memory subsystems, Peripheral Component Interconnect Express (PCIe) and Compute Express Link (CXL) I/O links (covering both host and endpoints), and software components.
  2. Partner across multiple teams and job ladders to influence the design and implementation of most compute and storage systems powering Google's data centers.
  3. Drive every facet of development, from requirements definition to design, implementation, unit testing, and integration. Oversee meticulous reviews to guarantee the delivery of high-quality solutions.
  4. Plan and manage resources and tools to execute against a comprehensive roadmap that advances our reliability goals.
  5. Promote collaborations with vendors and represent the fault management software team in project planning discussions with executive management.

Skills

Required

  • C++
  • software products
  • large-scale infrastructure
  • distributed systems
  • networks
  • compute technologies
  • storage
  • hardware architecture
  • software design
  • architecture
  • SQL
  • SQL Pipelines

Nice to have

  • Reliability, Availability, Serviceability (RAS) related data pipelines
  • dashboards
  • data structures
  • algorithms
  • technical leadership
  • project teams
  • technical direction
  • matrixed organization
  • cross-functional projects
  • cross-business projects