Staff Software Engineer, Fault Management

Google · Big Tech · Sunnyvale, CA +1

Staff Software Engineer focused on improving the reliability of servers and their components within Google's data centers. The role involves developing software/firmware solutions, partnering with cross-functional teams, and driving development from requirements to testing and integration. Experience with large-scale infrastructure, distributed systems, and hardware architecture is required.

What you'd actually do

  1. Focus on crafting software/firmware solutions that fortify the reliability of servers and their components, spanning x86 CPUs, memory subsystems, Peripheral Component Interconnect Express (PCIe)/Compute Express Link (CXL) input/output links covering host and endpoints, and software components.
  2. Partner across multiple teams and job ladders to influence the design and implementation of most compute and storage systems powering Google's data centers.
  3. Drive every facet of development, from requirements definition to design, implementation, unit testing, and integration. Oversee meticulous reviews to guarantee the delivery of high-quality solutions.
  4. Plan and manage resources and tools to execute against a comprehensive roadmap that advances our reliability goals.
  5. Promote collaborations with vendors and represent the fault management software team in project planning discussions with executive management.

Skills

Required

  • C++
  • SQL
  • SQL Pipelines
  • software design
  • software architecture
  • large-scale infrastructure
  • distributed systems
  • networks
  • compute technologies
  • storage
  • hardware architecture
  • testing
  • launching software products

Nice to have

  • Master’s degree or PhD in Engineering, Computer Science, or a related technical field
  • data structures
  • algorithms
  • technical leadership
  • project teams
  • technical direction
  • matrixed organization
  • cross-functional projects
  • cross-business projects
  • Reliability, Availability, and Serviceability (RAS) related data pipelines
  • dashboards

What the JD emphasized

  • 8 years of experience programming in C++
  • 5 years of experience testing and launching software products
  • 5 years of experience building and developing large-scale infrastructure, distributed systems or networks, or experience with compute technologies, storage, or hardware architecture
  • 3 years of experience with software design and architecture