Hardware Reliability Engineer, Global Hardware Reliability Engineering

Google · Big Tech · Austin, TX, plus one other location

This role focuses on the reliability of hardware infrastructure that supports AI and other Google services. The engineer will analyze system hardware designs, develop reliability plans, conduct tests, and troubleshoot issues to ensure the stability and performance of servers, networking, and storage products, particularly those used for machine learning.

What you'd actually do

  1. Lead analysis of system hardware designs to enable proactive design evaluations and de-risk products early in development.
  2. Lead system reliability efforts by working with other organizations to define reliability goals and plans, and secure the resources needed to execute those plans.
  3. Develop mission profiles for chassis and racks, covering their path from integration sites to the field (data centers), to help predict field reliability.
  4. Implement the reliability plan and lead all efforts to assess and mitigate failure risk early in New Product Introduction (NPI).
  5. Drive reliability test plans, and collect, analyze, and synthesize test data to verify that designs meet their reliability goals.

Skills

Required

  • reliability engineering
  • hardware design analysis
  • system reliability
  • reliability testing
  • failure analysis
  • fault isolation

Nice to have

  • system level reliability tools
  • accelerated life testing
  • reliability modeling
  • reliability statistics
  • mission profile development
  • physics of failure
  • reliability physics
  • statistics
  • JMP