Hardware Systems Engineer, NPI AI

Meta · Big Tech · Menlo Park, CA

This Hardware Systems Engineer role supports the new product introduction (NPI) of next-generation AI and high-performance computing (HPC) infrastructure for large-scale data center deployments. The work involves validating and scaling AI hardware systems from early bring-up through production readiness, at the intersection of AI silicon, server systems, and data center operations.

What you'd actually do

  1. Lead end-to-end system validation strategies for AI and HPC hardware platforms, including AI accelerators, GPU clusters, and high-bandwidth memory subsystems in data center environments
  2. Drive hands-on bring-up, characterization, and validation of AI server systems and associated components such as PCIe, NVLink, DRAM, and high-speed networking fabrics
  3. Develop and maintain test specifications, validation procedures, and debug guides tailored to AI infrastructure NPI programs
  4. Investigate and root-cause complex system failures spanning silicon, firmware, software, and hardware layers in collaboration with cross-functional engineering teams
  5. Triage and track hardware and firmware defects through resolution while maintaining forward progress on NPI program milestones

Skills

Required

  • hardware systems engineering
  • silicon validation
  • firmware validation
  • system-level bring-up
  • AI servers
  • GPUs
  • TPUs
  • AI accelerator platforms
  • ASIC bring-up and characterization
  • board-level debug
  • large-scale system validation
  • data center environments
  • test specifications
  • validation procedures
  • debug methodologies
  • root-cause analysis
  • troubleshooting
  • high-speed interconnects
  • memory subsystems
  • PCIe
  • NVLink
  • DDR5
  • HBM
  • debugging tools for SoCs
  • JTAG
  • GDB
  • Trace32
  • bus protocols
  • I2C
  • SPI
  • USB
  • hardware-software interface requirements
  • telemetry
  • diagnostics
  • out-of-band management
  • AI infrastructure deployments
  • lab instrumentation
  • automation frameworks
  • Linux environments
  • server system management tools

What the JD emphasized

  • AI hardware systems
  • NPI
  • system validation
  • silicon validation
  • firmware validation
  • large-scale system validation

Other signals

  • AI hardware systems
  • data center deployments
  • silicon validation
  • system bring-up
  • NPI