Meta is seeking a Hardware Systems Engineer to support the new product introduction (NPI) of next-generation AI and high-performance computing infrastructure for large-scale data center deployments. In this role, you will work at the intersection of AI silicon, server systems, and data center operations, partnering with hardware design, firmware, software, networking, and capacity engineering teams to validate and scale cutting-edge AI hardware systems from early bring-up through production readiness.
Responsibilities
Lead end-to-end system validation strategies for AI and HPC hardware platforms, including AI accelerators, GPU clusters, and high-bandwidth memory subsystems in data center environments
Drive hands-on bring-up, characterization, and validation of AI server systems and associated components such as PCIe, NVLink, DRAM, and high-speed networking fabrics
Develop and maintain test specifications, validation procedures, and debug guides tailored to AI infrastructure NPI programs
Investigate and root-cause complex system failures spanning silicon, firmware, software, and hardware layers in collaboration with cross-functional engineering teams
Triage and track hardware and firmware defects through resolution while maintaining forward progress on NPI program milestones
Identify gaps in test coverage and drive improvements to test methodologies, tooling, and automation frameworks across the NPI lifecycle
Partner with AI platform and capacity engineering teams to define acceptance criteria and deployment readiness standards for new AI hardware systems
Guide data collection, analysis, and reporting efforts to surface systemic hardware quality trends and inform go/no-go decisions for production deployment
Communicate validation status, risk assessments, and technical findings to internal engineering teams and external hardware vendors
Collaborate with firmware and software teams to define hardware-software interface requirements for telemetry, diagnostics, and remote management of AI infrastructure
Qualifications
Bachelor's degree in Computer Science, Computer Engineering, a relevant technical field, or equivalent practical experience
6+ years of experience in hardware systems engineering, silicon validation, firmware validation, or system-level bring-up for AI servers, GPUs, TPUs, or AI accelerator platforms
Experience in one or more of the following domains: ASIC bring-up and characterization, board-level debug, firmware validation, or large-scale system validation in data center environments
Experience developing test specifications, validation procedures, and debug methodologies for complex hardware systems
Experience leading root-cause analysis and troubleshooting of system-level failures across hardware, firmware, and software stacks
Experience with high-speed interconnects or memory subsystems such as PCIe, NVLink, DDR5, or HBM in the context of AI or HPC system validation
3+ years of experience with SoC debugging tools such as JTAG, GDB, or Trace32, and familiarity with common bus protocols such as I2C, SPI, USB, and PCIe
3+ years of experience defining hardware-software interface requirements for telemetry, diagnostics, and out-of-band management in AI infrastructure deployments
Experience integrating lab instrumentation and automation frameworks to support large-scale NPI validation workflows
Proficiency in Linux environments and the server system management tools used in data center operations