Senior Manager, Silicon Failure Analysis Lab Infrastructure

NVIDIA · Semiconductors · Santa Clara, CA

Senior Manager to lead Silicon Failure Analysis (SiFA) Lab Infrastructure, responsible for enabling a high-availability, safe, and scalable failure analysis environment. This role leads the lab framework, including facilities, utilities, tool enablement, safety, access control, and operational readiness, so that Fault Isolation (FI), Physical Failure Analysis (PFA), and Supplier Quality Engineering (SQE) teams can efficiently root-cause issues in our groundbreaking semiconductor products. The role partners closely with FI, PFA, SQE, Corporate Facilities, EHS, IT, Finance, Procurement, and equipment vendors to ensure reliable, secure, and scalable lab operations aligned with NVIDIA’s technology roadmap.

What you'd actually do

  1. Lead the overall Silicon Failure Analysis (SiFA) Lab infrastructure, ensuring a safe, highly available, and scalable environment that enables FI, PFA, and SQE teams to efficiently root‑cause advanced semiconductor issues
  2. Own day‑to‑day lab operations and infrastructure readiness, serving as the primary point of accountability for availability, reliability, and rapid resolution of infrastructure issues impacting failure analysis operations
  3. Manage lab facilities and utilities including power, backup power, cooling water, DI/PCW, exhaust, vacuum, CDA, nitrogen, and specialty gases, coordinating upgrades, maintenance, outages, and construction to minimize disruption
  4. Drive failure analysis tool enablement and reliability from delivery through sustained operation, ensuring preventive maintenance and improving uptime, availability, MTBF, MTTR, and PM compliance
  5. Lead vendor and cross‑functional partnerships with FI, PFA, SQE, Corporate Facilities, EHS, IT, Finance, Procurement, and equipment suppliers to reduce downtime and ensure operational resilience

Skills

Required

  • Bachelor’s degree or higher in Engineering or a related technical field, or equivalent experience
  • 12+ years of overall experience in semiconductor, R&D, or high-precision lab infrastructure
  • 5+ years of experience leading a team
  • Demonstrated experience with capital equipment enablement, facilities coordination, and vendor management
  • Strong multi-functional leadership, communication, and execution skills

Nice to have

  • Demonstrated end-to-end ownership of high-availability failure analysis labs to resolve product yield, performance, reliability, and quality issues
  • Proven experience enabling and sustaining complex capital tools with metric-driven reliability improvements
  • Track record of maintaining rigorous safety/compliance governance while delivering on a multi‑year scaling roadmap to meet the demands of the latest silicon, packaging, and system challenges

What the JD emphasized

  • high-availability
  • safe
  • scalable
  • highly available
  • rapid resolution
  • minimize disruption
  • reliability
  • operational resilience
  • safety
  • regulatory governance
  • long-term roadmap
  • multi-year planning
  • scaling roadmap