Senior Software Architect - Data Center Systems

NVIDIA · Semiconductors · Santa Clara, CA +5 · Remote

NVIDIA is seeking a Senior Software Architect to lead software activities for its deep learning server platforms, focusing on system architecture, design, and collaboration with engineering teams and customers. The role requires deep expertise in server system design, particularly at the SW/HW interface, and an understanding of Deep Learning workloads.

What you'd actually do

  1. Lead software activities for NVIDIA's deep learning server platforms, from design through production, collaborating with teams across the company to deliver software solutions.
  2. Drive the system architecture for a complex server platform in a multi-functional environment.
  3. Partner across application software, libraries, system software and firmware teams to design complete software solutions for new server platforms.
  4. Work directly with major customers to understand their requirements and align their roadmaps with NVIDIA’s roadmap.
  5. Work with business partners and vendors to shape their products to meet NVIDIA’s needs.

Skills

Required

  • System architecture and design
  • Server systems design
  • SW/HW interface design
  • Deep Learning workloads
  • HPC workloads
  • Accelerated computing platforms
  • Out of Band management architectures
  • In-band management architectures
  • Server system architecture
  • Left shift strategy implementation
  • BS or MS degree in Computer Engineering, Computer Science, or a related field, or equivalent experience
  • 10+ years of experience

Nice to have

  • Cloud and cluster level deployment and management systems
  • Device management protocols (Redfish, IPMI, MCTP, PLDM, RDE)
  • Storage technologies
  • Networking technologies

What the JD emphasized

  • Deep experience in designing architecture for scalable and performant server systems, particularly at the SW/HW interface.
  • Understanding of HPC or Deep learning workloads and use of accelerated computing platforms.
  • Expertise in Out of Band and In-band management architectures.
  • Knowledge of server system architecture and implications of architecture decisions on overall performance of end applications.
  • Demonstrable experience in implementing a left shift strategy to de-risk program execution.
  • 10+ years in the area of System architecture and design.