Senior HPC Architect, Automation and At-Scale Deployment

NVIDIA · Semiconductors · Santa Clara, CA +12 · Remote

The Senior HPC Architect will support the deployment and bring-up of large-scale GPU compute clusters, enabling AI and GPU computing breakthroughs. This role involves providing engineering solutions for GPU Computing products and software stacks, acting as an internal reference for system administration and large-scale GPU-accelerated systems, and working with scientific researchers and developers to craft workflows and solutions.

What you'd actually do

  1. Provide engineering solutions to operationalize the latest GPU Computing products and software stacks, maintain technical relationships with internal and external engineering teams, and assist systems and machine learning/deep learning engineers in building creative solutions based on NVIDIA technology.
  2. Be an internal reference for system administration, at-scale system analysis, and other datacenter and large-scale GPU-accelerated system solutions among the NVIDIA technical community.

Skills

Required

  • 8+ years of experience in accelerated computing for datacenter/HPC-based enterprise computing solutions.
  • Solid understanding of accelerated computing scheduling and I/O stacks.
  • C/C++/Python/Bash programming/scripting experience.
  • Experience working with the engineering or academic research community in support of high-performance computing or deep learning.
  • Experience with parallel filesystems.
  • Strong teamwork and communication skills, both verbal and written.
  • Ability to multitask effectively in a dynamic environment.
  • Action-driven, with strong analytical skills.
  • Desire to be involved in multiple diverse and innovative projects.
  • BS (or equivalent experience) in Engineering, Mathematics, Physics, or Computer Science.

Nice to have

  • Deep Learning framework skills.
  • Exposure to using and deploying telemetry and visualization pipelines.
  • Exposure to container technology and Linux performance tools.
  • MS or PhD desirable.

What the JD emphasized

  • operationalize the latest GPU Computing products and software stacks
  • large-scale GPU compute clusters
  • artificial intelligence
  • GPU computing
  • system administration and tuning
  • accelerated computing
  • Deep Learning software and hardware platforms
