Senior Solutions Architect, AI Compute – NPN

NVIDIA · Semiconductors · CA +5 · Remote

Senior Solutions Architect for AI Compute at NVIDIA, focusing on deploying, managing, and validating AI Compute/HPC infrastructure for enterprise customers and partners. Requires strong Linux system administration, scripting, and cluster management skills, with experience in benchmarking tools and Kubernetes.

What you'd actually do

  1. Deploy, manage, and validate AI Compute/HPC infrastructure in Linux-based environments for new and existing customers.
  2. Serve as the domain expert for partners from planning calls through implementation.
  3. Hand over related documentation and perform the knowledge transfers required to support customers and partners as they begin rolling out some of the most sophisticated systems in the world!
  4. Provide feedback to internal and partner teams, such as opening bugs, documenting workarounds, and suggesting improvements.

Skills

Required

  • 8+ years providing in-depth support and deployment services
  • Knowledge and experience with Linux system administration
  • Cluster management and provisioning technologies for bare-metal servers
  • Scripting proficiency (Bash, Python, Ansible, etc.)
  • Experience with schedulers such as SLURM, LSF, UGE, etc.
  • Excellent interpersonal skills
  • Experience with benchmarking tools such as HPL, NCCL tests, MLPerf
  • Kubernetes experience

Nice to have

  • InfiniBand experience
  • Experience with GPU (Graphics Processing Unit) focused hardware/software
  • Experience with MPI (Message Passing Interface)
  • Storage technologies such as Lustre or GPFS
  • Familiarity with OEM GPU platforms
  • Strong channel sales and services knowledge and partner co-selling experience
  • Base Command Manager (BCM)

What the JD emphasized

  • AI Compute/HPC infrastructure
  • Linux system administration
  • Scripting proficiency (Bash, Python, Ansible, etc.)
  • Benchmarking tools such as HPL, NCCL tests, MLPerf
  • Kubernetes experience

Other signals

  • Deploying, managing, and validating AI Compute/HPC infrastructure
  • Customer-focused team
  • Analyze, define, and implement large-scale AI Compute projects