Solutions Architect - Rack Scale AI Systems

NVIDIA · Semiconductors · Santa Clara, CA (+2 locations)

This role focuses on deploying and optimizing NVIDIA's Rack Scale AI Products within datacenter and cloud environments. It involves understanding product roadmaps, designing deployment strategies, collaborating with engineering teams, and ensuring the reliable delivery of AI hardware and software platforms. The role requires strong system engineering, Linux, and scripting skills, with an emphasis on data center architecture and automation.

What you'd actually do

  1. Work with NVIDIA product teams to understand new product roadmaps and requirements, primarily for Rack Scale AI Products.
  2. Find optimal solutions for deploying these products in a datacenter or lab environment using sophisticated design techniques, services, and tools.
  3. Assist in the rollout and deployment of new features that support the latest NVIDIA hardware and technologies.
  4. Define and implement full-scale solutions for onboarding products into our hosted and private cloud environments.
  5. Integrate and optimize cluster deployment methods, and manage software stack deployments, including provisioning these services into the cloud.

Skills

Required

  • Bachelor's or Master's Degree in Computer Science or Software Engineering, or equivalent experience.
  • 12+ years of relevant experience.
  • 6+ years of Linux and scripting experience.
  • Solid background in OS kernels and systems engineering.
  • A track record of quickly understanding new technologies outside of your domain expertise and deploying systems in sophisticated configurations from hardware through multiple layers of software in a fast-paced environment.
  • Strong technical skills and understanding of embedded systems, orchestration & automation systems, data centers and cloud architecture, as well as excellent communication and planning skills.
  • Strong problem-solving ability and experience in product engineering, failure analysis and debugging, and hardware or test design.
  • Understanding of dense datacenter design, including compute, storage, and networking.

Nice to have

  • Understanding of software engineering principles and enterprise system architecture.
  • Experience with GPU and compute cluster administration and automation.
  • Experience in large-scale QA environments for product bring-ups.
  • Special skills in large-scale and cluster computing (MPI) and in data center design, including high-speed interconnects (InfiniBand), cluster storage, and scheduling, with related design and/or management experience.

What the JD emphasized

  • Rack Scale AI Products
  • deploy these products in a Datacenter or a Lab environment
  • support the latest NVIDIA hardware and technologies
  • product onboarding into our hosted and private cloud environments
  • multi-site deployments of NVIDIA products
  • improve time to market next gen products
  • reliable and robust platform from concept to prototype to deployments
  • Cluster Deployment methods and manage SW stack deployments