Senior Solutions Architect, AI Infrastructure

NVIDIA · Semiconductors · Santa Clara, CA +2 · Remote

This role focuses on deploying and integrating NVIDIA's AI hardware and software solutions in customer data centers, acting as a Solutions Architect & Engineer. It involves guiding customers on system deployments, providing technical advice, identifying new project opportunities, and debugging performance issues related to GPU and network systems.

What you'd actually do

  1. Work with NVIDIA AI Native and Consumer Internet customers on large data center GPU server and networking system deployments as a Solutions Architect and Engineer.
  2. Demonstrate subject matter expertise in advanced GPU & network systems and be a trusted technical advisor to NVIDIA's strategic customers.
  3. Identify new project opportunities for NVIDIA products and technology solutions in data center and artificial intelligence applications.
  4. Serve as a trusted customer advisor, conducting regular technical meetings that cover the product roadmap, debugging of cluster issues, feature discussions, and introductions to new technology solutions.
  5. Analyze and debug compute/network configuration and performance issues to deliver performant clusters.

Skills

Required

  • BS/MS/PhD in Electrical/Computer Engineering, Computer Science, Physics, or another engineering field, or equivalent experience
  • 8+ years of Systems/Solution Engineering experience
  • System-level expertise in CPU/GPU server architecture, NICs, Linux, system software, and kernel drivers
  • Experience with Ethernet/InfiniBand networking switches and data center infrastructure (power/cooling)
  • Knowledge of DevOps/MLOps technologies such as Docker/containers, Kubernetes
  • Effective time management
  • Strong verbal/written communication skills

Nice to have

  • External customer-facing background
  • Experience with bring-up and deployment of large clusters
  • Systems engineering, coding, and debugging skills including experience with C/C++, Linux kernel and drivers
  • Hands-on experience with NVIDIA GPU systems/SDKs (e.g. CUDA), NVIDIA Networking technologies (e.g. NICs, RoCE, InfiniBand), and/or ARM CPU solutions
  • Familiarity with virtualization technology concepts

What the JD emphasized

  • GPU server and networking system deployments
  • GPU & network systems
  • data center
  • AI Native and Consumer Internet customers
  • GPU/Network Systems Engineering
  • product roadmap
  • compute/network configuration
  • performance issues
  • large clusters
  • NVIDIA GPU systems/SDKs
  • NVIDIA Networking technologies

Other signals

  • customer data centers
  • GPU server and networking system deployments
  • product roadmap features
  • AI Native and Consumer Internet customers