Solutions Architect, Infrastructure at NVIDIA

What you'd actually do

Lead end‑to‑end execution for Hyperscaler customers to rapidly bring NVIDIA Data Center GPU and networking platforms to market at scale.

Drive strategic partnership and alignment with Product teams to understand roadmap intent, co‑define critical metrics, and ensure unified direction across technical, sales, and leadership organizations.

Influence without authority across Product, Engineering, Sales, Operations, and CSP customers, driving clarity and alignment and unblocking paths to scale‑up.

Analyze deployment and performance data, identifying product health trends, system bottlenecks, and operational risks.

Solve challenging technical problems involving GPUs, networking, drivers, containers, firmware, and distributed system interactions.

Skills

Required

Solutions Architecture
Infrastructure Engineering
bring-up and validation of large-scale NVIDIA GPU platforms
multi-GPU and multi-node architectures
high-performance networking technologies (e.g., RDMA, congestion control, high-bandwidth interconnects)
Linux systems tools
server hardware architecture
BMC/IPMI/Redfish
Linux fundamentals across drivers, kernel subsystems, cgroups, containers, and node‑level performance analysis

Nice to have

multi-functional leadership
early platform readiness
cloud engineering teams
product strategy
large-scale customer deployments
NVIDIA technologies
worldwide cloud hosting providers
large enterprise environments
Product
Engineering
Sales
Operations
CSP customers
executive-level communication
future improvements in platform design, validation, and operational workflows
CUDA
NCCL
NVSwitch/NVLink
driver behavior
performance tuning
dmesg
journalctl
lspci
numactl
ethtool
iostat
perf
nvidia-smi
top/htop
ipmitool
container‑level tooling
PCIe topologies
system firmware
NUMA
BIOS/UEFI configuration
power/thermal envelopes
memory/subsystem behavior
remote management
hardware health monitoring
out‑of‑band debugging
cgroups
containers
node‑level performance analysis
cluster
node
accelerator
network
application layer
Compute and networking infrastructure
Instance types
networking primitives
high‑performance communication paths
Hyperscalers
Cloud Service Providers
multi-team infrastructure challenges
customer groups
GPU or infrastructure products
pilot to high‑volume deployment
large data center environments
modern deep learning
LLM architectures
distributed training/inference challenges at scale

What the JD emphasized

bring-up and validation of large-scale NVIDIA GPU platforms

high-performance networking technologies

NVIDIA system software stacks

Linux systems tools

server hardware architecture

BMC/IPMI/Redfish

Strong Linux fundamentals

identify performance bottlenecks

taking GPU or infrastructure products from pilot to high‑volume deployment

Do you thrive on taking a strategic product from launch to go‑to‑market at scale across the world’s largest customers? NVIDIA is looking for an Infrastructure Solutions Architect to lead deployment and bring‑up of our next‑generation Data Center GPUs and networking platforms.

As part of the NVIDIA Solutions Architecture team, we navigate uncharted technical and organizational spaces, serving as the bridge between early platform readiness, cloud engineering teams, product strategy, and large‑scale customer deployments. We are looking for Solutions Architects to combine hands‑on infrastructure expertise with multi-functional leadership to accelerate adoption of NVIDIA technologies across worldwide cloud hosting providers and large enterprise environments.

What You’ll Be Doing:

Lead end‑to‑end execution for Hyperscaler customers to rapidly bring NVIDIA Data Center GPU and networking platforms to market at scale.
Drive strategic partnership and alignment with Product teams to understand roadmap intent, co‑define critical metrics, and ensure unified direction across technical, sales, and leadership organizations.
Influence without authority across Product, Engineering, Sales, Operations, and CSP customers, driving clarity and alignment and unblocking paths to scale‑up.
Analyze deployment and performance data, identifying product health trends, system bottlenecks, and operational risks.
Solve challenging technical problems involving GPUs, networking, drivers, containers, firmware, and distributed system interactions.
Deliver streamlined executive‑level communication on status, risks, progress, and required decisions.
Collaborate with Product and Engineering, enabling future improvements in platform design, validation, and operational workflows.

What We Need to See:

BS/MS/PhD in Electrical/Computer Engineering, Computer Science, Physics, or similar, or equivalent experience.
4+ years of experience in Solutions Architecture, Infrastructure Engineering, or similar technical roles.
Hands‑on experience with bring‑up and validation of large‑scale NVIDIA GPU platforms, including multi‑GPU and multi‑node architectures.
Understanding of high‑performance networking technologies (e.g., RDMA, congestion control, high‑bandwidth interconnects) and their role in distributed AI workloads.
Familiarity with NVIDIA system software stacks: CUDA, NCCL, NVSwitch/NVLink, driver behavior, and performance tuning.
Proficiency with Linux systems tools for identifying issues and evaluating system performance, such as: dmesg, journalctl, lspci, numactl, ethtool, iostat, perf, nvidia-smi, top/htop, ipmitool, container‑level tooling, and related utilities.
Understanding of server hardware architecture, including PCIe topologies, system firmware, NUMA, BIOS/UEFI configuration, power/thermal envelopes, and memory/subsystem behavior.
Understanding of BMC/IPMI/Redfish for remote management, hardware health monitoring, and out‑of‑band debugging during early‑stage bring‑up.
Strong Linux fundamentals across drivers, kernel subsystems, cgroups, containers, and node‑level performance analysis.
Ability to identify performance bottlenecks at the cluster, node, accelerator, network, or application layer.

Ways to Stand Out from the Crowd:

Outstanding interpersonal skills and the ability to build clarity and direction across diverse, fast-paced technical teams.
Knowledge of compute and networking infrastructure (e.g., instance types, networking primitives, and high‑performance communication paths) at Hyperscalers or Cloud Service Providers.
Demonstrated leadership resolving multi‑team infrastructure challenges across engineering, product, and customer groups.
A consistent record of taking GPU or infrastructure products from pilot to high‑volume deployment in large data center environments.
Familiarity with modern deep learning, LLM architectures, and distributed training/inference challenges at scale.

Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 152,000 USD - 241,500 USD for Level 3, and 184,000 USD - 287,500 USD for Level 4.

You will also be eligible for equity and benefits.

Applications for this job will be accepted at least until February 9, 2026.

This posting is for an existing vacancy.

NVIDIA uses AI tools in its recruiting processes.

NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.