Senior HPC Solutions Architect at NVIDIA

NVIDIA has been redefining computer graphics, PC gaming, and accelerated computing for more than 25 years. It’s a unique legacy of innovation that’s motivated by great technology and amazing people. Today, we’re tapping into the unlimited potential of AI to define the next era of computing, an era in which our GPU acts as the brain of computers, robots, and self-driving cars that can understand the world. Doing what’s never been done before takes vision, innovation, and the world’s best talent. As an NVIDIAN, you’ll be immersed in a diverse, inviting environment that encourages everyone to do their best work. Step into the team and explore how you can make a lasting impact on the world.

We are looking for a networking professional to join the NVIDIA Solution Architects team. The team supports NVIDIA’s AI factory deployments at various customer sites. Together, we will drive end-to-end integration of technology solutions with some of NVIDIA's most strategic technology customers. You will offer recommendations to customers and partners on our product upgrades. This dynamic role requires excellent social skills to analyze, define, implement, and fix large-scale networking projects with customers and internal teams.

What you'll be doing:

Assisting with deployment, debugging, and improving the efficiency of AI workloads on extensive NVIDIA platforms.
Identifying hardware issues, tracking the resulting bugs through to resolution, and keeping customers updated on progress.
Benchmarking new framework features, analyzing performance, and sharing actionable insights with both customers and internal teams.
Working directly with external customers/partners to solve cluster performance and stability issues, identify bottlenecks, and implement effective solutions.
Building expertise and guiding customers in scaling workloads efficiently and reliably on the latest generation of NVIDIA GPUs.
Collaborating with AI factory deployment teams to ensure RAs/Blueprints are accurately followed and implemented.

What we need to see:

BS/MS/PhD in Electrical/Computer Engineering, Computer Science, Physics, or other Engineering fields, or equivalent experience.
10+ years of experience in designing, managing, and supporting large-scale hybrid networks. Experience with scripting is helpful.
Strong programming skills in at least one of the following languages: C, C++, or Python.
Practical experience identifying and resolving bottlenecks in large-scale training workloads or parallel applications.
Proven understanding of CPU and GPU architectures, CUDA, parallel filesystems, and high-speed interconnects.
Experience working with large compute clusters, including an understanding of their internal scheduling and resource management mechanisms (e.g., Slurm or cloud-based clusters).
System-level understanding of server/rack-level architecture, BMC, PCIe devices, Network Adapters, Linux OS, and kernel drivers.
Excellent communication and liaison skills to work with customers, partners, and internal functions.

Ways to stand out from the crowd:

Systems engineering, coding, and debugging skills, including experience with C/C++, the Linux kernel, and drivers.
Hands-on experience with NVIDIA systems/SDKs (e.g., CUDA), NVIDIA Networking technologies (e.g., DPU, RoCE, InfiniBand), and/or ARM CPU solutions.
Hands-on experience in the Linux environment and software-defined networking.
Experience with system board architectures and familiarity with x86, 64-bit, and low-level hardware programming.

Widely considered to be one of the technology world’s most desirable employers, NVIDIA offers highly competitive salaries and a comprehensive benefits package. As you plan your future, see what we can offer to you and your family at www.nvidiabenefits.com/

Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 184,000 USD - 287,500 USD for Level 4, and 224,000 USD - 356,500 USD for Level 5.

You will also be eligible for equity and benefits.

Applications for this job will be accepted at least until April 14, 2026.

This posting is for an existing vacancy.

NVIDIA uses AI tools in its recruiting processes.

NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.