Manager, Solutions Architecture - Data … at NVIDIA

What you'd actually do

Managing and developing a group of infrastructure and HPC specialists;

Providing guidance and support to partners, helping them successfully deploy and bring up AI Factories;

Helping our partners employ our best practices and reference architectures and taking your knowledge out to the field;

Raising timely, advance alerts about critical customer issues that need further focus.

Skills

Required

BS/MS/PhD or equivalent experience in Computer Science, Data Science, Electrical/Computer Engineering, Physics, Mathematics, or other Engineering fields.
8+ years of overall work or research experience with Python, C++, or other software development.
4+ years of experience leading a team.
Track record of medium- to large-scale AI training and understanding of key libraries used for NLP/LLM/VLA training (NeMo Framework, DeepSpeed, etc.).
Experience with integration and deployment of software products in production enterprise environments, and microservices software architecture.
Solid understanding of data center infrastructure: servers, storage, networking, cabling, power, cooling, and physical deployment workflows.
Experience with software microservices and with the incorporation and delivery of software in production environments.
Technical leadership and strong understanding of NVIDIA technologies, and success in working with customers.
Excellent verbal and written communication and technical presentation skills in English.

Nice to have

Understanding of HPC systems: data center design, high-speed InfiniBand interconnect, cluster storage, and scheduling-related design and/or management experience.
Strong coding and debugging skills, and demonstrated expertise in one or more of the following areas: Machine Learning, Deep Learning, Slurm, Docker/Kubernetes, Singularity, MPI, MLOps, LLMOps, Ansible, Terraform, and other high-performance AI cluster solutions.
Hands-on experience with HPC clusters, InfiniBand, GPU infrastructure, or hyperscale data center technologies.
Experience in AI infrastructure deployment, professional services, or tech vendor post-sales delivery.

What the JD emphasized

Track record of medium- to large-scale AI training

Experience with integration and deployment of software products in production enterprise environments

Solid understanding of data center infrastructure

Experience with software microservices and with the incorporation and delivery of software in production environments

Technical leadership and strong understanding of NVIDIA technologies

AI infrastructure deployment

Do you want to be part of a team that's revolutionizing the field of AI with data center scale solutions? We are looking for a hardworking Solution Architect Manager with experience in designing, building, and maintaining large-scale HPC and AI infrastructure to join our team at NVIDIA.

As Solution Architects, we are actively helping make AI Factories a reality. Our team helps enable some of the industry's largest Solution Providers who serve as our trusted partners. We help partners to understand and adopt our reference architectures and libraries through training and workshops; we help them develop robust NVIDIA practices and support customer conversations; and we help advise them through their most important data center deployments. This is where you come in!

What you'll be doing:

You will be responsible for managing a team of infrastructure experts passionate about the delivery and bring-up of NVIDIA-powered AI Factories. The ideal candidate will have excellent interpersonal skills to contribute to a dynamic, customer-focused team. In this role, you will advise and assist partners as they define and implement large-scale AI/HPC projects. Your primary focus will be on understanding the AI workload and how it interacts with other parts of the system, such as networking, storage, deep learning frameworks, and data cleaning tools. You must be passionate about partner success and about driving AI adoption in the enterprise. Other responsibilities include:

Managing and developing a group of infrastructure and HPC specialists;
Providing guidance and support to partners, helping them successfully deploy and bring up AI Factories;
Helping our partners employ our best practices and reference architectures and taking your knowledge out to the field;
Raising timely, advance alerts about critical customer issues that need further focus.

What we need to see:

BS/MS/PhD or equivalent experience in Computer Science, Data Science, Electrical/Computer Engineering, Physics, Mathematics, or other Engineering fields.
8+ years of overall work or research experience with Python, C++, or other software development.
4+ years of experience leading a team.
Track record of medium- to large-scale AI training and understanding of key libraries used for NLP/LLM/VLA training (NeMo Framework, DeepSpeed, etc.).
Experience with integration and deployment of software products in production enterprise environments, and microservices software architecture.
Solid understanding of data center infrastructure: servers, storage, networking, cabling, power, cooling, and physical deployment workflows.
Experience with software microservices and with the incorporation and delivery of software in production environments.
Technical leadership and strong understanding of NVIDIA technologies, and success in working with customers.
Excellent verbal and written communication and technical presentation skills in English.

Ways to stand out from the crowd:

Understanding of HPC systems: data center design, high-speed InfiniBand interconnect, cluster storage, and scheduling-related design and/or management experience.
Strong coding and debugging skills, and demonstrated expertise in one or more of the following areas: Machine Learning, Deep Learning, Slurm, Docker/Kubernetes, Singularity, MPI, MLOps, LLMOps, Ansible, Terraform, and other high-performance AI cluster solutions.
Hands-on experience with HPC clusters, InfiniBand, GPU infrastructure, or hyperscale data center technologies.
Experience in AI infrastructure deployment, professional services, or tech vendor post-sales delivery.

Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 224,000 USD - 356,500 USD.

You will also be eligible for equity and benefits.

Applications for this job will be accepted at least until April 18, 2026.

This posting is for an existing vacancy.

NVIDIA uses AI tools in its recruiting processes.

NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.
