Senior Solutions Architect, GPU System

NVIDIA · Semiconductors · Beijing, China (+1 other location)

NVIDIA is seeking a Senior Solutions Architect with expertise in GPU server platforms and AI infrastructure to help customers design, deploy, and optimize NVIDIA-based AI factories. The role involves leading presales and architecture engagements, designing end-to-end AI data center solutions, and supporting the deployment of NVIDIA platforms for LLM training and inference workloads.

What you'd actually do

  1. Lead presales and architecture engagements with AI industry customers, focusing on GPU servers, AI clusters, and large‑scale training/inference platforms built on NVIDIA HGX, GPU systems, and reference architectures.
  2. Design and validate end‑to‑end AI data center solutions, including server platforms, storage connectivity, and high‑performance networking based on NVIDIA Spectrum Ethernet and Quantum InfiniBand switches, ConnectX NICs, and BlueField DPUs.
  3. Define system architectures for AI supercomputing, LLM training, and inference workloads, including node configuration, GPU topology, PCIe/NVLink considerations, and network design.
  4. Support business teams in exploring and developing NVIDIA server and GPU solution opportunities and in deploying the resulting solutions, from early technical discovery through POC to production rollout.
  5. Own and execute POCs and hands‑on labs that validate GPU server performance, scalability, reliability, and interoperability across compute, storage, and network domains.

Skills

Required

  • BS/BA in Computer Science, Electrical/Computer Engineering, or equivalent experience.
  • 6+ years of experience with data center servers, GPU platforms, or large‑scale AI/HPC infrastructure.
  • Strong understanding of GPU server architecture: CPU/GPU balance, memory and PCIe/NVLink topology, storage and NIC placement, and power/cooling considerations.
  • Proven experience designing or operating AI or HPC clusters using GPU‑accelerated servers in cloud or on‑prem data centers.
  • Solid background in data center and cloud networking for AI workloads, including leaf‑spine fabrics, RDMA, and high‑bandwidth/low‑latency designs.
  • Strong Linux system and Linux networking skills, including driver, firmware, and OS‑level tuning for GPU and NIC performance.
  • Knowledge of and experience with Kubernetes (K8s) and RDMA/RoCE, ideally including hands-on work with RoCE and InfiniBand AI clusters.
  • Excellent communication skills.

Nice to have

  • Knowledge of AI workflows
  • Understanding of cloud infrastructure and related development experience
  • Strong understanding of Kubernetes architecture (API server, etcd, controller manager, scheduler, kubelet, CNI)
  • Coding experience with CUDA and RDMA applications
  • Hands-on experience with the NVIDIA networking solution stack

What the JD emphasized

  • 6+ years of experience with data center servers, GPU platforms, or large‑scale AI/HPC infrastructure
  • Strong understanding of GPU server architecture
  • Proven experience designing or operating AI or HPC clusters using GPU‑accelerated servers
  • Solid background in data center and cloud networking for AI workloads
  • Strong Linux system and Linux networking skills

Other signals

  • Design and deploy AI factories
  • Design and validate end-to-end AI data center solutions
  • Define system architectures for AI supercomputing, LLM training, and inference workloads