Senior Solutions Architect, Networking

NVIDIA · Semiconductors · CA · Remote

This role focuses on deploying and managing large-scale AI Data Centers, specifically the networking infrastructure that supports AI/ML training and inference. The Solutions Architect will work with customers and internal teams to implement and troubleshoot networking solutions, educate customers on future-proofing their infrastructure, and drive proofs of concept (POCs).

What you'd actually do

  1. Deploy, lead, and maintain large-scale AI Data Centers, including the control, network, and storage stacks.
  2. Troubleshoot network issues, which is a major part of this role: identify hardware issues and see them through the bug process while keeping customers updated on progress.
  3. Build high-performance DC fabrics using InfiniBand and high-throughput Ethernet (RoCE and traditional IP). These fabrics support general compute workloads and GPU-dense AI/ML training and inference environments.
  4. Implement networking solutions such as Spectrum switches, ConnectX network adapters, and BlueField DPUs.
  5. Educate customers on future-proofing their infrastructure and help drive proofs of concept (POCs).

Skills

Required

  • BS/MS/PhD in Electrical/Computer Engineering, Computer Science, Physics, or other Engineering fields, or equivalent experience.
  • 8+ years of experience in designing, managing, and supporting large-scale hybrid networks.
  • Expert in networking technologies: TCP/UDP, IPv4/IPv6, BGP/MP-BGP, VPN, L2 switching, EVPN, VXLAN, Segment Routing, MPLS, IS-IS, DWDM.
  • Experience automating SDN/NFV/NFVI infrastructure.
  • System-level understanding of server/rack-level architecture, BMC, PCIe devices, Network Adapters, Linux OS, and kernel drivers.
  • Superb communication and liaison skills to work with customers, partners, and internal functions.

Nice to have

  • Advanced-level experience with Cisco / Arista / Juniper is a huge plus.
  • Cisco CCIE (Routing and Switching) plus Fabric manager understanding.
  • Hands-on experience with Linux environments and software-defined networking.
  • Working knowledge of InfiniBand and storage HBAs.
  • Scripting is helpful.

What the JD emphasized

  • large-scale AI Data Centers
  • networking projects
  • large-scale hybrid networks
  • networking technologies
  • AI/ML training and inference environments