Manager, Solutions Architecture – GPU and Networking Systems

NVIDIA · Semiconductors · Santa Clara, CA +3 · Remote

Manager of a Solutions Architecture team focused on GPU and networking systems for AI deployments. Responsibilities include leading the team, providing technical expertise, guiding architecture discussions, collecting customer requirements, and influencing product roadmaps. The role requires strong systems-level expertise in server architecture and data center networking, along with leadership experience.

What you'd actually do

  1. Recruit and manage a team of solutions architects and system, network, and software engineers focused on large-scale GPU and AI networking deployments. Set priorities, allocate resources, mentor, and ensure high-quality customer delivery across multiple concurrent projects, while remaining directly involved in key technical reviews, design decisions, and critical debug efforts.
  2. Provide deep subject-matter expertise in advanced GPU and network systems and serve as the senior technical point of contact for strategic customers. Personally lead and guide complex compute/network configuration and performance debugging, working side-by-side with your team to deliver performant, reliable clusters.
  3. Guide your team as they lead network, compute, and software architecture discussions, and support server, network, and cluster bring-up, including on-site data center work where needed.
  4. Systematically collect and synthesize customer-specific requirements across your portfolio. Partner with GPU/Network Systems Engineering, Product Management, and Sales to influence roadmap priorities and the packaging of reference designs and solutions.
  5. Demonstrate subject-matter expertise in advanced GPU and network systems and act as a trusted technical advisor to NVIDIA's strategic customers. Bring customer-specific requirements to product teams to guide product roadmap features.

Skills

Required

  • BS/MS/PhD in Electrical/Computer Engineering, Computer Science, Physics, or another engineering field, or equivalent experience
  • 8+ years of overall experience in Systems/Solutions/Field Engineering, Network or Data Center Engineering, or similar roles
  • 2+ years leading or mentoring engineers or architects (formal manager or strong tech lead)
  • System-level expertise across CPU/GPU server architecture, NICs, Linux, system software, and kernel drivers
  • Experience with data center networking (Ethernet and/or InfiniBand switches, fabrics, and associated tooling)
  • Familiarity with data center infrastructure (power, cooling, deployment constraints)
  • Proven ability to lead technical teams, set priorities, and drive complex projects from design through production
  • Demonstrated success working with Product Management, Sales, and Engineering
  • Strong time management skills and ability to balance planning with hands-on support where needed
  • Excellent written and verbal communication, including the ability to lead customer meetings, communicate status and risks, and produce clear design docs, debug summaries, and presentations

Nice to have

  • Direct people management & recruiting experience for geographically distributed technical teams
  • Track record leading bring-up and deployment of large clusters or supercomputing environments
  • Background in external customer-facing roles (field engineering, escalations, or pre/post-sales architecture)
  • Systems engineering, coding, and debugging skills, including experience with C/C++ and the Linux kernel and drivers
  • Hands-on experience with NVIDIA GPU systems and SDKs (e.g., CUDA), NVIDIA networking technologies (NICs, RoCE, InfiniBand), and/or ARM-based CPU solutions, plus familiarity with virtualization and cloud-native networking concepts

What the JD emphasized

  • bringing AI hardware and software technologies into production
  • large-scale GPU and AI networking deployments
  • customer data centers
  • end-to-end solutions deployments
  • deep-dive debugging
  • customer feedback
  • advanced GPU and network systems
  • complex compute/network configuration and performance debugging
  • server, network, and cluster bring-up
  • data center infrastructure
  • NVIDIA GPU systems and SDKs
  • NVIDIA networking technologies