Senior Solutions Architect, GPU Performance and LLM - Cloud Service Providers

NVIDIA · Semiconductors · Santa Clara, CA +1

Senior Solutions Architect at NVIDIA focused on helping large customers build and optimize AI/ML and HPC software solutions, particularly involving LLM training and inference on NVIDIA's hardware and software stack. The role involves deep technical engagement with customers, performance analysis, and solution development.

What you'd actually do

  1. Working with tech giants to develop and demonstrate solutions based on NVIDIA’s groundbreaking software and hardware technologies.
  2. Partnering with Sales Account Managers and Developer Relations Managers to identify and secure business opportunities for NVIDIA products and solutions.
  3. Serving as the main technical point of contact for customers building complex AI infrastructure, and helping them understand the performance of workloads such as large-scale LLM training and inference.
  4. Conducting regular technical customer meetings for project/product details, feature discussions, introductions to new technologies, performance advice, and debugging sessions.
  5. Collaborating with customers to build Proofs of Concept (PoCs) that address critical business needs and to support cloud service integration for NVIDIA technology on hyperscalers.

Skills

Required

  • 8+ years of engineering (performance/system/solution) experience.
  • Hands-on experience building performance benchmarks for data center systems, including large-scale AI training and inference.
  • Understanding of systems architecture including AI accelerators and networking as it relates to the performance of an overall application.
  • Effective engineering program management skills, including the ability to balance multiple tasks.
  • Ability to communicate ideas clearly through documents, presentations, and in external customer-facing environments.

Nice to have

  • Hands-on experience with deep learning frameworks (PyTorch, JAX, etc.), compilers (Triton, XLA, etc.), and NVIDIA libraries (TensorRT-LLM, TensorRT, NeMo, NCCL, RAPIDS, etc.).
  • Familiarity with deep learning architectures and the latest LLM developments.
  • Background with NVIDIA hardware and software, performance tuning, and error diagnostics.
  • Hands-on experience with GPU systems in general, including but not limited to performance testing, performance tuning, and benchmarking.
  • Experience deploying solutions in cloud environments (AWS, GCP, Azure, or OCI) and knowledge of DevOps/MLOps technologies such as Docker/containers, Kubernetes, and data center deployments. Command-line proficiency.

What the JD emphasized

  • large scale LLM training and inference
  • performance aspects
  • performance issues
  • performance benchmarks
  • performance advice
  • performance tuning
  • performance testing

Other signals

  • customer-facing technical solutions
  • AI/ML and HPC software solutions at scale
  • large scale LLM training and inference
  • AI accelerators and networking