Solutions Architect, Inference Deployments

NVIDIA · Semiconductors · Santa Clara, CA

NVIDIA is seeking a Solutions Architect to deploy and enhance AI inference solutions at scale using GPU technology and Kubernetes. The role involves building inference pipelines, orchestrating disaggregated inference, accelerating inference with various backends, and providing technical leadership to customers for enterprise AI deployments.

What you'd actually do

  1. Build inference pipelines with tools like NVIDIA Dynamo, distributing tasks among GPU workers to improve efficiency.
  2. Collaborate with DevOps teams to orchestrate disaggregated inference using Kubernetes for complex workloads.
  3. Accelerate inference pipelines with backends such as TensorRT-LLM, vLLM, and SGLang, ensuring they integrate cleanly with disaggregated inference.
  4. Provide mentorship and technical leadership to customers and internal teams, guiding them through the deployment of disaggregated inference systems and resolving complex issues.

Skills

Required

  • Solutions Architecture
  • deploying distributed systems
  • AI inference workloads on Kubernetes
  • NVIDIA Dynamo
  • Triton Inference Server
  • TensorRT-LLM
  • model optimization
  • model serving
  • GPU orchestration
  • NVIDIA GPU Operator
  • NIM Operator
  • Multi-Instance GPU (MIG) partitioning
  • GPU allocation
  • memory hierarchies
  • low-latency networking
  • tuning large language models
  • low-latency inference
  • enterprise environments
  • BS in CS/Engineering or equivalent experience

Nice to have

  • NVIDIA inference technologies (Dynamo, NIM, NIXL, Grove)
  • transformer neural networks
  • quantization
  • speculative decoding
  • WideEP
  • NVIDIA Certified AI Engineer
  • Contributions to open-source projects (NVIDIA Dynamo, vLLM, KServe, SGLang)

What the JD emphasized

  • deploying distributed systems and AI inference workloads on Kubernetes
  • low-latency inference

Other signals

  • Deploying generative AI to production
  • Scaling AI inference solutions
  • Enterprise AI solutions