Solutions Architect, LLM Model Builder

NVIDIA · Semiconductors · Santa Clara, CA

Solutions Architect focused on enabling partners to build, benchmark, fine-tune, optimize, and deploy foundation model solutions for customer workloads, with a strong emphasis on production inference and reasoning/multimodal models.

What you'd actually do

  1. Serve as the lead technical advisor for partners delivering reasoning, multimodal, fine-tuning, and model-serving solutions.
  2. Guide partners to the right approach for customer workloads across fine-tuning, distillation, quantization, compression, benchmarking, and evaluation.
  3. Define benchmark plans, synthetic data and evaluation workflows, and repeatable validation recipes.
  4. Advise on compute planning, including cluster sizing, GPU and network selection, storage, memory tradeoffs, latency and throughput targets, and production-readiness testing.
  5. Guide inference architecture across prefill and decode tradeoffs, batching, routing, disaggregated inference, and serving efficiency.

Skills

Required

  • LLMs
  • VLMs
  • large-scale inference systems
  • fine-tuning
  • benchmarking
  • evaluation
  • optimization
  • production deployment
  • foundation models
  • data preparation
  • post-training
  • reasoning models
  • reinforcement learning
  • synthetic data generation
  • Python
  • PyTorch
  • JAX
  • TensorFlow
  • Nemotron
  • NeMo
  • Dynamo
  • TensorRT-LLM
  • Triton
  • vLLM
  • communication skills
  • presentation skills

Nice to have

  • partner enablement
  • deploying large-scale AI systems in production
  • benchmark suites
  • fine-tuning recipes
  • sizing calculators
  • TCO models for AI workloads
  • GPU infrastructure
  • NVLink
  • InfiniBand
  • MPI
  • NCCL
  • cluster technologies
  • OSS contributions
  • model tooling
  • inference
  • evaluation
  • performance optimization

What the JD emphasized

  • Hands-on expertise in fine-tuning, benchmarking, evaluation, optimization, and production deployment.
  • Strong understanding of foundation models across data preparation, fine-tuning, post-training, evaluation, and inference.
  • Familiarity with reasoning models, reinforcement learning, and synthetic data generation and evaluation workflows.
  • Hands-on experience with PyTorch, JAX, or TensorFlow.
  • Familiarity with Nemotron, NeMo, Dynamo, TensorRT-LLM, Triton, vLLM, and similar inference and optimization stacks.

Other signals

  • foundation models
  • production inference
  • partner enablement
  • fine-tuning
  • evaluation
  • serving