Senior Inference Engineer, AIConfigurator for Dynamo

NVIDIA · Semiconductors · Santa Clara, CA +1 · Remote

Senior Inference Engineer role focused on optimizing LLM inference deployment configurations with AIConfigurator, drawing on GPU systems, model serving, and performance modeling for NVIDIA platforms.

What you'd actually do

  1. Build and evolve AIConfigurator's core optimization engine for LLM serving, including configuration search, SLA-aware ranking, efficiency and latency estimation, and Pareto frontier analysis.
  2. Build production-quality Python/Rust APIs, CLIs, SDK surfaces, and web workflows that help users generate strong deployment configurations for NVIDIA GPU clusters.
  3. Develop configuration generation systems that emit backend-specific artifacts for Dynamo, Kubernetes, TensorRT-LLM, vLLM, and SGLang deployments.
  4. Collaborate with inference runtime, performance, benchmarking, and product groups to ensure simulated results match actual deployment performance on H100, H200, B200, GB200, and upcoming NVIDIA platforms.
  5. Improve model, hardware, and backend support by integrating performance databases, profiling data, support matrices, and validation tools.

Skills

Required

  • Python
  • Rust
  • GPU computing
  • Distributed systems
  • ML infrastructure
  • High-performance model serving
  • LLM inference concepts (batching, latency, efficiency, memory constraints, parallelism strategies, serving SLAs)
  • Data-driven performance analysis
  • Benchmarking
  • Simulation
  • Optimization
  • Resource management

Nice to have

  • TensorRT-LLM
  • vLLM
  • SGLang
  • Triton Inference Server
  • Dynamo
  • Kubernetes
  • NVIDIA GPUs (H100, H200, B200, GB200)
  • Multi-node GPU clusters
  • Disaggregated serving
  • Prefill/decode separation
  • KV cache management
  • NCCL/NIXL/NVSHMEM communication
  • Expert-parallel MoE inference
  • Open-source project experience
  • Technical writing
  • Developer-facing tools ownership
  • Agentic AI solutions

What the JD emphasized

  • 10+ years of relevant software engineering experience
  • Strong Python/Rust engineering skills
  • Experience with GPU computing, distributed systems, ML infrastructure, or high-performance model serving
  • Understanding of LLM inference concepts
  • Experience with data-driven performance analysis, benchmarking, simulation, optimization, or managing resource needs

Other signals

  • LLM inference optimization
  • performance modeling
  • production software engineering
  • deployment configurations