Senior Software Engineer - NIM Factory Container and Cloud Infrastructure

NVIDIA · Semiconductors · Santa Clara, CA +1 · Remote

Senior Software Engineer role focused on container and cloud infrastructure for NVIDIA Inference Microservices (NIMs) and hosted services. The role involves designing and implementing container strategies, building enterprise-grade software for container build, packaging, and deployment, and improving reliability, performance, and scale across thousands of GPUs, with a focus on disaggregated LLM inference.

What you'd actually do

  1. Design, build, and harden containers for NIM runtimes and inference backends; enable reproducible, multi-arch, CUDA-optimized builds.
  2. Develop Python tooling and services for build orchestration, CI/CD integrations, Helm/Operator automation, and test harnesses; enforce quality with typing, linting, and unit/integration tests.
  3. Help design and evolve Kubernetes deployment patterns for NIMs, including GPU scheduling, autoscaling, and multi-cluster rollouts.
  4. Optimize container performance: layer layout, startup time, build caching, runtime memory/IO, network, and GPU utilization; instrument with metrics and tracing.
  5. Evolve the base image strategy, dependency management, and artifact/registry topology.

Skills

Required

  • BS or MS in Computer Science, Computer Engineering, or a related field, or equivalent experience
  • 6+ years building production software with a strong focus on containers and Kubernetes
  • Strong Python skills building production-grade tooling/services
  • Experience with Python SDKs and clients for Kubernetes and cloud services
  • Expert knowledge of Docker/BuildKit, containerd/OCI, image layering, multi-stage builds, and registry workflows
  • Deep experience operating workloads on Kubernetes
  • Hands-on experience building and running GPU workloads in k8s, including NVIDIA device plugin, MIG, CUDA drivers/runtime, and resource isolation
  • Excellent collaboration and communication skills; ability to influence cross-functional design

Nice to have

  • Expertise with Helm chart design systems, Operators, and platform APIs serving many teams
  • Experience with the OpenAI API and Hugging Face API, as well as an understanding of different inference backends (vLLM, SGLang, TRT-LLM)
  • Background in benchmarking and optimizing inference container performance and startup latency at scale
  • Prior experience designing multi-tenant, multi-cluster, or edge/air-gapped container delivery
  • Contributions to open-source container, k8s, or GPU ecosystems

What the JD emphasized

  • building production software with a strong focus on containers and Kubernetes
  • building and running GPU workloads in k8s
  • Optimize container performance: layer layout, startup time, build caching, runtime memory/IO, network, and GPU utilization; instrument with metrics and tracing.
  • Expertise with Helm chart design systems, Operators, and platform APIs serving many teams
  • Background in benchmarking and optimizing inference container performance and startup latency at scale

Other signals

  • NVIDIA Inference Microservices (NIMs)
  • LLM inference
  • thousands of GPUs