Senior Software Engineer - NIM Platform SDK and Framework

NVIDIA · Semiconductors · Santa Clara, CA +3 · Remote

We are seeking a Senior Software Engineer to own and evolve the core NIM Platform SDK and microservice framework powering NVIDIA Inference Microservices (NIM). The role focuses on high-performance systems programming, multi-cloud abstractions, and API framework development for production-ready AI inference at scale.

What you'd actually do

  1. Develop and advance the inference microservice framework: OpenAI-compatible API endpoints, inference backend integrations (vLLM, SGLang, TensorRT-LLM, Dynamo), middleware, observability instrumentation, and production hardening across cloud, on-prem, and Kubernetes environments.
  2. Architect significant new features in open-source codebases, shepherding them through project acceptance and into production.
  3. Build and optimize high-performance model download and caching pipelines across multiple cloud storage backends (NGC, HuggingFace, S3, GCS) - parallel transfers, integrity verification, and seamless multi-cloud operability.
  4. Implement the model profile and manifest system that ensures NIMs are optimized for every NVIDIA GPU platform - profile selection, validation, and multi-GPU configuration.
  5. Develop and refine cloud microservice patterns - service discovery, health checking, graceful degradation, API gateway integration, and end-to-end request lifecycle management - to ensure NIMs operate reliably at scale in diverse cloud deployment environments.
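To make the download-and-caching responsibility in item 3 concrete, here is a minimal sketch of a parallel artifact fetcher with integrity verification. This is a hypothetical illustration, not NIM code: `fetch_parallel`, `fetch_one`, and the checksum manifest shape are all assumptions, and a local file copy stands in for an HTTP/S3/GCS transfer.

```python
import hashlib
import shutil
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

def sha256_of(path: Path) -> str:
    """Stream a file through SHA-256 so large model shards never sit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def fetch_one(src: Path, cache_dir: Path, expected_sha256: str) -> Path:
    """Bring one artifact into the cache, skipping work if a verified copy exists."""
    dst = cache_dir / src.name
    if dst.exists() and sha256_of(dst) == expected_sha256:
        return dst  # cache hit: integrity already verified
    tmp = dst.with_name(dst.name + ".part")
    shutil.copy(src, tmp)  # stand-in for an HTTP/S3/GCS transfer
    if sha256_of(tmp) != expected_sha256:
        tmp.unlink()
        raise ValueError(f"checksum mismatch for {src.name}")
    tmp.replace(dst)  # atomic rename: readers never observe partial files
    return dst

def fetch_parallel(manifest: dict[Path, str], cache_dir: Path, workers: int = 4) -> list[Path]:
    """Fetch every artifact in a {source: sha256} manifest concurrently."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(fetch_one, src, digest_dir, digest)
                   for (src, digest), digest_dir in
                   ((item, cache_dir) for item in manifest.items())]
        return [f.result() for f in futures]
```

The `.part` temp file plus atomic rename is the key design point: a crashed or corrupted transfer never pollutes the cache, so retries and concurrent readers stay safe.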

Skills

Required

  • BS or MS in Computer Science, Computer Engineering, or related field (or equivalent experience).
  • 8+ years of demonstrated experience in performant microservice, cloud software, and/or platform infrastructure roles.
  • Deep technical expertise in cloud-native microservice architecture, including service mesh, API gateways, load balancing, and distributed system design patterns.
  • Expertise in high-performance data pipelines with parallel I/O, caching strategies, and integrity verification across distributed storage systems.
  • Solid understanding of containerized application delivery using technologies such as Docker, Kubernetes, and Helm.
  • Understanding of application security principles, including secure coding practices, vulnerability mitigation, secrets management, and supply chain integrity for containerized environments.
  • Strong problem-solving skills grounded in first-principles reasoning and critical analysis.
  • Excellent programming skills in Python and Rust, with strong foundations in algorithms, development patterns, and software engineering principles.
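The health-checking and graceful-degradation patterns this role covers can be sketched minimally in Python, one of the two required languages. This is a hypothetical stdlib-only illustration, not the NIM framework's API: the `/healthz` path, the `CHECKS` probe registry, and the probe names are all assumptions.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical dependency probes; a real service would ping its model
# runtime, cache volume, upstream registry, etc.
CHECKS = {
    "model_loaded": lambda: True,
    "cache_writable": lambda: True,
}

class HealthHandler(BaseHTTPRequestHandler):
    """Minimal /healthz endpoint: 200 when every probe passes, 503 otherwise,
    so an orchestrator (e.g. Kubernetes) can route traffic away from a
    degraded replica instead of killing it outright."""

    def do_GET(self):
        if self.path != "/healthz":
            self.send_error(404)
            return
        results = {}
        for name, probe in CHECKS.items():
            try:
                results[name] = bool(probe())
            except Exception:
                results[name] = False  # a crashing probe counts as unhealthy
        status = 200 if all(results.values()) else 503
        body = json.dumps({"status": "ok" if status == 200 else "degraded",
                           "checks": results}).encode()
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep the sketch quiet
        pass

def serve(port: int = 0) -> HTTPServer:
    """Run the server on a background thread; port 0 picks a free port."""
    server = HTTPServer(("127.0.0.1", port), HealthHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Reporting per-probe results in the body, not just a status code, is what lets the service degrade gracefully: operators can see which dependency failed while the orchestrator acts only on the 200/503 signal.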

Nice to have

  • Direct involvement in open-source inference backends such as vLLM, TensorRT-LLM, or SGLang.
  • Direct involvement in disaggregated serving frameworks like NVIDIA Dynamo.
  • Experience building and operating production microservices at scale.
  • Deep knowledge of multi-cloud deployment strategies across AWS, GCP, Azure, and OCI.
  • Experience operating in regulated, air-gapped, or disconnected environments where strict security and compliance controls are required.

What the JD emphasized

  • production-ready AI inference at scale
  • deep systems engineering skills
  • building foundational platform libraries
  • deeply technical role
  • building core platforms that scale
  • deep software engineering challenges
  • high-performance systems programming
  • multi-cloud abstractions
  • API framework development
  • production-grade software
  • high-performance model download and caching pipelines
  • multi-cloud operability
  • high-quality code
  • test-driven development
  • agentic AI-assisted development
  • high engineering standards
  • container quality, security, and operability
  • direct involvement in open-source inference backends
  • direct involvement in disaggregated serving frameworks
  • building and operating production microservices at scale
  • deep knowledge of multi-cloud deployment strategies
  • operating in regulated, air-gapped, or disconnected environments where strict security and compliance controls are required

Other signals

  • NVIDIA NIM platform
  • AI inference at scale