Principal Software Engineer - Dynamo

NVIDIA · Semiconductors · Santa Clara, CA +1 · Remote

Principal Software Engineer for NVIDIA Dynamo, an open-source platform for efficient, scalable inference of large language and reasoning models in distributed GPU environments. The role focuses on Kubernetes serving, scalability, disaggregated serving, dynamic GPU scheduling, intelligent routing, and distributed KV cache management.

What you'd actually do

  1. Collaborate on the design and development of the Dynamo Kubernetes stack.
  2. Introduce new features to the Dynamo Python SDK and Dynamo Rust Runtime Core Library.
  3. Design, implement, and optimize distributed inference components in Rust and Python.
  4. Contribute to the development of disaggregated serving for Dynamo-supported inference engines (vLLM, SGLang, TRT-LLM, llama.cpp, mistral.rs).
  5. Improve intelligent routing and KV-cache management subsystems.
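As a rough illustration of what item 5 involves, here is a minimal KV-cache-aware routing sketch in Python. All names (`Worker`, `route`, `cached_prefixes`) are hypothetical and do not come from Dynamo's actual SDK; the idea is simply to prefer the worker with the longest cached prompt prefix, falling back to the least-loaded one.

```python
# Hypothetical KV-cache-aware router: pick the worker whose cached prefix
# shares the longest leading token run with the incoming prompt, breaking
# ties by current load. Illustrative only; not Dynamo's real API.
from dataclasses import dataclass, field

@dataclass
class Worker:
    name: str
    load: int = 0  # in-flight requests
    cached_prefixes: list = field(default_factory=list)  # tuples of token ids

def prefix_overlap(prompt, prefix):
    """Length of the shared leading token run between prompt and a cached prefix."""
    n = 0
    for a, b in zip(prompt, prefix):
        if a != b:
            break
        n += 1
    return n

def route(prompt, workers):
    """Prefer cache reuse (longest prefix hit), then the least-loaded worker."""
    def score(w):
        best_hit = max((prefix_overlap(prompt, p) for p in w.cached_prefixes), default=0)
        return (-best_hit, w.load)  # more overlap first, then lower load
    return min(workers, key=score)

workers = [
    Worker("gpu-0", load=3, cached_prefixes=[(1, 2, 3, 4)]),
    Worker("gpu-1", load=1, cached_prefixes=[(9, 9)]),
]
print(route((1, 2, 3, 5), workers).name)  # gpu-0: 3-token cache hit outweighs higher load
```

A production router would also weigh prefix length against queue depth and memory pressure, but the trade-off between cache locality and load balance is the core of the subsystem.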

Skills

Required

  • BS/MS or higher in computer engineering, computer science, or a related engineering field (or equivalent experience)
  • 15+ years of proven experience in a related field
  • Strong proficiency in systems programming (Rust and/or C++)
  • Experience with Python for workflow and API development
  • Experience with Go for developing Kubernetes controllers and operators
  • Deep understanding of distributed systems, parallel computing, and GPU architectures
  • Experience with cloud-native deployment and container orchestration (Kubernetes, Docker)
  • Experience with large-scale inference serving, LLMs, or similar high-performance AI workloads
  • Background in memory management, data transfer optimization, and multi-node orchestration
  • Familiarity with open-source development workflows (GitHub, CI/CD)
  • Excellent problem-solving and communication skills

Nice to have

  • Prior contributions to open-source AI inference frameworks (e.g., vLLM, TensorRT-LLM, SGLang)
  • Experience with GPU resource scheduling, cache management, or high-performance networking
  • Understanding of LLM-specific inference challenges, such as context window scaling and multi-model agentic workflows

What the JD emphasized

  • 15+ years of proven experience in a related field
  • Strong proficiency in systems programming (Rust and/or C++)
  • Experience with large-scale inference serving, LLMs, or similar high-performance AI workloads
  • Understanding of LLM-specific inference challenges, such as context window scaling and multi-model agentic workflows

Other signals

  • Distributed GPU environments
  • High-performance AI inference
  • Scalable AI systems
  • Kubernetes deployment
  • LLM frameworks
  • Disaggregated serving
  • GPU resource management
  • Intelligent routing
  • KV cache management