Senior System Software Engineer - Dynamo-Triton Inference Server

NVIDIA · Semiconductors · Santa Clara, CA +2 · Remote

NVIDIA is hiring a Senior System Software Engineer to work on Dynamo-Triton Inference Server, a GPU-accelerated AI inference serving platform. The role involves developing high-performance inference software, contributing to feature development, driving customer adoption, and optimizing throughput and latency for both LLM and non-LLM workloads.

What you'd actually do

  1. Develop world-class GPU-accelerated AI inference serving software.
  2. Contribute to feature development and drive broad customer adoption.
  3. Drive the convergence of the Triton Inference Server and NVIDIA Dynamo stacks to establish a unified, high-performance inference platform.
  4. Be an active member of the open source deep learning software engineering community.
  5. Balance a variety of objectives such as building robust software designed to be deployed in production server or cloud environments, optimizing and balancing prediction throughput and latency, and developing and adopting the next generation of inference technologies.

Skills

Required

  • MS or PhD in Computer Science or relevant field (or equivalent experience).
  • 5+ years of professional experience working on deep learning software.
  • Excellent Rust & C++ skills, familiarity with Python, and strong programming & software design skills including debugging, performance analysis, and test design.
  • Experience with high-scale distributed systems and ML systems.
  • Strong communication skills and ability to work in a fast-paced, agile team environment.

Nice to have

  • Prior experience with AI frameworks and engines, such as TensorRT, PyTorch, ONNX, OpenVINO, vLLM, or TRT-LLM.
  • Knowledge of GPU memory management, cache management, or high-performance networking.
  • Experience with distributed systems programming.
  • Experience contributing to a large open source project: use of GitHub, bug tracking, branching and merging code, OSS licensing issues, handling patches, etc.

What the JD emphasized

  • 5+ years of professional experience working on deep learning software.
  • Excellent Rust & C++ skills.
  • Experience with high-scale distributed systems and ML systems.

Other signals

  • Develop world-class GPU-accelerated AI inference serving software.
  • Drive the convergence of the Triton Inference Server and NVIDIA Dynamo stacks to establish a unified, high-performance inference platform.
  • Balance a variety of objectives such as building robust software designed to be deployed in production server or cloud environments, optimizing and balancing prediction throughput and latency, and developing and adopting the next generation of inference technologies.