Senior Systems Software Engineer, LPU

NVIDIA · Semiconductors · Toronto, ON +1 · Remote

A senior systems software role building the foundational software for NVIDIA's LPU (likely a hardware accelerator) that underpins its high-performance computing platforms. Responsibilities include developing and maintaining hardware abstraction layers, core system libraries, drivers, and runtime interfaces. The role requires strong modern C++ skills, experience with low-level system software, and Linux systems knowledge, with an emphasis on reliability and operability.

What you'd actually do

  1. Extend and maintain hardware abstraction layers and core system libraries used across the platform.
  2. Design and implement drivers, runtimes, and data movement/aggregation pipelines supporting workload execution.
  3. Build and maintain runtime interfaces for launching, monitoring, and managing workloads.
  4. Improve platform reliability through automation, error reporting, diagnostics, and operational tooling.
  5. Debug and resolve complex sequencing, initialization, and runtime issues across multi-component systems.

Skills

Required

  • Master's degree in Computer Science, Computer Engineering, Electrical Engineering, or a related STEM field, or equivalent experience
  • 5+ years of relevant work experience
  • Modern C++
  • Designing, maintaining, and refactoring software libraries and APIs
  • Large, multi-repository or multi-component codebases
  • Triage of difficult reliability issues and root-cause analysis
  • Low-level platform software experience (e.g., firmware/boot flows, RTOS, BMCs/MCUs, RISC-V, or closely related system software)
  • Linux systems experience (e.g., VFIO or similar subsystems)
  • Hardware bring-up and/or system triage experience

Nice to have

  • Distributed systems experience (e.g., MPI, gRPC, RPC frameworks, coordination/telemetry patterns)
  • Experience with inference systems and token serving (e.g., vLLM or similar serving/runtime stacks)
  • Experience shipping and supporting customer-facing SDKs
  • Production readiness and delivery experience (e.g., CI/CD and release workflows, monitoring/alerting practices, Kubernetes and/or data center operational workflows)

What the JD emphasized

  • Modern C++
  • Low-level platform software experience
  • Linux systems experience
  • Hardware bring-up and/or system triage experience