Senior Systems Software Engineer - Deep Learning Solutions

NVIDIA · Semiconductors · Toronto, ON +1 · Remote

Senior Systems Software Engineer focused on deep learning inference optimization for autonomous vehicles and robotics on edge hardware. The role involves analyzing and improving deep learning models on NVIDIA platforms, benchmarking performance, evaluating emerging model architectures, and collaborating with compiler, runtime, and hardware teams to deliver inference solutions.

What you'd actually do

  1. Address customer and partner optimization challenges: Engage directly with prominent automotive OEMs and robotics partners to analyze, debug, and improve their deep learning models on NVIDIA platforms. We emphasize delivering solutions rather than just recommendations.
  2. Own performance benchmarking: Drive efforts to achieve leading results on MLPerf Edge and industry benchmarks, as well as closed-source engagements with key partners. Define methodology, ensure reproducibility, and turn results into actionable optimization priorities.
  3. Evaluate emerging model architectures: Analyze new DL architectures, including vision encoders, multi-modal VLMs, hybrid SSM-Transformer backbones, diffusion/flow-matching decoders, and multi-camera tokenizers, for compilation feasibility, memory footprint, and latency on target SoCs.
  4. Collaborate across teams: Partner with our compiler, runtime, and hardware teams to connect model-level insight with platform capabilities.
  5. Contribute to build reviews and help develop internal roadmap priorities based on real customer workload patterns.

Skills

Required

  • Master’s degree or equivalent experience in Computer Science, Electrical Engineering, or a related field
  • 12+ years of industry experience, including over 8 years in deep learning model optimization, inference engineering, or neural network compilation
  • Adept at interpreting and reasoning about model architectures at the operator/kernel level
  • Over 5 years of proven experience in embedded/edge software
  • Deep knowledge of current DL architectures: transformers, attention variants, vision encoders (ViT), multi-modal/vision-language model frameworks, and experience with diffusion models and/or state space models
  • Expert knowledge of GPU architecture fundamentals, CUDA, and low-level performance optimization using heterogeneous computing
  • Experience with TensorRT, compiler IRs, or equivalent inference optimization toolchains
  • Solid understanding of embedded operating system internals (QNX/Linux), memory management, C/C++, and embedded/system software concepts
  • Background in parallel programming (e.g., CUDA, OpenMP)
  • Experience reasoning about memory hierarchies, data movement, and compute utilization
  • Demonstrated capability to collaborate directly with external partners and customers in a deep technical role, solving their workload issues, identifying performance problems, and providing solutions within production limitations

Nice to have

  • Experience with ML compiler frameworks (TVM, MLIR, XLA, Triton)
  • Contributions to inference runtime development
  • Production deployment experience with autonomous vehicle perception or planning stacks
  • Familiarity with the Physical AI model landscape: VLM + action expert architectures, end-to-end driving models, or robot foundation models
  • Contributions to MLPerf benchmarks and large-scale industry performance optimization efforts
  • Experience with automotive safety standards (ISO 26262, SOTIF) and their implications for inference system development
  • Experience leading technical initiatives across globally distributed engineering teams

What the JD emphasized

  • deep learning model optimization
  • inference engineering
  • neural network compilation
  • production inference solutions
  • power-limited, latency-sensitive deployment environments
  • low-level performance optimization
  • production limitations

Other signals

  • Deep learning inference optimization
  • Autonomous vehicles and robotics on edge hardware
  • Inspect model architectures down to the operator level
  • Evaluate how modern architectures perform on GPU and SoC
  • Deliver inference solutions on Jetson, DRIVE, and GPU + ARM platforms