Our Automotive Platform Team is building the software foundation for scalable, high-performance vehicle computing platforms that power autonomous driving, ADAS, digital cockpit, and centralized vehicle architectures. We are looking for exceptional engineers who thrive on solving deeply complex system-level challenges and shaping the future of automotive computing.

We are seeking a Senior Software Engineer to lead next-generation innovations in automotive platform performance, AI model optimization, scalability, and system architecture! In this highly visible technical leadership role, you will drive architecture, optimization, and execution across the autonomous driving software stack, with a focus on optimizing and deploying deep neural networks that are fast, efficient, and reliable on NVIDIA automotive compute platforms. You will work at the intersection of core platform, deep learning inference, TensorRT and related compiler/runtime technologies, CUDA/GPU performance, model compression, platform software, and safety-aware automotive deployment.

What you'll be doing:

Lead architecture and technical strategy for optimizing inference workloads in autonomous driving applications.
Drive end-to-end performance analysis across DNN models, TensorRT/compiler flows, CUDA kernels, memory behavior, scheduling, runtime services, and automotive platform constraints.
Develop and guide model optimization techniques such as quantization, pruning, distillation, graph optimization, operator fusion, kernel selection, and layout/memory optimization.
Collaborate with TensorRT, CUDA, compiler, silicon architecture, perception, planning, DriveOS, and safety platform teams.
Build tools, methodologies, and metrics for profiling, benchmarking, debugging, and validating model and platform performance.

What we need to see:

BS, MS, or PhD in Computer Science, Computer Engineering, Electrical Engineering, or related field (or equivalent experience).
12+ years of software engineering experience in systems software, AI/ML infrastructure, deep learning inference, compiler/runtime technology, or platform performance.
Strong C/C++ and practical Python experience.
Deep familiarity with TensorRT, TensorRT-LLM, ONNX, PyTorch, CUDA, Triton, or related frameworks.
Experience optimizing DNN models for latency, throughput, memory footprint, and power.

Ways to stand out from the crowd:

Hands-on experience with TensorRT internals, CUDA kernels, Triton kernels, or other compiler/runtime technologies.
Experience deploying optimized DNNs, LLMs, VLMs, or perception models on embedded, edge, robotics, or automotive platforms.
Background in autonomous driving, ADAS, robotics, real-time systems, safety-aware software, or deterministic low-latency systems.
Experience with ISO 26262, QNX, Safe RTOS, DriveOS, Linux, hypervisors, or virtualization.

Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 224,000 USD - 356,500 USD.

You will also be eligible for equity and benefits.

Applications for this job will be accepted at least until June 30, 2026.

This posting is for an existing vacancy.

NVIDIA uses AI tools in its recruiting processes.

NVIDIA is committed to fostering an inclusive work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.