Senior System Software Engineer – Embedded AI Inference

NVIDIA · Semiconductors · Munich, Germany

NVIDIA is looking for a Senior Software Engineer to develop production automotive software in C++ for AI inference and agent orchestration on embedded platforms. The role focuses on building next-generation automotive applications, including in-car agentic AI and inference of LLM/VLM/VLA models on NVIDIA GPUs.

What you'd actually do

  1. Design, implement, and maintain C++ agentic AI and AI inference solutions for embedded production platforms.
  2. Integrate PyTorch deep learning models into C++ pipelines and deploy them for real-time inference on NVIDIA GPUs.
  3. Build and extend testable, modular libraries and components, including interfaces to models, sensor drivers, and vehicle control.
  4. Profile, debug, and optimize C++ and CUDA code to meet strict latency and throughput targets.
  5. Collaborate closely with ML researchers, systems engineers, and automotive partners to turn prototype algorithms into production-ready implementations.

Skills

Required

  • 8+ years of professional software engineering experience
  • Master's or PhD degree in Computer Science or Machine Learning
  • Strong modern C++ (C++14/17 or later)
  • Solid Python skills
  • Hands-on experience building agentic AI frameworks
  • Hands-on experience with LLM / VLM inference and related optimization techniques such as speculative decoding, LoRA, and MoE
  • Experience developing on Linux
  • Familiarity with GPU programming and optimization

Nice to have

  • high-performance safety-critical software
  • automotive
  • robotics
  • real-time systems
  • TensorRT
  • agentic AI, specifically agents based on edge-friendly models (2–7B parameters)
  • context management
  • reliable tool calling
  • MCP
  • agentic coding
  • NVIDIA DRIVE AGX platform
  • quantization (INT8, FP8, 4-bit)
  • high-performance LLM inference frameworks like TensorRT-LLM or ONNX Runtime
  • software quality practices for safety-critical systems
  • automotive standards knowledge
  • open-source contributions
  • published work in AI, robotics, or GPU computing

What the JD emphasized

  • production automotive software
  • AI inference
  • agent orchestration
  • embedded production platforms
  • real-time inference on NVIDIA GPUs
  • agentic AI frameworks
  • LLM / VLM inference
  • optimization techniques
  • Linux
  • GPU programming and optimization
  • TensorRT
