Senior Deep Learning Software Engineer, Inference

NVIDIA · Semiconductors · Netherlands (+2 locations) · Remote

NVIDIA is hiring a Senior Software Engineer specializing in deep learning inference. The role focuses on optimizing GPU-accelerated software for large-scale model serving and inference using frameworks such as SGLang and vLLM, and involves performance tuning, implementing the latest algorithms, and scaling performance across NVIDIA accelerators.
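Serving frameworks like vLLM and SGLang group incoming requests into batches before each model step. A toy, framework-free sketch of that greedy batching idea follows; all names (`Request`, `batch_requests`, the budgets) are hypothetical illustrations, not APIs from these libraries.

```python
from dataclasses import dataclass

@dataclass
class Request:
    """A hypothetical inference request: prompt plus its token budget."""
    prompt: str
    max_tokens: int

def batch_requests(pending, max_batch_size=8, max_batch_tokens=64):
    """Greedily pack pending requests into one batch under size/token budgets."""
    batch, token_budget = [], max_batch_tokens
    while pending and len(batch) < max_batch_size:
        req = pending[0]
        if req.max_tokens > token_budget:
            break  # next request would exceed the per-batch token budget
        batch.append(pending.pop(0))
        token_budget -= req.max_tokens
    return batch

pending = [Request("a", 16), Request("b", 32), Request("c", 32)]
batch = batch_requests(pending)  # takes "a" and "b"; "c" waits for the next batch
```

Real schedulers (e.g. vLLM's continuous batching) are far more sophisticated, re-admitting requests every decode step, but the budget-constrained packing above is the core trade-off between latency and GPU utilization.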

What you'd actually do

  1. Performance optimization, analysis, and tuning of DL models in domains such as LLMs, multimodal models, and generative AI.
  2. Scale performance of DL models across different architectures and types of NVIDIA accelerators.
  3. Contribute features and code to NVIDIA’s inference libraries and to open-source LLM software such as vLLM, SGLang, and FlashInfer.
  4. Collaborate with cross-functional teams across DL frameworks, NVIDIA libraries, and inference optimization to deliver innovative solutions.
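Responsibility 1 above centers on measuring before optimizing. A minimal, framework-free latency harness is sketched below; the function names, warmup count, and percentile choices are illustrative assumptions, not anything specified in the posting.

```python
import time
import statistics

def benchmark(fn, *args, warmup=3, iters=20):
    """Time fn(*args): discard warmup calls, then report latency percentiles in ms."""
    for _ in range(warmup):
        fn(*args)  # warm caches / JIT before measuring
    samples = []
    for _ in range(iters):
        t0 = time.perf_counter()
        fn(*args)
        samples.append((time.perf_counter() - t0) * 1e3)
    samples.sort()
    return {
        "p50_ms": statistics.median(samples),
        "p95_ms": samples[int(0.95 * (len(samples) - 1))],
        "mean_ms": statistics.fmean(samples),
    }

# Stand-in workload; in practice fn would be a model forward pass or serving call.
stats = benchmark(lambda xs: sorted(xs), list(range(10_000)))
```

Reporting tail percentiles (p95/p99) rather than only the mean matters for serving workloads, where a few slow requests dominate user-visible latency.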

Skills

Required

  • Master’s or PhD (or equivalent experience) in a relevant field (Computer Engineering, Computer Science, EECS, AI).
  • 5+ years of relevant software development experience.
  • Excellent C/C++ programming and software design skills.

Nice to have

  • Agile software development skills are helpful.
  • Python experience is a plus.
  • Prior experience with training, deploying or optimizing the inference of DL models in production is a plus.
  • Prior background in performance modeling, profiling, debugging, and code optimization, or architectural knowledge of CPUs and GPUs, is a plus.
  • Contributions to deep learning software projects such as PyTorch, vLLM, and SGLang that drive advancements in the field.
  • Experience with multi-GPU communications libraries (NCCL, NVSHMEM).
  • Experience building and shipping products to enterprise customers.
  • GPU programming experience (CUDA, OpenAI Triton, or CUTLASS).

What the JD emphasized

  • optimize the GPU-accelerated software
  • high-performance deep learning frameworks
  • efficient large-scale model serving and inference
  • improving these platforms
  • smooth deployment and serving
  • performance improvements
  • model serving pipelines
  • Performance optimization, analysis, and tuning
  • Scale performance
  • inference libraries
  • inference optimization

Other signals

  • optimize inference performance
  • GPU-accelerated software
  • large-scale model serving