Deep Learning Engineer - LLM and VLM Model Compression

NVIDIA · Semiconductors · Warsaw, Poland +3 · Remote

NVIDIA is seeking a Deep Learning Engineer with 8+ years of experience to build deep learning frameworks for LLM and VLM model compression. The role involves designing and implementing algorithms for pruning, neural architecture search (NAS), and distillation, running compression experiments on current models, and collaborating with researchers. Experience with PyTorch, LLM/VLM training or inference, and DL fundamentals is required. Experience with model compression techniques, building DL frameworks, and GPU programming is preferred. The role is based in Poland or Switzerland, with a salary range of 292,500 PLN - 650,000 PLN.

What you'd actually do

  1. Design and implement a deep learning framework for compressing large language and vision-language models to deliver highly optimized, high-performance AI systems used worldwide.
  2. Develop and integrate new algorithms for pruning, NAS, and distillation in collaboration with NVIDIA researchers and engineers.
  3. Experiment with compressing the latest LLMs and VLMs, analyzing their performance and behavior across diverse workloads.
  4. Collaborate with researchers and engineers across NVIDIA, providing guidance on improving the design, usability and performance of workloads.
  5. Lead best practices for building, testing, and releasing DL software.

Skills

Required

  • Deep Learning
  • SW Development
  • LLM or VLM model training or inference
  • Python programming
  • PyTorch
  • Problem solving
  • Analytical skills
  • Algorithms
  • DL fundamentals

Nice to have

  • Model compression techniques (pruning, NAS, distillation, quantization)
  • Building deep learning frameworks
  • GPU programming (CUDA or OpenCL)
  • First-author publication in a top-tier deep learning or AI conference

What the JD emphasized

  • 8+ years of experience in Deep Learning and SW Development
  • Hands-on experience with LLM or VLM model training or inference
  • Practical experience in PyTorch

Other signals

  • LLM and VLM model compression
  • Pruning, distillation, and neural architecture search (NAS)
  • Deep learning frameworks
  • GPU clusters
  • Enterprise-grade AI efficiency