Senior Deep Learning Compiler Engineer - PyTorch

NVIDIA · Semiconductors · Berlin, Germany (and 4 other locations)

NVIDIA is hiring a Senior Deep Learning Compiler Engineer to develop and optimize PyTorch models for NVIDIA GPUs using compiler technologies such as Thunder, TorchDynamo, and TorchInductor. The role focuses on performance analysis and on contributing to the open-source AI ecosystem.

What you'd actually do

  1. lead the design, implementation, optimization, and maintenance of the core compiler technologies that accelerate massive deep learning workloads
  2. dive deep into performance analysis, scrutinizing workloads running on thousands of GPUs to find optimization opportunities that will shape the future design of Thunder
  3. work closely with leading compiler, library, and systems teams (including experts behind nvFuser, TVM, XLA, and CUDA) to translate the latest research into practical, high-impact solutions for the open-source community
  4. contribute directly to the future of accelerated AI
  5. work alongside the very engineers who built PyTorch for NVIDIA hardware, helping to pioneer new features and stay at the forefront of framework development

Skills

Required

  • Python
  • deep learning frameworks (PyTorch or JAX)
  • compiler concepts (ASTs, intermediate representations, program analysis, code generation)
  • software systems development
  • communication and collaboration

Nice to have

  • contributions to deep learning compiler projects (TVM, MLIR, IREE)
  • PyTorch compiler stack (TorchDynamo, TorchInductor)
  • JAX-like functional transformations
  • parallel programming
  • distributed systems
  • high-performance CUDA code
  • open-source community participation

What the JD emphasized

  • 8+ years of relevant work experience
  • A strong command of Python and experience building complex, well-tested software systems
  • Hands-on experience with deep learning frameworks like PyTorch or JAX
  • A solid foundation in compiler concepts such as abstract syntax trees (ASTs), intermediate representations (e.g., SSA form), program analysis, and code generation

Other signals

  • compiler technology
  • PyTorch
  • NVIDIA GPUs
  • performance optimization