Senior Deep Learning Compiler Verification Engineer

NVIDIA · Semiconductors · Santa Clara, CA +4 · Remote

NVIDIA is seeking a Senior Deep Learning Compiler Verification Engineer to design and build systems for verifying correctness in deep learning compilers, focusing on graph transformations, IR lowering, and GPU execution. The role involves analyzing and validating optimizations, engineering test generation systems using deep learning solutions, and defining quality metrics for evolving models, compiler stacks, and hardware.

What you'd actually do

  1. Design and build systems to reason about correctness in deep learning compilers, across graph transformations, IR lowering, and GPU execution
  2. Work with deep learning compiler and architecture teams to analyze and validate sophisticated optimizations (e.g., graph rewrites in MLIR, fusion passes, mixed-precision transformations), ensuring they preserve semantics and numerical behavior
  3. Engineer test generation systems that use deep learning solutions and analysis methods to drive in-depth testing across the vast combinatorial space of model topologies, precision modes, and hardware targets
  4. Define and improve how we measure and guarantee functional quality and performance as models, compiler stacks, and hardware continue to evolve

Skills

Required

  • Python
  • C++
  • Deep learning frameworks (PyTorch, JAX/XLA, TensorRT)
  • Compiler development
  • Deep learning systems
  • Compiler verification
  • Model execution
  • Graph representation
  • Runtime behavior

Nice to have

  • LLVM
  • MLIR
  • TVM
  • XLA
  • Formal methods
  • Type systems
  • Program semantics
  • Proof-based verification
  • Quantization
  • Operator fusion
  • Mixed-precision
  • Graph-level optimization

What the JD emphasized

  • Must have deep proficiency in Python or C++ and experience with one major DL framework.
  • Strong systems intuition and debugging depth: the ability to reason across abstraction layers, from high-level model semantics down to generated code, and to track down failures that only manifest in edge cases.

Other signals

  • compiler verification
  • deep learning workloads
  • MLIR
  • GPU execution