Senior Deep Learning Tools Engineer – CUDA Tile

NVIDIA · Semiconductors · Santa Clara, CA +5 · Remote

Senior Deep Learning Tools Engineer role at NVIDIA, focused on performance validation, analysis, and tracking for AI workloads accelerated by CUDA Tile compiler technologies and GPU systems. The role involves designing and developing performance testing frameworks, building automated CI/CD pipelines, implementing benchmarking systems, analyzing performance trends, and collaborating with compiler and architecture teams to resolve performance issues. It requires strong Python programming skills and experience with CI/CD, deep learning frameworks, and hardware-aware performance analysis.

What you'd actually do

  1. Design and develop performance testing frameworks for deep learning compilers and workloads
  2. Build and maintain automated pipelines (CI/CD) to continuously track performance across models, hardware, and compiler changes
  3. Implement benchmarking systems to measure latency, throughput, and efficiency of AI and HPC workloads
  4. Analyze performance trends over time and identify regressions, bottlenecks, and optimization opportunities
  5. Partner with compiler and architecture teams to debug and resolve performance issues

Skills

Required

  • Python
  • CI/CD systems
  • automation frameworks
  • hardware-aware performance analysis
  • deep learning frameworks (PyTorch, TensorFlow, JAX, or TensorRT)
  • data analysis
  • profiling
  • regression tracking
  • debugging system-level issues

Nice to have

  • C++
  • GPU performance analysis and optimization
  • compiler internals (LLVM, MLIR, CUDA compilation flow)
  • performance dashboards
  • large-scale telemetry systems
  • hardware/software co-design
  • low-level performance tuning
  • distributed testing infrastructure
  • large-scale benchmarking systems

What the JD emphasized

  • performance validation
  • performance testing frameworks
  • automated pipelines
  • benchmarking systems
  • performance trends
  • performance issues
  • performance analysis
  • performance dashboards
  • performance tuning

Other signals

  • performance validation
  • compiler technologies
  • AI workloads
  • GPU systems
  • automation infrastructure