Senior Systems Software Engineer, Performance Architecture - Analytics and Data Intelligence

NVIDIA · Semiconductors · Santa Clara, CA

Senior Systems Software Engineer focused on performance architecture for GPU-accelerated structured data processing, with coordinated SQL and user-friendly interfaces across diverse CPU and GPU query engines. The work centers on compiler and JIT-based execution techniques for cuDF and related analytics runtimes to improve performance, reliability, and workload optimization.

What you'd actually do

  1. Extend JIT and compiler-based execution support in cuDF and related GPU-accelerated structured data processing systems.
  2. Design approaches for lowering expressions, ASTs, or query fragments into optimized GPU execution paths.
  3. Investigate kernel fusion strategies across cuDF operations to reduce materialization, memory traffic, launch overhead, and end-to-end query latency.
  4. Analyze structured analytics workloads to identify performance bottlenecks in expression evaluation, joins, aggregations, scans, data movement, and memory management.
  5. Analyze structured analytics workloads to identify performance bottlenecks in expression evaluation, joins, aggregations, scans, data movement, and memory management.

Skills

Required

  • Bachelor's or Master's degree in Computer Science, Computer Engineering, or a related field, or equivalent hands-on experience
  • 12+ years of validated experience in systems performance engineering or performance-focused architecture
  • Proven skills in profiling, instrumentation, and optimization for CPU and GPU systems, applying tools like tracing, counters, flame graphs, and kernel-level profiling
  • Experience with compiler, JIT, code generation, query execution, or runtime optimization techniques
  • Experience optimizing analytic database engines and/or query runtimes, including vectorized execution, join strategies, and columnar formats like Arrow and Parquet
  • Proficient in C++ and/or Python, with a strong ability to analyze performance-critical code and implement effective solutions
  • Experience with cuDF, RAPIDS, CUDA, Numba, LLVM, MLIR, NVRTC, or other JIT/codegen systems
  • Experience with benchmarking frameworks, performance dashboards, and CI/CD regression gating, along with a proven grasp of modern analytics and machine learning workflows

Nice to have

  • Deep familiarity with NVIDIA GPUs and GPU programming (CUDA), including memory hierarchy, concurrency, and profiling toolchains such as Nsight Systems
  • Experience with TPC-style benchmarking (TPC-H, TPC-DS, or analogous), ClickBench-like workloads, and building credible, repeatable performance narratives
  • Prior work on database execution engines, especially operator fusion, query compilation, vectorized execution, or adaptive execution
  • Demonstrated open-source contributions to performance-critical systems, including libraries, runtimes, databases, and ML or data tooling

What the JD emphasized

  • 12+ years of validated experience in systems performance engineering or performance-focused architecture
  • Experience with compiler, JIT, code generation, query execution, or runtime optimization techniques
  • Experience optimizing analytic database engines and/or query runtimes, including vectorized execution, join strategies, and columnar formats like Arrow and Parquet
  • Experience with cuDF, RAPIDS, CUDA, Numba, LLVM, MLIR, NVRTC, or other JIT/codegen systems