Developer Technology Engineer, Energy

NVIDIA · Semiconductors · Zurich, Switzerland (+5 locations) · Remote

NVIDIA is seeking a Developer Technology Engineer to optimize energy simulation and AI workflows on NVIDIA platforms, focusing on CUDA performance optimization for production HPC/AI workloads. The role involves hands-on work with customers and internal teams to deliver speedups and scalable performance on multi-GPU, multi-node systems.

What you'd actually do

  1. Profile, analyze, and optimize GPU-accelerated applications with emphasis on CUDA kernels, memory movement, concurrency, and end-to-end throughput.
  2. Drive performance improvements across the stack:
       • CUDA C++: kernel optimization, launch configuration, memory hierarchy, streams/events
       • GPU libraries (as applicable): cuBLAS, cuFFT, cuSPARSE, cuSOLVER, NCCL
       • Multi-GPU and multi-node scaling: MPI + NCCL, CPU/GPU overlap, communication patterns
  3. Build reproducible benchmarks, performance reports, and tuning recommendations (before/after, methodology, scaling curves).
  4. Develop and maintain reference implementations, examples, and/or patches to customer code to enable performance and portability.
  5. Support customer engagements (POCs to production), including debugging correctness/performance issues and advising on best practices for deployment (containers, schedulers, clusters).

Skills

Required

  • C/C++
  • Python
  • Linux
  • CUDA programming
  • GPU performance optimization
  • NVIDIA Nsight Systems / Nsight Compute
  • parallel computing
  • performance fundamentals

Nice to have

  • MPI
  • distributed systems
  • multi-node performance tuning
  • seismic processing pipelines
  • reservoir simulation
  • power grid simulation
  • CI/perf regression testing
  • containerized workflows
  • cluster schedulers
  • AI workflows

What the JD emphasized

  • CUDA programming and GPU performance optimization concepts
  • profiling and debugging performance using tools such as NVIDIA Nsight Systems / Nsight Compute
  • 5+ years relevant experience in GPU/HPC optimization; strong track record of delivered speedups and scaling improvements

Other signals

  • CUDA performance optimization
  • GPU-accelerated applications
  • HPC/AI production workflows
  • multi-GPU and multi-node systems