Senior Hpc Performance Engineer

NVIDIA NVIDIA · Semiconductors · Germany +3 · Remote

NVIDIA is seeking a Senior HPC Performance Engineer to analyze and optimize the performance of their GPU communication libraries, which are critical for scaling Deep Learning and HPC applications. The role involves in-depth performance characterization, root-cause analysis of performance issues, and collaboration with cross-functional teams on large multi-GPU and multi-node clusters.

What you'd actually do

  1. Conduct in-depth performance characterization and analysis on large multi-GPU and multi-node clusters.
  2. Study the interaction of our libraries with all HW (GPU, CPU, Networking) and SW components in the stack
  3. Evaluate proof-of-concepts, conduct trade-off analysis when multiple solutions are available
  4. Triage and root-cause performance issues reported by our customers
  5. Collect a lot of performance data; build tools and infrastructure to visualize and analyze the information

Skills

Required

  • M.S. (or equivalent experience) or PHD in Computer Science, or related field with relevant performance engineering and HPC experience
  • 3+ yrs of experience with parallel programming and at least one communication runtime (MPI, NCCL, UCX, NVSHMEM)
  • Experience conducting performance benchmarking and triage on large scale HPC clusters
  • Good understanding of computer system architecture, HW-SW interactions and operating systems principles (aka systems software fundamentals)
  • Implement micro-benchmarks in C/C++, read and modify the code base when required
  • Ability to debug performance issues across the entire HW/SW stack.
  • Proficient in a scripting language, preferably Python
  • Familiar with containers, cloud provisioning and scheduling tools (Kubernetes, SLURM, Ansible, Docker)

Nice to have

  • Practical experience with Infiniband/Ethernet networks in areas like RDMA, topologies, congestion control
  • Experience debugging network issues in large scale deployments
  • Familiarity with CUDA programming and/or GPUs
  • Experience with Deep Learning Frameworks such PyTorch, TensorFlow

What the JD emphasized

  • performance engineering
  • HPC experience
  • parallel programming
  • performance benchmarking
  • systems software fundamentals
  • performance issues