Senior Performance Engineer

NVIDIA NVIDIA · Semiconductors · Yokneam, Israel

Senior Performance Engineer at NVIDIA focusing on optimizing AI and HPC workloads on GPU/CPU clusters. Responsibilities include profiling, benchmarking, identifying bottlenecks, and developing performance analysis tools, with a strong emphasis on high-performance networking and telemetry.

What you'd actually do

  1. Profile, benchmark, and analyze AI and HPC workloads on GPU and CPU clusters
  2. Explore performance characteristics of high-performance networking and collective communications (e.g., NCCL, RDMA, MPI, RoCE)
  3. Identify performance bottlenecks across networking, compute, memory, and system architecture
  4. Develop and enhance performance analysis, benchmarking, and diagnostic tools
  5. Define performance test plans and establish expectations for new technologies and platforms

Skills

Required

  • performance analysis
  • systems engineering
  • HPC/AI infrastructure
  • performance analysis skills and methodologies
  • high-performance networking (RDMA, MPI, NCCL, congestion control)
  • system performance metrics (latency, throughput, resource utilization)
  • hardware, firmware, or embedded telemetry environments
  • analytical, problem-solving, and communication skills
  • cross-functional, fast-paced R&D teams

Nice to have

  • CUDA
  • NCCL internals
  • congestion control algorithms
  • system-level understanding of CPU architectures, GPUs, HCAs, memory, and PCIe
  • NVIDIA GPUs
  • CUDA
  • deep learning frameworks (PyTorch or TensorFlow)
  • cloud platforms
  • Python
  • Bash
  • C/C++
  • Linux environments

What the JD emphasized

  • performance analysis
  • AI and HPC infrastructure
  • performance analysis skills and methodologies
  • high-performance networking
  • system performance metrics
  • hardware, firmware, or embedded telemetry environments
  • performance analysis

Other signals

  • performance analysis
  • AI and HPC workloads
  • GPU and CPU clusters
  • high-performance networking
  • telemetry collection