Senior GPU Communications Libraries Test Development Engineer, SDET

NVIDIA · Semiconductors · Shanghai, China

This role focuses on testing and automating tests for NVIDIA's GPU Communications Libraries, which are foundational for high-performance computing and AI workloads. While not directly building AI models, the role requires solid experience with AI development tools and deep learning frameworks to ensure the quality of the underlying infrastructure.

What you'd actually do

  1. Run test cases to validate NVIDIA GPU Communications Libraries (NCCL, NVSHMEM, UCX, GDRCopy, GPUDirect RDMA, etc.).
  2. Automate test cases and maintain the automation scripts.
  3. Collaborate with development, PM, marketing, and engineering teams to craft test plans and implement validation.
  4. Assist in the architecture, design, and implementation of SWQA test frameworks.
  5. Improve code coverage and optimize code complexity.

Skills

Required

  • BS or higher degree in CS/EE/CE or equivalent experience
  • 5+ years of relevant experience
  • Seasoned software QA or software testing background, with test infrastructure experience and strong analysis skills
  • Proficiency in a scripting language (Python, Perl, Bash)
  • Solid experience with AI development tools for test development and automation
  • Knowledge of basic networking concepts
  • UNIX/Linux experience is required
  • Experience in C/C++ is required
  • Ability to work independently, leadership skills, and experience using a quality mindset to drive improvements
  • Proficiency in spoken and written English

Nice to have

  • Experience with CUDA programming and NVIDIA GPUs
  • Knowledge of high-performance networks such as InfiniBand and RoCE
  • Experience with CSPs (AWS, Google Cloud, Oracle Cloud Infrastructure, Microsoft Azure), HPC clusters, Slurm, Ansible, etc.
  • Prior experience with virtualization technologies (KVM, Hyper-V, VMware, OpenStack, Docker, Kubernetes)
  • Experience with deep learning frameworks such as PyTorch and TensorFlow

What the JD emphasized

  • Experience in C/C++ is required
  • Solid experience with AI development tools for test development and automation
  • UNIX/Linux experience is required