Senior Deep Learning Test Development Engineer, SDET

NVIDIA · Semiconductors · Shanghai, China

Senior Deep Learning Test Development Engineer (SDET) on NVIDIA's AI SWQA team, responsible for validating the robustness and performance of NVIDIA's AI software and GPU infrastructure across various AI scenarios. The role involves test planning, design, execution, automation, and bug management, with a focus on improving workflow processes and efficiency. Experience with LLM inference frameworks and AI development tools is required.

What you'd actually do

  1. Work closely with global cross-functional teams to understand the test requirements and take ownership of product quality.
  2. Plan, design, execute, and automate tests, covering test plans, test cases, and test reports.
  3. Manage the bug lifecycle and work with other groups to drive solutions.
  4. Automate test cases and assist in architecting, designing, and implementing test frameworks.
  5. Reproduce customer issues in-house and verify fixes.

Skills

Required

  • BS or higher degree in CS/EE/CE or equivalent
  • 5+ years of software quality assurance or test automation background
  • Scripting language (Python, Bash) knowledge
  • UNIX/Linux experience
  • Python software development or test development experience
  • Virtualization experience (VMs, Docker containers, and Kubernetes)
  • Excellent English written and oral communication skills
  • Experience developing and testing multi-GPU P2P workloads
  • Experience with LLM inference frameworks (TRT-LLM, vLLM, SGLang, etc.)
  • Familiar with running various AI workloads
  • Experience with AI tools for coding (like Cursor, Gemini, NotebookLM)

Nice to have

  • Familiarity with NVIDIA GPU hardware products (Tesla, Tegra, DGX, etc.)
  • Familiarity with multi-GPU tools (NCCL, NIXL)
  • Understanding and working knowledge of any deep learning framework, especially in end-to-end customer scenarios
  • Working knowledge of NVIDIA GPU Computing (CUDA) and CUDA libraries for Deep Learning like cuDNN
  • Experience with VectorCAST, Bullseye, Gcov, or Coverity tools

What the JD emphasized

  • 5+ years of software quality assurance or test automation background
  • Proven success in leveraging AI (development) tools to significantly improve efficiency, streamline workflows, enhance process automation, create test cases and increase code coverage.
  • Experience with LLM inference frameworks (TRT-LLM, vLLM, SGLang, etc.) and familiarity with running various AI workloads

Other signals

  • AI software and GPU Infrastructure validation
  • test automation
  • LLM inference frameworks