Deep Learning Performance Software Engineer

NVIDIA · Semiconductors · Shanghai, China +1

NVIDIA is seeking a Deep Learning Performance Software Engineer to develop GPU-accelerated deep learning software, focusing on optimizing deep learning kernels and end-to-end performance through tile-based GPU programming. The role requires strong C/C++ skills and performance modeling/optimization knowledge; GPU programming experience (CUDA or OpenCL) is desired.

What you'd actually do

  1. Develop [TileGym](https://github.com/NVIDIA/TileGym), the [Triton CUDA TileIR backend](https://github.com/triton-lang/Triton-to-tile-IR), and [CUDA Tile](https://developer.nvidia.com/cuda/tile)
  2. Develop highly optimized deep learning kernels using the tile-based GPU programming model
  3. Optimize end-to-end performance using the tile-based GPU programming model
  4. Perform performance analysis, optimization, and tuning

Skills

Required

  • Master's or PhD, or equivalent experience, in a relevant discipline (CE, CS&E, CS, AI)
  • Excellent C/C++ programming and software design skills
  • Performance modeling, profiling, debugging, and code optimization, or architectural knowledge of CPUs and GPUs
  • 3 years of relevant work experience

Nice to have

  • Agile software development skills
  • Python experience
  • MLIR experience
  • AI agent experience
  • GPU programming experience (CUDA or OpenCL)

What the JD emphasized

  • Excellent C/C++ programming and software design skills
  • GPU programming experience (CUDA or OpenCL) desired
  • 3 years of relevant work experience

Other signals

  • GPU-accelerated deep learning software
  • Develop highly optimized deep learning kernels
  • End-to-end performance optimization