Senior Developer Technology Engineer - Windows AI Platform

NVIDIA · Semiconductors · Santa Clara, CA

Senior Developer Technology Engineer focused on optimizing and deploying AI/GenAI applications on NVIDIA RTX platforms, particularly LLMs on Windows. The role involves working with internal teams and external developers, analyzing performance, conducting training, and improving the user experience of open-source software such as Llama.cpp and Ollama. Collaboration with driver and architecture teams is key to influencing future GPU features.

What you'd actually do

  1. Work closely with internal engineering and product teams and external app developers on solving local end-to-end AI GPU deployment challenges on the NVIDIA RTX AI platform.
  2. Apply profiling and debugging tools to analyze the most demanding GPU-accelerated end-to-end AI applications and detect insufficient GPU utilization that results in suboptimal runtime performance.
  3. Conduct hands-on training sessions, develop sample code, and host presentations that provide guidance on efficient end-to-end AI deployment with optimal runtime performance on NVIDIA ARM-based SoCs.
  4. Improve the Windows LLM & GenAI user experience on NVIDIA RTX by contributing feature and performance enhancements to open-source software, including but not limited to GGML, Llama.cpp, Ollama, and ONNX Runtime.
  5. Collaborate with GPU driver and architecture teams, as well as NVIDIA research, to influence next-generation GPU features by providing real-world workflows and feedback on partner and customer needs.

Skills

Required

  • C/C++
  • Python
  • software design
  • programming techniques
  • Windows operating system development
  • open-source LLM and GenAI software experience
  • CUDA
  • NVIDIA Nsight GPU profiling and debugging suite
  • problem-solving skills
  • independent and collaborative work
  • interpersonal and communication skills

Nice to have

  • GPU-accelerated AI inference driven by NVIDIA APIs (cuDNN, CUTLASS, TensorRT)
  • Vulkan
  • DX12
  • latest-generation GPU architectures
  • AI deployment on NPUs and ARM architectures

What the JD emphasized

  • local end-to-end AI GPU deployment challenges
  • optimal runtime performance
  • Windows LLM & GenAI user experience
  • feature and performance enhancements
  • real-world workflows
  • 8+ years of professional experience in local GPU deployment, profiling and optimization
  • Experience working with open-source LLM and GenAI software
  • Experience with CUDA and NVIDIA's Nsight GPU profiling and debugging suite

Other signals

  • deployment
  • optimization
  • performance
  • developer enablement