Principal AI Developer Technology Engineer

NVIDIA · Semiconductors · Berlin, Germany +4 · Remote

This role focuses on researching and developing techniques to accelerate AI workloads (deep learning, machine learning) on advanced computer architectures, specifically GPUs. The engineer will perform in-depth analysis and optimization of complex AI and HPC algorithms, publish findings, and influence future hardware/software design. The role requires strong C/C++ programming skills, parallel programming experience (CUDA, etc.), low-level performance optimization, and CPU/GPU architecture expertise.

What you'd actually do

  1. Research and develop techniques to GPU-accelerate workloads in deep learning, machine learning, or other AI domains.
  2. Work directly with other technical experts in their fields (industry and academia) to perform in-depth analysis and optimization of complex AI and HPC algorithms to ensure optimal AI solutions on modern CPU and GPU architectures.
  3. Publish and/or present discovered optimization techniques in developer blogs or relevant conferences to engage and educate the developer community.
  4. Influence the design of next-generation hardware architectures, software, and programming models in collaboration with research, hardware, system software, libraries, and tools teams at NVIDIA.

Skills

Required

  • C/C++ programming
  • algorithms
  • software development
  • parallel programming (CUDA, OpenACC, OpenMP, MPI, pthreads)
  • low-level performance optimizations
  • CPU and GPU architecture fundamentals
  • communication skills
  • organization skills
  • logical approach to problem solving
  • time management
  • prioritization skills

Nice to have

  • parallelization and performance optimization of Deep Learning models (NLP, Computer Vision, Recommender Systems)
  • linear algebra

What the JD emphasized

  • 15+ years of relevant experience
  • low-level performance optimizations
  • CPU and GPU architecture fundamentals

Other signals

  • GPU acceleration
  • performance optimization
  • AI workloads
  • developer community engagement