Developer Technology Engineer – AI

NVIDIA · Semiconductors · Shanghai, China +2

NVIDIA Developer Technology Engineer focused on optimizing core parallel algorithms and data structures for GPUs, collaborating with application developers and internal NVIDIA teams to improve application performance and developer efficiency. Requires strong programming skills in C/C++/Python, parallel programming experience (CUDA), and mathematical fundamentals.

What you'd actually do

  1. Working directly with key application developers to understand the current and future problems they are solving, crafting and optimizing core parallel algorithms and data structures to provide the best solutions using GPUs, through both reference code development and direct contribution to the applications.
  2. Collaborating closely with diverse groups at NVIDIA such as the architecture, research, libraries, tools, and system software teams to influence the design of next-generation architectures, software platforms, and programming models, by investigating the impact on application performance and developer efficiency.
  3. Traveling from time to time for conferences and for on-site visits with developers.

Skills

Required

  • BS, MS, or PhD degree in engineering or computer science related field (or equivalent experience)
  • 3 years of work experience
  • Programming proficiency in C/C++ and/or Python
  • Deep understanding of software design, programming techniques, and algorithms
  • Strong mathematical fundamentals, including linear algebra and numerical methods
  • Experience with parallel programming, ideally CUDA C/C++ or OpenACC
  • Strong communication and organization skills
  • Logical approach to problem solving
  • Good time management and task prioritization skills

Nice to have

  • domain expertise in telecommunications
  • domain expertise in medical imaging
  • domain expertise in machine learning
  • domain expertise in deep learning
  • domain expertise in HPC
  • domain expertise in natural sciences

What the JD emphasized

  • optimizing core parallel algorithms
  • reference code development
  • direct contribution to the applications
  • impact on application performance
  • developer efficiency
  • machine learning
  • deep learning
  • parallel programming
  • CUDA C/C++

Other signals

  • optimizing core parallel algorithms
  • reference code development
  • direct contribution to applications
  • impact on application performance
  • developer efficiency
  • machine learning
  • deep learning
  • HPC
  • parallel programming
  • CUDA C/C++