Senior Developer Technology Engineer

NVIDIA · Semiconductors · Santa Clara, CA +2 · Remote

This role focuses on researching and developing techniques to accelerate top CSP workloads on NVIDIA's computing platform, including advanced CPUs, GPUs, and interconnects. The engineer will work directly with key customers to analyze and optimize complex workloads, collaborate with various NVIDIA teams to influence future hardware and software design, and publish findings. The role requires strong programming skills in C/C++, parallel programming (CUDA C/C++), low-level performance optimizations, and expertise in CPU/GPU architecture.

What you'd actually do

  1. Research and develop techniques to accelerate top CSP workloads on NVIDIA’s computing platform, including advanced CPUs, GPUs, and interconnects.
  2. Work directly with key customers to perform in-depth analysis and optimization of complex workloads to ensure the best possible performance on current and next-generation hardware.
  3. Collaborate with libraries, tools, system software architecture, hardware, and research teams at NVIDIA to influence the design of next-generation programming models, software, and architectures.

Skills

Required

  • Master's degree in Computer Science, Computer Engineering, or a related computationally focused science field (or equivalent experience)
  • 8+ years of relevant work experience or research
  • Programming proficiency in C/C++
  • Deep understanding of software design, programming techniques, and algorithms
  • Parallel programming experience
  • CUDA C/C++ experience
  • Experience with low-level performance optimization
  • Expertise in CPU and GPU architecture fundamentals
  • Strong math skills, including linear algebra
  • Communication, organization and prioritization skills

Nice to have

  • Designed highly optimized parallel algorithms and data structures for applications with a high bytes-to-compute ratio, using techniques such as processing directly on compressed data and kernel fusion
  • Optimized end-to-end performance of applications spanning many layers of software, from OS to high-level frameworks
  • Influenced hardware feature design leveraging application and domain knowledge

What the JD emphasized

  • 8+ years of relevant work experience or research
  • Programming proficiency in C/C++ with a deep understanding of software design, programming techniques, and algorithms
  • A background that includes parallel programming, ideally CUDA C/C++
  • Hands-on experience with low-level performance optimization
  • In-depth expertise with CPU and GPU architecture fundamentals