Senior Accelerated Computing Architect

NVIDIA · Semiconductors · Santa Clara, CA

This role focuses on optimizing the performance of NVIDIA GPUs for accelerated computing, scientific computing, machine learning, AI, datacenter, and automotive computing. The architect will analyze and optimize parallel algorithms, data structures, and reference codes, collaborate with hardware and software teams, and study accelerated computing applications to support software-hardware co-design. The role involves writing white papers and publications, and requires strong mathematical fundamentals, GPU programming experience (CUDA/OpenCL), and C/C++ proficiency.

What you'd actually do

  1. Performing in-depth analysis and optimization to ensure the best possible performance on current and/or next-generation NVIDIA GPUs.
  2. Creating and optimizing core parallel algorithms, data structures, and reference codes to provide the best possible solutions for NVIDIA GPUs.
  3. Understanding and analyzing the interplay of hardware and software architectures on core algorithms, programming models, and applications.
  4. Actively collaborating with the hardware design, software engineering, product, and research teams to guide the direction of accelerated computing.
  5. Diving into accelerated computing applications to facilitate software-hardware co-design.

Skills

Required

  • MS or Ph.D. in Computer Science, Computer Engineering, or Electrical Engineering, or equivalent experience.
  • 6+ years of relevant work experience.
  • Strong mathematical fundamentals, including linear algebra and numerical methods.
  • Hands-on experience with the massively parallel GPU programming model, e.g. CUDA or OpenCL.
  • Strong knowledge of C and C++ with solid understanding of software design, programming techniques, and algorithms.
  • Experience benchmarking, profiling, and characterizing workloads on GPU and CPU clusters.

Nice to have

  • Familiarity with APIs for multi-node communication, like MPI or OpenSHMEM/NVSHMEM
  • Familiarity with threading APIs for multicore CPUs and Unix-style Inter-process Communication (IPC) APIs
  • Familiarity with Python

What the JD emphasized

  • performance optimization
  • NVIDIA GPUs
  • accelerated computing