Senior Architect - Server Performance

NVIDIA · Semiconductors · Bangalore, India +3

NVIDIA is seeking architects to drive the architectural performance of its next-generation AI server systems. This position requires the ability to bridge deep architectural knowledge, workload analysis, and hands-on silicon investigations. Candidates should be adept at working directly with silicon, high-level models, and simulators. Responsibilities include conducting performance investigations on both NVIDIA and competitive platforms, and developing targeted microbenchmarks to examine specific architectural aspects. The role does not heavily involve modeling work (functional or performance), though occasional focused modeling assignments may arise.

What you'd actually do

  1. Analyze workloads of interest on existing silicon, with an emphasis on at-scale AI workloads and high-performance computing (HPC) applications.
  2. Collaborate with cross-functional teams to define performance metrics and key use-case scenarios, then develop robust tests and benchmarking methodologies.
  3. Conduct comprehensive performance evaluations, identify bottlenecks, and recommend effective solutions using appropriate tools and platforms.
  4. Utilize insights from workload analysis and silicon studies to propose architectural features that optimize system performance and scalability.
  5. Work closely with software and hardware teams to influence design choices that impact overall system performance.

Skills

Required

  • Bachelor’s or Master’s degree in a relevant field.
  • 10+ years of practical experience in hardware architecture across areas such as CPU, GPU, cache, memory subsystem, PCIe, networking, or storage.
  • Expertise in high-performance networking technologies, including InfiniBand and RoCE, or strong familiarity with communication libraries such as MPI and UCX.
  • Alternatively, a proven track record in performance optimizations for deep learning training or inference systems, high-performance computing, or cloud computing environments.
  • Expertise in benchmarking tools and methodologies, with demonstrated skill in developing and implementing targeted microbenchmarks.
  • Solid understanding of performance analysis tools and techniques.
  • Proficiency in programming languages such as C, C++, and Python.

Nice to have

  • A PhD is a plus.
  • Experience with performance simulators is highly desirable.

What the JD emphasized

  • 10+ years of practical experience in hardware architecture
  • proven track record in performance optimizations for deep learning training or inference systems

Other signals

  • AI server systems
  • at-scale AI workloads
  • deep learning training or inference systems