Senior Multi‑GPU Signal Processing and System Architecture Engineer

NVIDIA · Semiconductors · Santa Clara, CA +1 · Remote

A senior engineer to design and implement a real-time signal-processing subsystem for large-scale 5G/6G network simulation on NVIDIA's GPU platforms. The work spans GPU kernel development, inter-cell data-flow architecture, and synchronization across hundreds or thousands of GPUs.

What you'd actually do

  1. design and implement GPU kernels that apply time‑varying, multi‑antenna channels to OFDM signals under hard real‑time deadlines
  2. architect the inter‑cell data‑flow layer, ensuring that the information each cell needs to model interference from its neighbours is compressed, transported, and consumed within the available NVLink and NIC budgets at scale
  3. work with the propagation engine and RAN stack teams to orchestrate the end‑to‑end simulation pipeline, ensuring that propagation updates, channel application, and stack execution remain synchronised across hundreds or thousands of GPUs
  4. assess design and implementation trade‑offs between physical fidelity, latency, and system scalability

Skills

Required

  • PhD in high‑performance computing, computer architecture, signal processing, or wireless communications (or equivalent experience)
  • 12+ years of proven experience
  • CUDA kernel design
  • memory hierarchy
  • register pressure
  • HBM bandwidth planning
  • production-quality GPU code
  • hard real‑time deadlines
  • data flows across multi‑device GPU systems (NVLink, NIC/RDMA)
  • bandwidth and latency accounting
  • OFDM signal processing
  • 5G NR physical layer
  • GPU-accelerated numerical workloads
  • real‑time system design

Nice to have

  • GPU-accelerated RAN platforms
  • L1/L2 software stacks
  • channel emulators
  • high‑bandwidth GPU interconnects (NVLink, NVSwitch)
  • massive MIMO beamformer design
  • MU‑MIMO precoding

What the job description emphasized

  • production-quality GPU code that meets hard real‑time deadlines
  • explicit bandwidth and latency accounting