Senior Software Architect, AI Networking

NVIDIA · Semiconductors · Tel Aviv, Israel

A Senior Software Architect role focused on designing and optimizing large-scale LLM inference infrastructure on GPU clusters, with system-level optimization spanning latency, throughput, and cost-efficiency.

What you'd actually do

  1. Design and evolve scalable architectures for multi-node LLM inference across GPU clusters.
  2. Develop infrastructure to optimize latency, throughput, and cost-efficiency of serving large models in production.
  3. Collaborate with model, systems, compiler, and networking teams to ensure holistic, high-performance solutions.
  4. Prototype novel approaches to KV cache handling, tensor/pipeline parallel execution, and dynamic batching.
  5. Evaluate and integrate new software and hardware technologies relevant to model inference (e.g., memory hierarchies, network topologies, modern inference architectures).
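To make item 4 concrete, here is a minimal sketch of the dynamic-batching pattern common in LLM serving: queued requests are grouped into a batch when either a size cap or a wait-time budget is hit. The class name and parameters are illustrative assumptions, not an NVIDIA API.

```python
import time
from collections import deque


class DynamicBatcher:
    """Toy dynamic batcher (illustrative, not a real serving API).

    Requests accumulate in a queue; a batch is released when either the
    batch-size cap is reached or the oldest request has waited longer
    than the latency budget -- the classic throughput/latency trade-off.
    """

    def __init__(self, max_batch_size=8, max_wait_s=0.01):
        self.max_batch_size = max_batch_size
        self.max_wait_s = max_wait_s
        self.queue = deque()

    def submit(self, request):
        # Record arrival time so the wait budget can be enforced.
        self.queue.append((time.monotonic(), request))

    def next_batch(self):
        """Return a batch if the size cap or wait budget is hit, else []."""
        if not self.queue:
            return []
        oldest_arrival = self.queue[0][0]
        full = len(self.queue) >= self.max_batch_size
        expired = time.monotonic() - oldest_arrival >= self.max_wait_s
        if not (full or expired):
            return []
        batch = []
        while self.queue and len(batch) < self.max_batch_size:
            batch.append(self.queue.popleft()[1])
        return batch
```

For example, with `max_batch_size=2` and a long wait budget, submitting three requests yields one full batch of two, while the third request keeps waiting until the budget expires. Production systems (e.g., continuous/in-flight batching) extend this idea to per-token scheduling.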

Skills

Required

  • C++
  • Python
  • CUDA
  • Distributed Systems
  • Performance Optimization
  • Deep Learning Systems
  • GPU Acceleration
  • System-level Thinking

Nice to have

  • LLM inference pipelines
  • Transformer model optimization
  • Model-parallel deployments
  • Profiling and performance optimization
  • Data center orchestration
  • Cluster schedulers
  • AI service deployment pipelines

What the JD emphasized

  • 5+ years of experience building large-scale distributed systems or performance-critical software
  • Deep understanding of deep learning systems, GPU acceleration, and AI model execution flows
  • Solid software engineering skills in C++ and/or Python, with strong familiarity with CUDA or similar platforms
  • Strong system-level thinking across memory, networking, scheduling, and compute orchestration

Other signals

  • LLM inference at scale
  • GPU clusters
  • system-level optimizations