Principal Software Engineer, AI Networking

NVIDIA · Semiconductors · Santa Clara, CA and one other location · Remote

A Principal Software Engineer role leading technical strategy for AI networking systems, with a focus on customer engagements, product development, and architecture for NVIDIA networking products such as BlueField and ConnectX. Responsibilities include deep technical engagements with customers, driving performance and reliability improvements, and translating customer needs into product features.

What you'd actually do

  1. Lead the technical strategy for AI Factory networking deployments at strategic customers, including conducting architecture reviews and risk assessments and crafting multi-phase execution plans.
  2. Serve as the principal-level technical authority for embedded networking products such as BlueField and ConnectX. The role also covers the surrounding technology ecosystem, including DOCA, RDMA, RoCE, and InfiniBand.
  3. Lead deep technical engagements with hyperscalers and AI Factory customers, involving design-in, coding, bring-up, performance tuning, failure analysis, and production hardening.
  4. Partner with internal engineering, product, and architecture teams to transform customer needs into product features, reference architectures, tooling, and guidelines.
  5. Drive performance, reliability, and debuggability improvements across customer stacks and translate findings into actionable product, firmware, and software roadmap items.

Skills

Required

  • BS/MS/PhD in Computer Science, Computer Engineering, Electrical Engineering, or equivalent experience
  • 15+ years of relevant industry experience
  • Deep knowledge of networking protocols and distributed systems, with a strong understanding of RoCE/InfiniBand, L1–L4 fundamentals, and performance/latency tradeoffs
  • Proven low-level software expertise, including proficiency in C/C++ and comfort debugging across firmware, driver, and user space
  • Demonstrated experience in high-performance networking and system-level debugging
  • Excellent interpersonal skills, including the ability to clearly explain complex topics to engineers, PMs, and customer collaborators and to align cross-organizational teams toward a decision

Nice to have

  • Prior experience in customer-facing technical leadership at hyperscalers/CSPs/AI factories (or similarly complex production environments)
  • Hands-on expertise with DPDK, DOCA, RDMA verbs, NCCL, CUDA-aware networking, congestion control, and performance tuning at scale
  • Experience building internal tools, telemetry, and automation that improve triage speed and operational excellence
  • Demonstrated innovation: patents, publications, hackathons, rapid prototyping, or shipping new architecture/features end-to-end
  • Experience leading multi-team initiatives across geographies and time zones, with clear examples of influence without authority
  • Eagerness to proactively use AI-powered tools to accelerate debugging, documentation, and day-to-day engineering efficiency while maintaining strong engineering judgment

What the JD emphasized

  • Technical leadership across complex systems
  • Deep knowledge of networking protocols and distributed systems
  • Proven low-level software expertise
  • Demonstrated experience in high-performance networking and system-level debugging
  • Customer-facing technical leadership at hyperscalers/CSPs/AI factories
  • Hands-on expertise with DPDK, DOCA, RDMA verbs, NCCL, CUDA-aware networking, congestion control, and performance tuning at scale
  • Experience leading multi-team initiatives across geographies and time zones