Senior Software Engineer - DGX Cloud Services and Software

NVIDIA · Semiconductors · Santa Clara, CA +4 · Remote

Senior Software Engineer role focused on building and managing NVIDIA's cloud infrastructure and networking equipment. The work spans design, development, testing, and operational support of scalable software systems. The role requires strong experience with distributed systems, a systems programming language such as C, C++, Golang, or Rust, and SRE principles.

What you'd actually do

  1. Work with NVIDIA internal customers.
  2. Design and build scalable software systems to manage NVIDIA’s cloud infrastructure.
  3. Participate in responses to real-time operational events.
  4. Build network and systems automation software for managing a multi-tenant cloud infrastructure.
  5. Participate in the open-source communities of the software we leverage and build.

Skills

Required

  • distributed software systems
  • C, C++, Golang, or Rust
  • API design
  • asynchronous programming
  • type safety
  • threading models
  • state machines
  • data structures
  • SQL
  • secure communication protocols
  • SRE principles

Nice to have

  • Hyperscale Cloud Service Provider experience
  • networking protocols (IP, IPv6, BGP, HTTP, ICMP, tunneling protocols)
  • InfiniBand networking
  • host management systems (DHCP, Redfish, UEFI)
  • host security services (TPM, TXT, Secure Boot)
  • Kubernetes
  • distributed task scheduling

What the JD emphasized

  • 8+ years of experience designing and building distributed software systems.
  • Track record of directly supporting systems with external customers or demanding internal customers.
  • BS/MS degree in Computer Science or a related area (or equivalent experience).
  • Demonstrated ability to write code in a mainstream systems programming language such as C, C++, Golang, or Rust.
  • Demonstrated ability to design and implement maintainable APIs for consumers.
  • Practical experience with asynchronous programming, type safety, threading models, state machines, and data structures.
  • Background in data persistence (SQL or similar).
  • Understanding of secure communication protocols (mutual-TLS, IPsec, or similar).
  • Knowledge of SRE principles (observability, SLOs, logging, etc.).