Senior Network Infrastructure Engineer

NVIDIA · Semiconductors · India +4 · Remote

NVIDIA is seeking a Senior Network Infrastructure Engineer to support and maintain its cloud network infrastructure, which serves workloads including autonomous vehicles and artificial intelligence. The role involves remediating alerts, triaging incidents, working with customers and vendors, and taking part in network upgrades and capacity augmentations. Key responsibilities include 24/7 shift rotations, managing large-scale IP networks, and supporting on-premises and cloud infrastructure. Requirements include deep knowledge of network protocols (TCP/IP, BGP, OSPF, MPLS, and others), more than 4 years of network operations experience, strong troubleshooting and incident management skills, and familiarity with cloud environments (AWS, Azure, GCP, OCI) and vendors such as Arista, Fortinet, and Juniper. Experience with tooling and automation for network management is also expected.

What you'd actually do

  1. Engage in 24/7 global shift rotations to provide remote support for network repairs and changes while collaborating across teams and updating customers on status and ticket information.
  2. Follow established procedures and drive operational improvements in change management and day-to-day operations.
  3. Manage and operate large-scale IP networks and infrastructure.
  4. Apply your expertise in peering and data center interconnect technologies: PNI (private network interconnect), transit, internet exchange, passive DWDM, and wave circuits.
  5. Monitor and maintain network health across on-premises and cloud infrastructure.

Skills

Required

  • TCP/IP
  • BGP
  • OSPF
  • MPLS
  • IS-IS
  • VxLAN
  • EVPN
  • QoS
  • GRE
  • IPsec
  • DNS
  • MACsec
  • network operations
  • network troubleshooting
  • alert response
  • incident management
  • AWS
  • Azure
  • GCP
  • OCI
  • Arista
  • Fortinet
  • Juniper
  • tooling and automation for provisioning, monitoring, and managing complex network infrastructures
  • Bachelor’s degree in Computer Science, related technical field, or equivalent experience
  • Excellent verbal and written communication skills

Nice to have

  • Mellanox/Cumulus OS
  • InfiniBand technology
  • Unix/Linux system administration
  • Python scripting
  • Shell scripting
  • NetBox/Nautobot
  • Prometheus
  • Grafana
  • Panoptes

What the JD emphasized

  • remediating critical alerts within defined SLAs
  • triaging production-impacting network incidents
  • network device upgrades and capacity augmentations
  • alert monitoring & resolution in large-scale networks and CSP environments
  • outstanding troubleshooting skills
  • understanding of L3 underlay networks
  • network protocol knowledge in large multi-vendor infrastructures
  • deep knowledge of and experience with TCP/IP, BGP, OSPF, MPLS, IS-IS, VxLAN, EVPN, QoS, GRE, IPsec, DNS, and MACsec
  • more than 4 years of experience in network operations
  • skill in network troubleshooting techniques and creative problem-solving
  • strong track record in alert response within defined SLAs and in incident management