Network Engineer II

Microsoft · Big Tech · United States · Cloud Network Engineering

This role focuses on designing, building, and managing cutting-edge networking infrastructure for large-scale AI training and inference in Azure Cloud. The Network Engineer will ensure high performance, low latency, and minimal jitter for distributed AI workloads, working with diverse network architectures and processor/accelerator technologies to deliver a comprehensive, end-to-end solution with a focus on performance, scalability, and observability.

What you'd actually do

  1. Collaborates with teams across the organization to support and manage safe and secure network deployments.
  2. Works with machine-readable definitions to manage deployments.
  3. Supports the management of incidents by applying technical knowledge to diagnose and triage issues with a commitment to maintaining the quality of products and services. Takes notes during incidents and participates in postmortem and root cause analysis processes.
  4. Performs testing and validation of network devices, firmware, and configurations. Defines and implements test cases with existing automation tools, and exposes test coverage gaps.
  5. Triages, troubleshoots, and repairs live-site issues by applying an understanding of network components and features (e.g., device operating systems) as well as problem management tools (e.g., root cause analysis, trend analysis, postmortems) to discover and drive solutions with minimal or no disruption to customers. Actively participates in on-call/DRI duties to troubleshoot and, where appropriate, resolve incidents in production.
  6. Monitors network telemetry and performs analyses to identify patterns that reveal errors and unexpected problems. Makes suggestions on improvements to monitoring based on observations and experience.
  7. Provides instructions to datacenter or network site staff/technicians on how to securely repair, replace, and maintain physical network hardware and components deployed in production. Identifies gaps and inefficiencies in processes related to securely installing and deploying new hardware and components and provides instructions to address gaps.

Skills

Required

  • network design
  • network development
  • network automation
  • troubleshooting
  • incident management
  • testing and validation
  • monitoring
  • technical experience

Nice to have

  • data analysis
  • machine-readable definitions
  • datacenter operations
  • physical network hardware maintenance

What the JD emphasized

  • high performance
  • low latency
  • minimal jitter
  • large-scale
  • low-latency systems
  • performance
  • scalability
  • observability