Principal Network Engineer

Oracle · Enterprise · Seattle, WA +1

This is a Principal Network Engineer role on the AI Infrastructure - Network Operations team at Oracle Cloud Infrastructure (OCI). The primary focus is supporting, operating, designing, and deploying the large-scale global RDMA/RoCE network fabrics that underpin OCI's AI, GPU, and HPC services. The role involves leading network lifecycle management, translating architectures into detailed designs, serving as technical lead for network projects, and developing automation to improve operational efficiency and reliability. It also includes acting as an escalation point for network incidents and collaborating with internal and external teams.

What you'd actually do

  1. Lead network lifecycle management initiatives by defining technical objectives, delivery plans, and implementation procedures for large-scale network infrastructure projects.
  2. Translate high-level network architectures into detailed designs and deployment plans while ensuring scalability, reliability, and operational readiness.
  3. Serve as the technical lead for moderately complex network projects, coordinating the efforts of multiple engineers across design, deployment, automation, and operational support.
  4. Design, implement, and support network solutions across data center, backbone, cloud, and service provider environments.
  5. Act as a Tier 2 and specialized escalation point for network incidents, driving root cause analysis, corrective actions, and long-term reliability improvements.

Skills

Required

  • Network lifecycle management
  • Network architecture and design
  • Network operations and support
  • Data center, backbone, cloud, and service provider networking
  • Network incident management and root cause analysis
  • Network automation development
  • Collaboration with product teams and leadership
  • Vendor management and technology evaluation
  • Mentoring junior engineers
  • Routing and switching technologies (BGP, OSPF, EVPN-VXLAN, MPLS)
  • Network automation (Python, Ansible, APIs)
  • Observability, monitoring, telemetry, and incident management
  • Cloud infrastructure or hyperscale environments

Nice to have

  • Strong written and verbal communication skills

What the JD emphasized

  • RDMA/RoCE network fabrics
  • large-scale global Oracle cloud computing environment
  • large-scale network infrastructure projects
  • complex network issues and large-scale service-impacting events
  • network automation