Network Developer 4

Oracle · Enterprise · Seattle, WA +1

The AI Infrastructure - Network Operations team at OCI is seeking a Principal Network Engineer to support and operate the RDMA/RoCE network fabrics for OCI's largest AI and HPC customers. The role involves designing, deploying, and operating large-scale global cloud computing environments, with a primary focus on RDMA/RoCE network fabrics and systems, and relies on automation skills to operate the production environment. Responsibilities include leading network lifecycle management, translating architectures into detailed designs, serving as a technical lead for network projects, designing and implementing network solutions, acting as an escalation point for network incidents, and developing automation solutions. The role also involves collaborating with product teams and vendor engineering, and mentoring junior engineers.

What you'd actually do

  1. Lead network lifecycle management initiatives by defining technical objectives, delivery plans, and implementation procedures for large-scale network infrastructure projects.
  2. Translate high-level network architectures into detailed designs and deployment plans while ensuring scalability, reliability, and operational readiness.
  3. Serve as the technical lead for moderately complex network projects, coordinating the efforts of multiple engineers across design, deployment, automation, and operational support.
  4. Design, implement, and support network solutions across data center, backbone, cloud, and service provider environments.
  5. Act as a Tier 2 and specialized escalation point for network incidents, driving root cause analysis, corrective actions, and long-term reliability improvements.

Skills

Required

  • Network lifecycle management
  • Network architecture design
  • Network deployment planning
  • Technical project leadership
  • Data center networking
  • Backbone networking
  • Cloud networking
  • Service provider networking
  • Network incident resolution
  • Root cause analysis
  • Network automation
  • Operational tooling
  • Vendor management
  • Hardware evaluation
  • RFQ/RFP processes
  • Technology decision making
  • Mentoring junior engineers
  • Engineering best practices
  • Documentation standards
  • Operational excellence

Nice to have

  • Large-scale network operations
  • Network design
  • Network troubleshooting
  • Routing and switching technologies
  • BGP
  • OSPF
  • EVPN-VXLAN
  • MPLS
  • Data center networking
  • Network automation
  • Python
  • Ansible
  • APIs
  • Observability
  • Monitoring
  • Telemetry
  • Incident management
  • Cloud infrastructure
  • Hyperscale environments
  • Large-scale distributed systems
  • Technical project leadership
  • Cross-team influence
  • Written communication
  • Verbal communication

What the JD emphasized

  • RDMA/RoCE network fabrics
  • large-scale global Oracle cloud computing environment
  • operation and support of RDMA/RoCE network fabrics and systems
  • automation skills to operate a production environment
  • hundreds of thousands of network devices supporting millions of servers