Senior Manager, RDMA Fabric Design & Engineering

Oracle · Enterprise · Austin, TX +1

Leads the design, architecture, engineering, and operational strategy for large-scale RDMA backend fabrics supporting AI, HPC, and cloud infrastructure. This role builds and scales high-performance, low-latency network fabrics, drives end-to-end network architecture decisions, and leads the teams that own fabric design, validation, deployment, automation, reliability, and lifecycle management. The Senior Manager partners with silicon vendors, platform engineering, systems architecture, capacity planning, and cloud infrastructure teams to deliver scalable, resilient, and future-ready networking solutions.

What you'd actually do

  1. Lead architecture and design of large-scale RDMA fabrics supporting AI training, inference, HPC, and cloud workloads.
  2. Define network topology, routing strategy, congestion management, resiliency, and capacity models for multi-cluster deployments.
  3. Drive technology evaluation and roadmap decisions across Ethernet, RoCE, InfiniBand, optical networking, and emerging fabric technologies.
  4. Oversee lab validation, scale testing, performance benchmarking, and failure scenario analysis.
  5. Build and lead a high-performing team of network architects and engineers.

Skills

Required

  • networking
  • distributed systems
  • cloud infrastructure
  • network architecture
  • Layer 2/Layer 3 networking
  • routing protocols
  • data center networking
  • people leadership

Nice to have

  • RDMA technologies
  • RoCEv2
  • InfiniBand
  • hyperscale cloud
  • congestion control
  • ECMP
  • traffic engineering
  • QoS
  • lossless Ethernet
  • network automation
  • Python
  • Go
  • Ansible
  • telemetry platforms
  • network modeling
  • capacity planning
  • public cloud providers

What the JD emphasized

  • large-scale RDMA backend fabrics
  • AI training
  • high-performance, low-latency network fabrics
  • network architecture
  • AI/HPC network infrastructure