Network Engineer, Engineering R&D Environments

Meta · Big Tech · Menlo Park, CA

Meta is looking for a Network Engineer to build and scale network infrastructure for its global engineering labs, specifically supporting AI and compute clusters. The role involves end-to-end network design, deployment, operations, troubleshooting, and automation for high-throughput, low-latency cluster networking. Experience with AI/ML compute environments and ethical AI practices is required.

What you'd actually do

  1. Own end-to-end frontend and backend network design, deployment, and operations for AI and compute lab clusters
  2. Serve as a primary networking point of contact for backend fabrics, including Arista- and internally developed network OS-based scale-out networks supporting AI workloads
  3. Design, deploy, and support high-throughput, low-latency cluster networking, including congestion management (PFC/ECN), RDMA validation, and lossless transport
  4. Perform hands-on troubleshooting and root-cause analysis across L1–L4 using packet captures, telemetry, and vendor tools to resolve complex lab issues
  5. Support silicon, hardware, and software bring-ups, ensuring reliable connectivity and on-time validation

Skills

Required

  • 8+ years of experience designing, deploying, and operating network infrastructure in production or lab environments
  • Experience working in multi-vendor environments, including network operating systems and switching platforms
  • Experience with configuration management, code repositories, and zero-touch provisioning (ZTP) for network infrastructure
  • Experience with IPv4/IPv6, L2/L3 protocols, including STP, OSPF, BGP, TCP/IP, DHCP, DNS, VLANs, VRRP, LACP, MC-LAG, ACLs, MACsec, and EVPN/VXLAN
  • Working knowledge of scripting or programming languages (e.g., Python, shell) for automation and tooling
  • Experience prioritizing competing workstreams based on impact, deadlines, and stakeholder needs in a global environment, with a track record of driving work independently while engaging cross-functional partners as needed
  • Networking certifications such as CCIE, JNCIE, or equivalent
  • Demonstrated ongoing AI skill development (e.g., prompt/context engineering, agent orchestration) and staying current with emerging AI technologies
  • Hands-on experience with lab test equipment, optics qualification (e.g., 400G/800G), optical switches and physical infrastructure
  • Hands-on experience with backend cluster networking, including scale-out fabrics, RDMA networks, and congestion management
  • Experience adhering to and implementing responsible, ethical AI practices (e.g., risk assessment, bias mitigation, quality and accuracy review)
  • Hands-on experience with disaggregated networking products and software, such as Meta's open network OS, SONiC, Cumulus Linux, or equivalent open networking platforms
  • Demonstrated ability to integrate AI tools to optimize/redesign workflows and drive measurable impact (e.g., efficiency gains, quality improvements)
  • Experience supporting AI/ML or high-performance compute clusters in lab or pre-production environments
  • Experience with network automation, CI/CD pipelines, audit frameworks, and validation tooling
  • Understanding of physical infrastructure design, including structured cabling, space, power, and cooling systems
  • Layer 1 networking expertise in validating multi-vendor optics, with proficiency using the switch ASIC diagnostic shell (e.g., memory commands) and I2C utilities to troubleshoot hardware-level issues

Nice to have

  • Bachelor's degree in Computer Science, Computer Engineering, relevant technical field, or equivalent practical experience

What the JD emphasized

  • AI workloads
  • AI and compute lab clusters
  • high-throughput, low-latency cluster networking
  • AI/ML or high-performance compute clusters
  • responsible, ethical AI practices
  • integrate AI tools to optimize/redesign workflows