Principal Network Engineer - AI Infrastructure

CVS Health · Healthcare · Albany, NY +52 · Innovation and Technology

The Principal Network Engineer - AI Infrastructure is responsible for designing, implementing, and optimizing high-performance network infrastructure for AI and GPU-driven workloads, including data center networks, leaf-spine fabrics, and EVPN/VXLAN. This role involves collaborating with compute, storage, and security teams to support large-scale training and inference platforms, ensuring network performance, scalability, and security. The position also involves strategic planning, mentorship, and evaluating emerging AI infrastructure technologies.

What you'd actually do

Design and implement high-performance data center networks optimized for AI/GPU workloads, including leaf‑spine and EVPN/VXLAN fabrics.
Integrate networking with GPU clusters and high-performance storage systems supporting training and inference workloads.
Optimize network performance (latency, throughput, congestion) for large-scale distributed environments.
Define and drive the data center network strategy supporting AI/ML platforms and business initiatives.
Partner with compute, storage, platform, and security teams to design integrated AI infrastructure solutions.

Skills

Required

10+ years of experience in network engineering, with at least 5 years in a leadership, architectural, or lead engineering role delivering enterprise or cloud network initiatives end-to-end.
5+ years of experience designing and operating large-scale data center networks, including Layer 2/3 architectures (leaf-spine/Clos), EVPN/VXLAN overlays, and high-speed networking (100/200/400Gb+).
5+ years of experience with enterprise routing, switching, and network platforms, including Cisco-centric data center fabrics, protocols (BGP, OSPF, MPLS, STP), and hybrid connectivity (SD-WAN, VPN, remote access).
5+ years of experience implementing network security technologies, including Palo Alto Networks firewalls (required), NGFW, IDS/IPS, ZTNA, DLP, and micro-segmentation, with understanding of application-aware and zero trust architectures.
3+ years of experience supporting AI/ML or GPU-based environments, including NVIDIA reference architectures and performance-optimized networking for distributed training workloads (e.g., traffic flow optimization, congestion management).
3+ years of experience with application delivery and observability technologies, including F5 load balancing, network performance monitoring tools (e.g., NetFlow, Wireshark, SolarWinds), and traffic analysis for performance tuning.

Nice to have

Experience designing and supporting AI factory / GPU cluster environments at scale (training and inference platforms).
Familiarity with high-performance compute networking enhancements (RDMA over Converged Ethernet – RoCE, PFC, ECN).
Experience with Cisco Nexus, ACI, or equivalent data center switching platforms supporting AI workloads.
Strong technical expertise with Networking and Software-Defined Networking (SDN) principles.
Strong technical expertise with developing and interpreting Network, Sequence, and Dataflow diagrams.
Understanding of at least one compliance framework (HIPAA, HITRUST, PCI, NIST, CSA).
Strong technical expertise in defining and implementing cyber resilience standards, policies, and programs for distributed cloud and network infrastructure, ensuring robust redundancy and system reliability.
Experience influencing industry standards and contributing to open-source projects or security communities, demonstrating impact beyond the immediate organization.

What the JD emphasized

3+ years of experience supporting AI/ML or GPU-based environments, including NVIDIA reference architectures and performance-optimized networking for distributed training workloads (e.g., traffic flow optimization, congestion management).
5+ years of experience implementing network security technologies, including Palo Alto Networks firewalls (required), NGFW, IDS/IPS, ZTNA, DLP, and micro-segmentation, with understanding of application-aware and zero trust architectures.

Other signals

designing and delivering scalable data center solutions that support large-scale training and inference platforms
high-performance network infrastructure that powers the organization’s AI and GPU-driven workloads
integrating networking with GPU clusters and high-performance storage systems supporting training and inference workloads

● Active

Posted today · 0 days open

AI score: 7/10
Stage: Serve Agent
Location: Albany, NY; New York, NY; United States; Juneau, AK; Montgomery, AL; Little Rock, AR; Phoenix, AZ; Sacramento, CA; Denver, CO; Hartford, CT; Washington, DC; Dover, DE; Tallahassee, FL; Atlanta, GA; Honolulu, HI; Des Moines, IA; Boise, ID; Springfield, IL; Indianapolis, IN; Topeka, KS; Frankfort, KY; Baton Rouge, LA; Boston, MA; Annapolis, MD; Augusta, ME; Lansing, MI; Saint Paul, MN; Jefferson City, MO; Jackson, MS; Helena, MT; Raleigh, NC; Bismarck, ND; Lincoln, NE; Concord, NH; Trenton, NJ; Santa Fe, NM; Carson City, NV; Columbus, OH; Oklahoma City, OK; Salem, OR; Harrisburg, PA; Providence, RI; Columbia, SC; Pierre, SD; Nashville, TN; Austin, TX; Salt Lake City, UT; Richmond, VA; Montpelier, VT; Olympia, WA; Madison, WI; Charleston, WV; Cheyenne, WY
Role: Principal · Infra
Function: Engineering
Domain: general
Maturity: Building

Tech tags

Inference infra · Model serving

We’re building a world of health around every individual — shaping a more connected, convenient and compassionate health experience. At CVS Health®, you’ll be surrounded by passionate colleagues who care deeply, innovate with purpose, hold ourselves accountable and prioritize safety and quality in everything we do. Join us and be part of something bigger – helping to simplify health care one person, one family and one community at a time.

Position Summary

The Principal Network Engineer – AI Infrastructure plays a key role in building the high‑performance network infrastructure that powers the organization’s AI and GPU‑driven workloads. This position is responsible for designing and delivering scalable data center solutions that support large‑scale training and inference platforms. By leveraging modern architectures such as leaf‑spine fabrics and aligning with leading vendor and industry reference designs, the role helps enable reliable, high‑throughput environments that directly support critical business initiatives.

Working closely with engineering, platform, and security partners, this role helps connect network, compute, and security capabilities into a cohesive, high‑performing ecosystem. In addition to hands‑on technical contribution, the position provides guidance on best practices, supports the development of other engineers, and helps shape the future direction of the organization’s AI infrastructure. Through continuous improvement, thoughtful design, and a focus on performance and resilience, this role contributes to a secure and scalable foundation that supports long‑term growth and innovation.

Role Responsibilities:

Collaboration & Expertise

Partner with compute, storage, platform, and security teams to design integrated AI infrastructure solutions.
Serve as a senior technical authority, aligning network designs with NVIDIA, Cisco, and industry reference architectures.
Influence enterprise network and security strategy through collaboration with engineering leadership and stakeholders.

Analysis & Configuration

Design and implement high-performance data center networks optimized for AI/GPU workloads, including leaf‑spine and EVPN/VXLAN fabrics.
Integrate networking with GPU clusters and high-performance storage systems supporting training and inference workloads.
Optimize network performance (latency, throughput, congestion) for large-scale distributed environments.
Evaluate and deploy advanced networking technologies to improve scalability, reliability, and security.

Operational Support

Support 24/7 infrastructure operations, including on-call responsibilities across cloud, on-prem, and colocation environments.
Lead incident response and resolution for network-related issues, driving root cause analysis and resilience improvements.

Mentorship and Training

Mentor and develop engineers, promoting best practices in networking and security.
Support knowledge sharing through training sessions and technical enablement.

Innovation and Research

Evaluate and adopt emerging AI infrastructure and networking technologies (e.g., high-speed interconnects, next-generation switching).
Contribute to research, innovation, and continuous improvement of network and security capabilities.

Strategic Planning

Define and drive the data center network strategy supporting AI/ML platforms and business initiatives.
Establish standards and reference architectures aligned with industry best practices.
Guide long-term roadmap decisions, balancing performance, scalability, security, and risk.

Required Qualifications

10+ years of experience in network engineering, with at least 5 years in a leadership, architectural, or lead engineering role delivering enterprise or cloud network initiatives end-to-end.
5+ years of experience designing and operating large-scale data center networks, including Layer 2/3 architectures (leaf-spine/Clos), EVPN/VXLAN overlays, and high-speed networking (100/200/400Gb+).
5+ years of experience with enterprise routing, switching, and network platforms, including Cisco-centric data center fabrics, protocols (BGP, OSPF, MPLS, STP), and hybrid connectivity (SD-WAN, VPN, remote access).
5+ years of experience implementing network security technologies, including Palo Alto Networks firewalls (required), NGFW, IDS/IPS, ZTNA, DLP, and micro-segmentation, with understanding of application-aware and zero trust architectures.
3+ years of experience supporting AI/ML or GPU-based environments, including NVIDIA reference architectures and performance-optimized networking for distributed training workloads (e.g., traffic flow optimization, congestion management).
3+ years of experience with application delivery and observability technologies, including F5 load balancing, network performance monitoring tools (e.g., NetFlow, Wireshark, SolarWinds), and traffic analysis for performance tuning.

Preferred Qualifications

Experience designing and supporting AI factory / GPU cluster environments at scale (training and inference platforms).
Familiarity with high-performance compute networking enhancements (RDMA over Converged Ethernet – RoCE, PFC, ECN).
Experience with Cisco Nexus, ACI, or equivalent data center switching platforms supporting AI workloads.
Strong technical expertise with Networking and Software-Defined Networking (SDN) principles.
Strong technical expertise with developing and interpreting Network, Sequence, and Dataflow diagrams.
Understanding of at least one compliance framework (HIPAA, HITRUST, PCI, NIST, CSA).
Strong technical expertise in defining and implementing cyber resilience standards, policies, and programs for distributed cloud and network infrastructure, ensuring robust redundancy and system reliability.
Experience influencing industry standards and contributing to open-source projects or security communities, demonstrating impact beyond the immediate organization.
Experience with network automation and Infrastructure as Code.
Background in high-availability and disaster recovery design.
Certifications: CCIE/CCNP, JNCIE, AWS/Azure/GCP Networking, PCNSE/PAN or Security Specialty, CISSP

Education

Bachelor’s degree or equivalent experience (High School Diploma and 4 years relevant experience)

Pay Range

The typical pay range for this role is:

$144,200.00 - $288,400.00

This pay range represents the base hourly rate or base annual full-time salary for all positions in the job grade within which this position falls. The actual base salary offer will depend on a variety of factors including experience, education, geography and other relevant factors. This position is eligible for a CVS Health bonus, commission or short-term incentive program in addition to the base pay range listed above. This position also includes an award target in the company’s equity award program.

Our people fuel our future. Our teams reflect the customers, patients, members and communities we serve and we are committed to fostering a workplace where every colleague feels valued and that they belong.

Great benefits for great people

We take pride in offering a comprehensive and competitive mix of pay and benefits that reflects our commitment to our colleagues and their families.

This full‑time position is eligible for a comprehensive benefits package designed to support the physical, emotional, and financial well‑being of colleagues and their families. The benefits for this position include medical, dental, and vision coverage, paid time off, retirement savings options, wellness programs, and other resources, based on eligibility.

Additional details about available benefits are provided during the application process and on Benefits Moments.

We anticipate the application window for this opening will close on: 06/18/2026

Qualified applicants with arrest or conviction records will be considered for employment in accordance with all federal, state and local laws.