Executive Director, AI Ops Engineering

CVS Health · Healthcare · Work at Home, NY +51 · Innovation and Technology

CVS Health is seeking an Executive Director, AI Ops Engineering to build and lead a team responsible for the continuous operation, monitoring, and optimization of CVS's Enterprise AI environment. This is an engineering leadership role focused on keeping the platform always on, always performing, and always improving. The role involves establishing operational baselines, driving observability, and overseeing functional areas including platform reliability, infrastructure, network, observability, security SRE, 24/7 operations, change management, and FinOps. It also includes leading innovation pods for AI-driven automation and self-healing capabilities, and managing the transition from a managed services provider to an internal SRE organization.

What you'd actually do

  1. Own the SRE vision, strategy, and long-range roadmap with availability (>99.99%), reliability, and scalability as the primary measures of success
  2. Lead, develop, and integrate all functional teams into a cohesive, always-on operations organization — setting clear ownership, accountability, and performance expectations for each team and each engineer
  3. Establish and enforce operational baselines across all platform components; ensure deviations are detected, escalated, and resolved within defined SLAs
  4. Drive end-to-end observability with continuous feedback loops connecting monitoring data to incident response, change decisions, and improvement cycles
  5. Oversee change management ensuring every modification is risk-assessed, monitored during rollout, and baseline-validated post-deployment

Skills

Required

  • AI Ops Engineering leadership
  • SRE principles
  • Platform reliability
  • Infrastructure management
  • Network management
  • Observability strategy
  • Security SRE
  • Incident response
  • Change management
  • FinOps
  • AI-driven automation
  • Self-healing capabilities
  • Chaos engineering
  • Resilience testing
  • GPU quota governance
  • SLO/SLI management
  • Infrastructure-as-code
  • Compliance controls
  • High-performance GPU networking
  • Security segmentation
  • Alerting pipelines
  • Vulnerability management
  • ITIL process management
  • ModelOps
  • Cost governance
  • Chargeback models
  • Vendor management
  • Managed services transition

Nice to have

  • Building from the ground up

What the JD emphasized

  • availability, reliability, and scalability
  • greenfield organizational build
  • continuous operation, monitoring, and optimization
  • always on, always performing, and always improving
  • end-to-end observability
  • continuous feedback loops
  • risk-assessed
  • baseline-validated
  • world-class security posture
  • HIPAA
  • NIST AI RMF
  • self-healing and autonomous capabilities
  • GPU FinOps governance
  • structured transition of operational ownership
  • seamless transition with minimal disruption to platform availability and business operations

Other signals

  • AI platform is a critical enterprise asset
  • powering clinical, operational, and consumer capabilities at scale
  • keeping it reliable, observable, and continuously improving is the mission
  • build and lead a team of professionals responsible for the continuous operation, monitoring, and optimization of CVS's Enterprise AI environment
  • greenfield organizational build
  • define the operating model, shape the team culture, and establish the engineering standards