Site Reliability Engineer IV

Premera Blue Cross · Insurance · Mountlake Terrace, WA

A Site Reliability Engineer IV role focused on driving reliability and operational excellence across cloud, on-premise, and hybrid platforms. The role involves building scalable automation and AI-powered tooling that improves system health, reduces manual effort, and accelerates incident response. Key responsibilities include developing anomaly detection, predictive alerting, and LLM-assisted diagnostics; designing end-to-end observability; and leading cross-team reliability improvements. The role also emphasizes influencing DevOps practices and applying emerging AI/ML technologies to improve operational efficiency.

What you'd actually do

  1. Build, run, and optimize critical services across cloud, on-premise, and hybrid environments, including managed services, custom applications, and third-party integrations
  2. Develop automation and AI-powered tooling to reduce manual intervention, including anomaly detection, predictive alerting, and LLM-assisted diagnostics that surface actionable insights
  3. Design and implement end-to-end observability, telemetry, and self-healing capabilities across platforms
  4. Lead cross-team efforts to drive root cause analysis, post-incident reviews, and long-term reliability improvements
  5. Define and drive reliability strategy, standards, and best practices across engineering teams

Skills

Required

  • 7+ years of experience in Site Reliability Engineering, DevOps, or IT Operations within complex environments
  • Demonstrated experience leveraging AI platforms and tooling to design and build automation solutions
  • Advanced troubleshooting across distributed systems and applications
  • Proficiency in one or more programming languages such as Python, Java, C#, JavaScript, or PowerShell
  • Ability to debug complex systems and guide teams through technical problem-solving
  • Strong collaboration and communication skills across engineering teams

Nice to have

  • Hands-on experience applying AI/ML to operational workflows, including anomaly detection, predictive alerting, or intelligent automation at scale
  • Advanced experience with Kubernetes, Docker, and container-based platforms
  • Deep expertise with event streaming platforms
  • Experience working across cloud, on-premise, and hybrid environments
  • Experience working in large-scale, regulated enterprise environments
  • Familiarity with AI/ML concepts and integrating intelligent automation into operational workflows

What the JD emphasized

  • AI-powered tooling
  • anomaly detection
  • predictive alerting
  • LLM-assisted diagnostics
  • intelligent automation
  • reliability strategy
  • DevOps practices
  • AI/ML
