Site Reliability Product Owner

Boeing · Aerospace · Kent, WA

This role focuses on site reliability product ownership for a multi-application software portfolio, emphasizing release engineering, operationalizing services, AWS infrastructure, Python automation, and comprehensive monitoring. The Product Owner will manage release coordination, bug/fix lifecycles, customer approvals, incident command, and post-incident reporting. Key responsibilities include defining and implementing environment-wide monitoring strategies, developing dashboards, and ensuring CI/CD and release quality. The role also involves instrumenting DORA-style KPIs, automating workflows, advising on signal-processing algorithms, and coordinating with engineering teams and suppliers to improve reliability and delivery cadence.

What you'd actually do

  1. Oversee end-to-end release engineering and sustainment for a multi-application portfolio supporting multiple missions and effectivities.
  2. Own release control processes: scheduling, versioning, change control, approvals, and authoritative configuration/deployment records.
  3. Coordinate and compile release packages, validate release candidates through operational and enterprise testing, and facilitate the transition of development work into operational environments.
  4. Track, verify, and communicate bug/fix status across the portfolio and obtain customer and multi-level leadership sign‑offs prior to deployments.
  5. Define, implement, and maintain monitoring and observability across all environments, including real‑time system health, anomaly detection, and alerting to pre‑empt resource exhaustion and performance degradation.

Skills

Required

  • Bachelor’s degree in an engineering discipline, or 18 years’ directly related work experience, or 22 years’ related work experience
  • 20+ years of experience in software engineering, with demonstrated expertise in cloud‑native distributed systems, orchestration, and operationalizing services at scale (including serverless and containerized deployments)
  • 1+ years of experience deploying and managing distributed systems on cloud platforms (e.g., Azure, AWS, GCP)
  • 1+ years of experience with engineering releases
  • 1+ years of experience in managing product backlog, writing user stories, and managing releases
  • 1+ years of experience with cloud platforms (e.g., AWS or Azure), infrastructure as code (e.g., Terraform), and automation tools (e.g., Puppet, Ansible, Chef)
  • 1+ years of experience developing and operating microservice, containerized, or serverless applications
  • 1+ years of experience with signal processing or image processing

Nice to have

  • 1+ years of incident management experience, including leading post-incident reviews and preparing executive-level incident reports and slide decks.
  • 3+ years of experience in Python development, scripting, and automation, including building operational tooling and automation for deployments and incident response.

What the JD emphasized

  • AWS infrastructure
  • Python automation
  • signal-processing algorithm behavior
  • release engineering
  • operationalizing services