Manager, Reliability Engineering

Johnson & Johnson · Pharma · Raritan, NJ +1

Manager, Reliability Engineering role at Johnson & Johnson focused on ensuring the availability, performance, security, and scalability of public-facing websites and AI-enabled features. The role combines Site Reliability Engineering (SRE), web engineering, SEO, and AI deployment/operationalization, with an emphasis on automation, observability, and product reliability. The candidate will own the reliability of digital websites and the infrastructure serving AI features, instrument and operationalize those features, and mentor engineers on reliability best practices.

What you'd actually do

  1. Own the reliability, performance, and operability of digital websites and the infrastructure that serves AI features (inference endpoints, feature stores, model-serving pipelines).
  2. Design, implement, and maintain observability (metrics, logs, traces, RUM) and synthetic monitoring for web and AI services to achieve target SLOs.
  3. Drive automation: CI/CD, progressive rollout patterns, self-healing ops, and toil reduction.
  4. Instrument and operationalize AI features: deploy and monitor models, track model performance drift, and implement observability for model inputs/outputs and latency.
  5. Mentor other engineers on reliability best practices and integrate reliability into the SDLC.

Skills

Required

  • Bachelor's degree or equivalent.
  • 6+ years of experience in site reliability, platform engineering, or DevOps with a focus on web or digital properties.
  • Strong understanding of web architecture and delivery: HTTP, CDNs, caching strategies, edge delivery, browsers, and rendering.
  • Experience with full-stack web development (e.g., front end: HTML, CSS, and JavaScript; back end: Python, PHP, MySQL, C#).
  • Experience with digital frameworks (e.g., Drupal, .NET, SharePoint, React, Angular, Vue).
  • Experience using GenAI tools to drive GEO (generative engine optimization) strategy.
  • Practical experience with observability tooling (metrics, logging, distributed tracing), e.g., Prometheus, Grafana, ELK/OpenSearch, New Relic.
  • Experience designing and running CI/CD pipelines and infrastructure as code (Terraform, CloudFormation, etc.).

Nice to have

  • Experience managing cloud platforms and infrastructure (e.g., AWS, GCP, Azure, and PaaS offerings such as Adobe, Acquia, Platform.sh).

What the JD emphasized

  • AI-enabled features
  • AI deployment/operationalization
  • model performance drift
  • observability for model inputs/outputs and latency
