At Johnson & Johnson, we believe health is everything. Our strength in healthcare innovation empowers us to build a world where complex diseases are prevented, treated, and cured, where treatments are smarter and less invasive, and solutions are personal. Through our expertise in Innovative Medicine and MedTech, we are uniquely positioned to innovate across the full spectrum of healthcare solutions today to deliver the breakthroughs of tomorrow, and profoundly impact health for humanity. Learn more at jnj.com

As guided by Our Credo, Johnson & Johnson is responsible to our employees who work with us throughout the world. We provide an inclusive work environment where each person is considered as an individual. At Johnson & Johnson, we respect the diversity and dignity of our employees and recognize their merit.

Job Function:

Technology Product & Platform Management

Job Sub Function:

Reliability Engineering

Job Category:

People Leader

All Job Posting Locations:

Raritan, New Jersey, United States of America

Job Description:

Do you have a Site Reliability Engineering mindset? Have you worked on global healthcare products and services? Are you passionate about data? Are you comfortable digging into code, architecture, and operations?

Are you interested in applying Artificial Intelligence and Machine Learning to the way IT applications are designed, developed, and operated? Do you want to help implement Continuous Release and Deployment?

We are seeking a Manager, Reliability Engineer to ensure the availability, performance, security, and scalability of our public-facing websites and AI-enabled features. This role bridges Site Reliability Engineering (SRE), web engineering, SEO best practices, and AI deployment/operationalization. You will build automation and observability, drive reliability-focused product decisions, and partner closely with engineering, product, and data/ML teams.

We focus on continuously optimizing existing systems and building automation into our infrastructure, applications, deployment, release, and operations. Our engineers are responsible for the end-to-end experience of our customers and for how our systems perform, using industry-leading tools and approaches. Our teams improve IT product quality by focusing on preventive and proactive measures, eliminating product reliability issues and automating their remediation, and collaborating with our product lines to deliver high-quality, highly reliable commercial applications.

The ideal candidate is a collaborator and innovator with superior analytical abilities. This opportunity requires excellent technical, problem-solving, and communication skills. The candidate should possess verifiable leadership qualities, including being proactive, thoughtful, thorough, decisive, and flexible, and should conduct themselves professionally and with integrity at all times. The candidate is not a policy maker or spokesperson but is driven to get the right things done. At the core of the position are delivery, automation, innovation, and the relentless pursuit of highly performant products.

As part of the Application Reliability team, you will be at the forefront of technologies spanning the full breadth of Commercial Platforms and Technologies: full-stack digital platforms and languages (e.g., Acquia, Platform.sh, Drupal, PHP, JavaScript, HTML, APIs, .NET, SharePoint, and mobile solutions), cloud computing and architecture (e.g., AWS, GCP, and Azure), DevOps tools and processes (e.g., Jenkins, Terraform, JIRA, Xray, JFrog, and Artifactory), and observability/monitoring tools and practices (e.g., New Relic, Pingdom, Splunk). You will be working with passionate, driven, excited individuals who believe that providing world-class products is critical to providing the best healthcare products in the world.

Key responsibilities

Own the reliability, performance, and operability of digital websites and the infrastructure that serves AI features (inference endpoints, feature stores, model-serving pipelines).
Design, implement, and maintain observability (metrics, logs, traces, RUM) and synthetic monitoring for web and AI services to achieve target SLOs.
Drive automation: CI/CD, progressive rollout patterns, self-healing ops, and toil reduction.
Drive the transition from a search engine optimization (SEO) strategy to a generative engine optimization (GEO) strategy to improve user experience, visibility, and adoption.
Harden production systems: capacity planning, incident response, runbooks, post-incident reviews, and remediation tracking.
Instrument and operationalize AI features: deploy/monitor models, track model performance drift, implement observability for model inputs/outputs and latency.
Build automation for repetitive operational tasks (scaling, cache invalidation, log management, backups) and self-healing workflows.
Maintain security-related reliability controls for web delivery (CDN, WAF, TLS, DDoS mitigations) in partnership with security teams.
Mentor other engineers on reliability best practices and integrate reliability into the SDLC.

Required qualifications

Bachelor's Degree or Equivalent
6+ years of experience in site reliability, platform engineering, or DevOps with a focus on web or digital properties.
Strong understanding of web architecture and delivery: HTTP, CDNs, caching strategies, edge delivery, browsers, and rendering
Experience with full-stack web development (e.g. Front-end – HTML, CSS and JavaScript; Back-end – Python, PHP, MySQL, C#)
Experience with digital frameworks (e.g., Drupal, .NET, SharePoint, React, Angular, Vue)
Experience using GenAI tools to drive a GEO strategy
Practical experience with observability tooling (metrics, logging, distributed tracing), e.g., Prometheus, Grafana, ELK/OpenSearch, New Relic.
Experience designing and running CI/CD pipelines and infrastructure as code (Terraform, CloudFormation, etc.).
Strong scripting/programming skills (Python, Go, or similar) to build automation, monitoring hooks, and integrations.
Demonstrated incident response and postmortem practice, including measurable remediation.

Nice to have

Experience managing cloud platforms and infrastructure (e.g., AWS, GCP, Azure, and PaaS offerings such as Adobe, Acquia, Platform.sh)

#JNJTech

#LI-Hybrid

Johnson & Johnson is an Equal Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, age, national origin, disability, protected veteran status or other characteristics protected by federal, state or local law. We actively seek qualified candidates who are protected veterans and individuals with disabilities as defined under VEVRAA and Section 503 of the Rehabilitation Act.

Johnson & Johnson is committed to providing an interview process that is inclusive of our applicants’ needs. If you are an individual with a disability and would like to request an accommodation, please contact us via https://www.jnj.com/contact-us/careers or contact AskGS to be directed to your accommodation resource.

The anticipated base pay range for this position is:

$102,000 - $175,950

Additional Description for Pay Transparency:

Subject to the terms of their respective plans, employees and/or eligible dependents are eligible to participate in the following Company-sponsored employee benefit programs: medical, dental, vision, life insurance, short- and long-term disability, business accident insurance, and group legal insurance.

Subject to the terms of their respective plans, employees are eligible to participate in the Company’s consolidated retirement plan (pension) and savings plan (401(k)).

Subject to the terms of their respective policies and date of hire, employees are eligible for the following time off benefits:

Vacation – 120 hours per calendar year
Sick time – 40 hours per calendar year; for employees who reside in the State of Washington, 56 hours per calendar year
Holiday pay, including Floating Holidays – 13 days per calendar year
Work, Personal and Family Time – up to 40 hours per calendar year
Parental Leave – 480 hours within one year of the birth/adoption/foster care of a child
Condolence Leave – 30 days for an immediate family member; 5 days for an extended family member
Caregiver Leave – 10 days
Volunteer Leave – 4 days
Military Spouse Time-Off – 80 hours

Additional information can be found through the link below.
https://www.careers.jnj.com/employee-benefits