Lead, MedTech Technology Service Reliability Engineer, R&D

Johnson & Johnson · Pharma · Raritan, NJ +1

This role focuses on designing, building, and operating reliability practices for critical engineering and enterprise services, ensuring availability, performance, security, and resilience. It involves hands-on work in observability, incident response, automation, and engineering excellence within a regulated environment. The SRE partners with engineering and cross-functional teams to define reliability targets, implement operational controls, and maintain operational documentation.

What you'd actually do

  1. Define, implement, and continuously improve reliability standards for production services, including SLIs/SLOs, error budgets, and operational readiness criteria.
  2. Build and maintain observability capabilities (metrics, logs, traces, dashboards) and establish actionable alerts that reflect customer impact.
  3. Participate in on-call rotations, lead incident triage and restoration, and drive root-cause analysis with corrective and preventive actions.
  4. Engineer reliability improvements through automation (self-healing, auto-remediation, runbook automation) and eliminate toil through scripting and tooling.
  5. Partner with engineering teams to design and validate resilient architectures (timeouts/retries, circuit breaking, queuing, graceful degradation) and to improve deployment safety.

Skills

Required

  • SRE
  • DevOps
  • platform engineering
  • software engineering
  • production operations
  • observability
  • incident management
  • monitoring/alerting design
  • on-call operations
  • root-cause analysis
  • infrastructure-as-code
  • CI/CD
  • automated testing/release practices
  • cloud-hosted or hybrid enterprise environments
  • networking fundamentals
  • secure configuration
  • environment management
  • communication skills
  • Agile delivery practices
  • cross-functional collaboration

What the JD emphasized

  • regulated environments
  • reliability standards
  • operational controls
  • reliability risks
  • resilient designs
  • operational tooling
  • operational runbooks
  • resilience testing
  • deployment safety
  • operational documentation
  • audit readiness
  • controlled releases
  • security-by-design principles
  • maintainability and supportability