Staff Reliability Engineer

Merck · Pharma · Central Bohemian, Czech Republic

This Staff Reliability Engineer role focuses on implementing and operationalizing reliability practices so that systems are designed, built, and operated with reliability in mind. The role partners with engineering teams across system design, development, and operations, supports SLO implementation, and improves observability coverage. It also involves developing automation for incident response and using AI-enabled capabilities where appropriate.

What you'd actually do

  1. Partner with application and platform teams to embed reliability into system design, development, and operations
  2. Support implementation and operationalization of Service Level Objectives and reliability indicators
  3. Contribute to improving observability coverage across logs, metrics, traces, and events
  4. Apply reliability patterns such as fault isolation, failover, and recovery mechanisms in collaboration with engineering teams
  5. Participate in and support improvements to the incident lifecycle, including detection, response, root cause analysis, and follow-up actions

Skills

Required

  • system integration
  • software development
  • system administration
  • operations engineering
  • software development life cycle (SDLC)
  • production support models
  • monitoring
  • observability
  • performance optimization
  • cloud environments
  • on-premises environments
  • CI/CD pipelines
  • deployment practices
  • incident management
  • root cause analysis processes
  • system reliability principles
  • availability
  • performance engineering
  • problem-solving
  • continuous improvement
  • collaboration

Nice to have

  • observability platforms
  • reliability tooling ecosystems
  • Service Level Objectives
  • reliability metrics frameworks
  • automation
  • scripting
  • Python
  • Bash
  • resilience patterns
  • distributed systems concepts
  • AI-assisted operational tools and workflows

What the JD emphasized

  • reliability in mind
  • reliability practices
  • reliability maturity
  • reliability engineering
  • reliability standards
  • reliability improvements
  • reliability frameworks
  • reliability tooling