Senior Event Correlation & App Telemetry Specialist

Merck · Pharma · Telangana, India

This role focuses on designing, implementing, and optimizing enterprise observability solutions using AIOps, modern observability practices, and large-scale monitoring platforms. The goal is to improve system reliability, operational intelligence, and incident response through metrics, logs, and traces, with particular emphasis on anomaly detection, event correlation, and predictive alerting to reduce alert noise and improve signal quality.

What you'd actually do

  1. Design and implement modern observability architecture across cloud, hybrid, and on-premises environments.
  2. Implement and maintain enterprise monitoring platforms.
  3. Implement AIOps-driven capabilities, including anomaly detection, event correlation, root cause analysis, and predictive alerting.
  4. Define and implement SLIs, SLOs, and alerting strategies.
  5. Integrate monitoring with CI/CD pipelines, cloud platforms, and ITSM systems.

Skills

Required

  • Observability Strategy & Architecture
  • Monitoring Platform Engineering
  • AIOps & Intelligent Monitoring
  • Incident & Reliability Management
  • Platform Integration
  • Metrics, logs, traces, and telemetry collection
  • Cloud platforms and distributed systems
  • Agile/Scrum environment
  • JIRA and Confluence
  • Python, Go, or similar scripting/programming skills
  • Written and verbal communication skills
  • Leadership skills

Nice to have

  • Monitoring containerized environments using Kubernetes
  • Implementing distributed tracing
  • Machine learning techniques for operations
  • Integrating monitoring tools with third-party tools and ITSM platforms such as ServiceNow
  • Supporting large-scale production environments
  • GenAI & Agentic AI concepts

What the JD emphasized

  • 8+ years of hands-on experience and 4+ years in a Product Manager role, preferably working on a technical product, developer tool, or internal observability platform
  • Hands-on experience with monitoring and observability tools such as Grafana, OpenTelemetry, Moogsoft, BigPanda, Dynatrace, xMatters, Prometheus, and LogicMonitor
  • Experience implementing AIOps platforms or automation for operations
  • Understanding of GenAI and agentic AI concepts, and the ability to devise strategies for where they can reasonably be applied

Other signals

  • AIOps
  • anomaly detection
  • event correlation
  • predictive alerting
  • reduce alert noise
  • machine learning or statistical techniques