Job Description

We are seeking a visionary and influential technical leader to join as an Enterprise Observability Architect. In this role, you will be the chief architect and evangelist for our company-wide observability strategy. Your mission is to define and drive the adoption of a unified, full-stack observability framework that empowers our product teams to build and operate highly reliable, performant, and resilient services.

This is not just a technical role; it is a strategic one. You will work across organizational boundaries, from engineering teams to senior leadership, to champion a culture of operational excellence. You will be responsible for rationalizing our tooling landscape, eliminating redundancy, and designing a cohesive, self-service observability platform that is easy to consume. If you are passionate about turning complex technical concepts into clear business value and leading large-scale change, this is your opportunity to make a lasting impact.

What You'll Do

Create Strategy: Create, document, and champion a unified, long-term observability strategy for the entire company, covering all telemetry types like metrics, traces, logs, profiles and more
Architect the Future: Design a cohesive, full-stack observability solution using best-in-class tools and practices. Ensure our architecture promotes a product-oriented approach with a strong focus on self-service capabilities
Drive Standardization and Consolidation: Actively identify and help decommission redundant or overlapping tools, driving the organization towards a standardized, cost-effective, and easy-to-manage observability service
Join Forces with SRE: Partner with the Site Reliability Engineering (SRE) team to promote its principles, such as defining, measuring, and managing Service Level Objectives (SLOs) and error budgets, and to show how critical observability is to them
Evangelize and Educate: Clearly and persuasively communicate complex observability topics, benefits, and strategic decisions to senior leadership and diverse stakeholders across IT and business domains
Lead by Influence: Act as the senior technical expert and thought leader in the observability space. Mentor engineers and guide teams in implementing best practices for instrumentation (OpenTelemetry), monitoring, and incident analysis using AIOps
Stay on the Cutting Edge: Continuously research industry trends, emerging technologies (like eBPF, OpenTelemetry), and best practices to ensure our observability strategy remains modern and effective

Must-Have Skills & Experience

Extensive Technical Experience: 10+ years in Observability, software engineering, SRE, or platform engineering, with a proven track record of architecting and implementing large-scale technical solutions
Deep Observability Expertise: Subject matter expert (SME) in observability practices and the collection of all types of telemetry, as well as in the tools that support them (e.g., Grafana, Prometheus, Dynatrace, BigPanda, xMatters, or others)
Strategic Leadership: Demonstrable experience creating and executing a technical strategy across multiple teams or an entire organization
SRE and SLO Mastery: Deep, practical knowledge of Site Reliability Engineering principles, with hands-on experience defining, implementing, and managing SLOs and error budgets
OpenTelemetry Proficiency: Strong understanding and practical experience with the OpenTelemetry standard for instrumentation and telemetry collection
Multi-Cloud Experience: Experience designing solutions that operate across multiple cloud providers (AWS, Azure, GCP)
Exceptional Communication & Influence: World-class ability to explain highly complex technical concepts to senior executives and non-technical audiences. Proven ability to lead by influence and drive consensus in a large organization

Nice-to-Have Skills & Experience

Tool Rationalization: Experience leading projects to consolidate tooling and migrate teams to a centralized platform
FinOps Knowledge: Understanding of cloud cost management principles and how they apply to observability data and tooling
eBPF Knowledge: Familiarity with eBPF for advanced, low-level system observability and performance analysis
Product Mindset: Experience thinking about internal platforms as products, with a focus on user experience, self-service, and clear documentation

Why Join Us?

Unmatched Impact: Define the technical direction for a critical capability across the entire company, directly influencing the reliability of all our products
Strategic Autonomy: Be the ultimate authority and thought leader in the observability domain with the freedom to shape our future
Executive Visibility: Work directly with senior leadership to connect deep technical initiatives with strategic business outcomes
Culture: Join a forward-thinking organization that invests in technical excellence and empowers its leaders to drive meaningful change

What We Offer

Exciting work in a great team on global projects in an international environment
Opportunity to learn and grow professionally within the company globally
Hybrid working model, flexible role pattern
Pension and health insurance contributions
Internal reward system plus referral program
5 weeks of annual leave, 5 sick days, 15 days of certified sick leave paid above statutory requirements annually, 40 paid hours annually for volunteering activities, 12 weeks of parental contribution
Cafeteria plan for tax-free benefits of your choice (meal vouchers, sport, culture, health, travel, etc.), Multisport Card
Vodafone, Raiffeisen Bank, Foodora, and other discount programs
Up-to-date laptop and iPhone, company car
Garage parking for drivers and showers for cyclists
Competitive salary, incentive pay, and much more

Ready to take up the challenge? Apply now! Know anybody who might be interested? Refer this job!

The date shown below is the earliest possible closing date for this posting. However, we sometimes extend the job posting period as needed, so please feel free to apply anytime you see the "Apply" button available. You may also reach out to the recruiter directly via https://www.linkedin.com/in/badumtss/

Required Skills:

Amazon Web Services (AWS), Communication, Google Cloud Platform (GCP), Microsoft Azure, Platform Engineering, Site Reliability Engineering, Software Engineering, SRE Observability, Strategic Leadership, Telemetry

Search Firm Representatives Please Read Carefully: Merck & Co., Inc., Rahway, NJ, USA, also known as Merck Sharp & Dohme LLC, Rahway, NJ, USA, does not accept unsolicited assistance from search firms for employment opportunities. All CVs / resumes submitted by search firms to any employee at our company without a valid written search agreement in place for this position will be deemed the sole property of our company. No fee will be paid in the event a candidate is hired by our company as a result of an agency referral where no pre-existing agreement is in place. Where agency agreements are in place, introductions are position specific. Please, no phone calls or emails.

Employee Status:

Regular

Relocation:

No relocation

VISA Sponsorship:

Yes

Travel Requirements:

10%

Flexible Work Arrangements:

Hybrid

Shift:

Not Indicated

Valid Driving License:

Hazardous Material(s):

N/A

Job Posting End Date:

05/15/2026

A job posting is effective until 11:59:59 PM on the day BEFORE the listed job posting end date. Please ensure you apply no later than the day BEFORE the job posting end date.