Staff Observability Engineer

GE Healthcare · Healthcare · Bengaluru, Karnātaka, India · Digital Technology / IT

Staff Observability Engineer responsible for defining and implementing the observability strategy for GE Healthcare's Cloud Applications. This role involves designing and integrating observability frameworks, instrumenting services, building dashboards and alerts, and ensuring alignment with healthcare compliance standards. Experience with AI-powered anomaly detection and SRE practices is required.

What you'd actually do

  1. Define and evolve the observability vision and roadmap for PCS DS applications.
  2. Design, implement, and integrate standardized observability frameworks (metrics, logs, traces, events, profiling).
  3. Collaborate with platform, SRE, and product teams to instrument services using OpenTelemetry and other modern observability tooling.
  4. Build and maintain dashboards, alerts, and SLOs that reflect both technical and business health indicators.
  5. Lead or contribute to incident analysis and postmortem reviews, driving improvements in system resilience and observability coverage.

Skills

Required

  • observability strategy
  • instrumentation
  • metrics, logs, traces
  • OpenTelemetry
  • Prometheus
  • Grafana
  • Datadog
  • Dynatrace
  • Go
  • Python
  • Bash
  • Terraform
  • distributed tracing
  • SLO/SLI frameworks
  • incident response workflows
  • distributed systems
  • microservices
  • cloud platforms (AWS, Azure, GCP)
  • AI-powered anomaly detection
  • SRE practices

Nice to have

  • healthcare or regulated industries experience
  • data privacy and compliance (HIPAA, HITRUST)
  • cost optimization
  • telemetry data governance
  • chaos engineering

What the JD emphasized

  • observability vision and roadmap
  • standardized observability frameworks
  • instrument services
  • dashboards, alerts, and SLOs
  • incident analysis
  • healthcare compliance standards
  • observability-first development
  • observability solutions in cloud-native environments
  • observability pillars
  • distributed tracing
  • SLO/SLI frameworks
  • incident response workflows
  • AI-powered anomaly detection