Observability Specialist

Deel · Enterprise · EMEA · R&D

Observability Engineer role focused on designing, implementing, and evolving monitoring and observability ecosystems for a cloud-native SaaS platform. Responsibilities include managing AWS and Kubernetes environments, operating self-hosted monitoring stacks (Prometheus, Grafana, etc.), optimizing DataDog usage, and integrating monitoring into CI/CD pipelines. The role emphasizes system reliability, performance visibility, and cost-efficient monitoring at scale.

What you'd actually do

  1. Design, implement, and maintain scalable observability solutions for cloud-native environments
  2. Own monitoring across AWS and Kubernetes (EKS) environments, covering clusters and workloads
  3. Operate and maintain self-hosted monitoring stacks (e.g., Prometheus, Grafana, Mimir, Loki, Tempo)
  4. Manage and optimize DataDog (metrics, logs, APM, alerts, cost monitoring)
  5. Improve observability architecture to support high availability, scalability, and fault tolerance

Skills

Required

  • Monitoring/observability engineering
  • Cloud-native environments
  • AWS services
  • Kubernetes (EKS)
  • Prometheus
  • Grafana
  • Mimir
  • Loki
  • Tempo
  • DataDog
  • High availability architectures
  • Scalability architectures
  • Fault-tolerant architectures
  • Infrastructure as Code (Terraform, Helm)
  • CI/CD pipelines
  • Capacity planning
  • Performance tuning

Nice to have

  • GitHub Actions

What the JD emphasized

  • 5+ years of hands-on experience in monitoring / observability engineering within cloud-native environments
  • Strong experience with AWS services
  • 5+ years of hands-on experience working with Kubernetes
  • Solid knowledge of Kubernetes monitoring, including metrics, logs, and traces for clusters and workloads, alerting, SLOs, SLIs, and dashboards
  • Proven experience operating and maintaining self-hosted monitoring stacks (Prometheus, Grafana, Mimir, Loki, and Tempo are an advantage)
  • Experience designing or improving observability architectures at scale
  • Experience with DataDog (metrics, logs, APM, alerts, and cost monitoring)
  • Experience with monitoring cost optimization, including log and trace sampling strategies, storage and retention optimization