Senior Software Engineer (observability)

Zendesk · Enterprise · Krakow, Poland

Senior Software Engineer focused on building observability foundations for cloud-native services and the organization's LLM/AI stack, including telemetry pipelines, dashboards, and alerting for critical AI paths. Requires experience with observability tooling and a demonstrated understanding of ML/AI system monitoring.

What you'd actually do

  1. Architect, build, and operate end-to-end observability for metrics, traces, and logs — including telemetry pipelines, dashboards, storage/retention policies, and alerting.
  2. Build LLM/AI observability primitives.
  3. Deliver reusable instrumentation templates, SLOs/SLIs, dashboards, and alerting bundles for critical AI paths.
  4. Instrument services and libraries (OpenTelemetry or equivalent); provide developer-facing libraries and patterns to ensure consistent telemetry across services.
  5. Onboard and upskill acquired and existing EMEA teams: create playbooks, run training sessions, and enable teams to adopt observability best practices and shared tooling.

Skills

Required

  • 5+ years of software engineering experience with at least 3 years focused on observability, SRE, or monitoring.
  • Hands-on experience with observability tooling (Datadog, Grafana, Prometheus, OpenTelemetry, or equivalent).
  • Experience building telemetry for ML/AI systems or demonstrated understanding of model monitoring concepts (drift detection, prediction quality, inference telemetry).
  • Strong knowledge of distributed systems and cloud-native environments (Kubernetes, AWS).
  • Production coding skills in at least one of: Go, Python, or Ruby — able to build instrumentation, services, and automation.
  • Experience designing telemetry pipelines at scale, including retention/cost tradeoffs and data validation.
  • Practical experience defining SLOs/SLIs and tuning alerting to be actionable and low-noise.
  • Strong communicator — able to lead enablement, write clear playbooks, and influence cross-functional stakeholders.
  • Eligibility to work in Poland.

Nice to have

  • Experience with modern observability and ML platform tooling (OpenTelemetry for ML, Jaeger, including Jaeger with OpenTelemetry, Seldon, MLflow, Feast).
  • Experience with Datadog for APM and model telemetry at enterprise scale.
  • Knowledge of data privacy and security considerations for telemetry (PII handling, encryption, retention policies).
  • Prior experience onboarding acquired teams or driving cross-region enablement and training.

What the JD emphasized

  • Experience building telemetry for ML/AI systems or demonstrated understanding of model monitoring concepts (drift detection, prediction quality, inference telemetry).
  • Strong knowledge of distributed systems and cloud-native environments (Kubernetes, AWS).
  • Production coding skills in at least one of: Go, Python, or Ruby — able to build instrumentation, services, and automation.
  • Experience designing telemetry pipelines at scale, including retention/cost tradeoffs and data validation.
  • Practical experience defining SLOs/SLIs and tuning alerting to be actionable and low-noise.

Other signals

  • LLM/AI observability primitives
  • Telemetry for ML/AI systems
  • Model monitoring concepts