Senior Observability Platform Engineer

Adobe · Enterprise · Bucharest, Romania

Senior Observability Platform Engineer responsible for building and selecting best-of-breed tools for critical Observability services at Adobe. The role involves crafting new tools, maintaining large-scale logging deployments, and shaping Adobe's observability strategy. Key responsibilities include improving the performance, stability, and cost efficiency of logging systems; integrating OpenTelemetry; optimizing ingestion costs; and developing AI-assisted tooling that automates workflows and surfaces insights from log data. Requires strong programming skills in Go and/or Python, cloud platform experience, and experience defining SLOs/SLIs.

What you'd actually do

  1. Own and drive ingestion cost optimization end-to-end: analyze pipeline data, design guardrails, and work directly with customer engineering teams to identify and cut unnecessary log volume
  2. Integrate AI workflows into large-scale deployments, designing and implementing AI-assisted tooling that automates user interactions and surfaces actionable insights from high-volume log datasets
  3. Run and improve internally hosted logging systems such as Splunk, ClickHouse, Loki, or Elastic, with a focus on performance, stability, and cost efficiency at scale
  4. Configure and extend OpenTelemetry (collector configuration, pipelines, and instrumentation), a core requirement given Adobe’s OTel-native observability strategy
  5. Design systems for fault tolerance, scalability, and stability, and lead resolution of high-complexity performance and reliability issues

Skills

Required

  • 5-8+ years of production-level experience with distributed applications at scale in public and/or private cloud
  • Proven experience designing and contributing to the architecture of large-scale Observability platforms
  • Deep hands-on experience with internally hosted logging systems such as Splunk, ClickHouse, Loki, or Elastic
  • Experience with OpenTelemetry — including collector configuration, pipelines, and instrumentation
  • Ability to own and drive ingestion cost optimization end-to-end
  • Experience integrating AI workflows into large-scale deployments
  • Strong programming skills in Go and/or Python
  • Experience building production-grade integrations and applications for large-scale Observability environments
  • Experience developing, deploying, and operating distributed applications on cloud platforms
  • Strong command of container and orchestration technologies (Docker, Kubernetes)
  • Proven ability to design systems for fault tolerance, scalability, and stability
  • Experience defining service level objectives (SLOs) and service level indicators (SLIs)
  • Knowledge of public and/or private cloud deployments (AWS, Azure, on-premises data centers)
  • Comfortable owning on-call coverage across a multi-tool observability stack, including leading incident response for high-severity issues

Nice to have

  • Experience evaluating or prototyping alternative storage/processing backends (e.g., ClickHouse, Loki)
  • Experience with other Observability tooling such as Grafana, Cortex, and Tempo

What the JD emphasized

  • core requirement
  • must have
  • track record of improving environment performance, stability, and cost efficiency at scale
  • Ability to own and drive ingestion cost optimization end-to-end
  • Experience integrating AI workflows into large-scale deployments

Other signals

  • integrating AI workflows into large-scale deployments
  • design and implement AI-assisted tooling
  • automates user interactions
  • surfaces actionable insights from high-volume log datasets
  • OpenTelemetry — including collector configuration, pipelines, and instrumentation — as a core requirement given Adobe’s OTel-native observability strategy