DevOps Engineer - Observability

Adobe · Enterprise · Bucharest, Romania

A DevOps Engineer role focused on building and maintaining observability services at Adobe, with an emphasis on AI agent development and on integrating AI workflows into large-scale deployments to extract insights from log data and automate user interactions.

What you'd actually do

  1. Architect and implement large-scale observability platforms
  2. Work with internally hosted logging systems such as Splunk, ClickHouse, Loki, and Elastic, assisting clients and improving environment performance and stability
  3. Drive ingestion cost optimization through data-driven analysis, pipeline guardrails, and direct engagement with customer engineering teams to reduce unnecessary log volume
  4. Work with OpenTelemetry (collector configuration, pipelines, and instrumentation), a core requirement given Adobe's OTel-native observability strategy
  5. Develop AI agents and integrate AI workflows into large-scale deployments, building AI-assisted workflows that surface actionable insights from large log datasets and automate routine user interactions

Skills

Required

  • production-level experience with distributed applications at scale in public and/or private cloud
  • architecting and implementing large-scale Observability platforms
  • internally hosted logging systems like Splunk, ClickHouse, Loki, Elastic
  • ingestion cost optimization
  • OpenTelemetry
  • AI agent development
  • integrating AI workflows into large-scale deployments
  • build AI-assisted workflows
  • surface actionable insights from large log datasets
  • automate routine user interactions
  • architecting distributed environments with thousands of users
  • Go
  • Python
  • building integrations and applications for large-scale Observability environments
  • designing and implementing systems for fault tolerance, scalability and stability
  • developing, deploying and running distributed applications on cloud platforms
  • container and orchestration technologies (Docker, Kubernetes)
  • on-call coverage
  • triage and resolve issues across platforms
  • maintaining the highest level of uptime and Quality of Service (QoS)
  • defining service level objectives (SLOs) and service level indicators (SLIs)
  • cloud deployments
  • collaborate with SRE and Engineering/Product teams
  • designing and maintaining production monitoring systems
  • solving performance and stability issues
  • excellent communication skills

Nice to have

  • evaluating and prototyping alternative storage/processing backends (e.g., ClickHouse, Loki)
  • Grafana
  • Cortex
  • Tempo
  • DevOps/SRE approach

What the JD emphasized

  • AI agent development
  • integrating AI workflows
  • build AI-assisted workflows
  • surface actionable insights from large log datasets
  • automate routine user interactions
  • OpenTelemetry
