Principal Site Reliability Engineer - Observability

Elastic · Enterprise · Spain · Obs - Application Observability

This is a Principal Site Reliability Engineer role on Elastic's Observability Solution, focused on building next-generation infrastructure observability experiences using the Elastic Search AI Platform and coding agents. The role involves collaborating across teams, delivering and evolving those experiences, and operating large-scale production services with observability tools.

What you'd actually do

  1. Collaborate with product management, product design, customers and multiple teams across Elastic (especially our own SRE teams) in defining and evolving the end-to-end InfraObs experiences that enable both human and agentic users.
  2. Deliver and continually evolve these experiences using Elastic Platform capabilities and coding agents.
  3. Be a contact point for other teams within Elastic. Examples include helping Support with difficult cases, aligning with the teams that provide the foundations for developing integrations, and consulting with Elastic Stack engineers on the design of new features.
  4. Foster a culture of mutual respect, collaboration and consensus-based decision-making.
  5. Be an awesome person to work with, somebody who sincerely empathizes with others.

Skills

Required

  • SRE background
  • Operating large-scale production services
  • Observability tools
  • Proficiency operating production infrastructure in K8s
  • Proficiency using Observability tools
  • Working with a high level of autonomy
  • Ability to use AI coding agents in the delivery workflow
  • Excellent verbal and written communication skills

Nice to have

  • Experience as a user of the Elastic Stack

What the JD emphasized

  • Practitioners
  • SRE background
  • operating large-scale production services
  • Proficiency operating production infrastructure in K8s
  • Ability to use AI coding agents in the delivery workflow