AI Product Engineer - ClickStack

ClickHouse · Data AI · United States · Engineering

ClickHouse is looking for an AI Product Engineer to build agentic capabilities on top of a petabyte-scale observability platform, with a focus on developer experience. The role involves building agents that investigate incidents, writing reusable skills, owning the agent stack end-to-end, and making ClickStack a great place to run AI workloads. The engineer will tackle challenges such as latency, cost, context window limits, eval coverage, and hallucinations.

What you'd actually do

  1. Build agents that investigate incidents. They surface anomalies, answer "why is production broken?", and use ClickStack as their substrate.
  2. Write skills, not just prompts. Build a library of reusable skills that captures how our team debugs, finds root causes, writes ClickHouse queries, and runs incident response, so agents pick up the right playbook instead of starting from scratch.
  3. Own the agent stack end-to-end. Context engineering, tool design, evals, tracing, cost. You're responsible for whether the agent works in production.
  4. Make ClickStack a great place to run AI workloads. Build the MCP servers, SDKs, and integrations that let customers' agents read telemetry, take action, and stay observable themselves.
  5. Work in the open. Collaborate with OSS contributors and customers, debug their problems alongside them, and feed what you learn back into the product.

Skills

Required

  • 5+ years of software engineering experience, including 1–2 years on LLM-powered systems or agents in production.
  • Strong backend skills in TypeScript/Node.js and/or Python. Comfortable in both, even if one is primary.
  • Hands-on experience building agents: multi-step tool use, planning, memory, error recovery. You've shipped them and dealt with the failure modes.
  • Experience designing skills (Markdown-based workflow encodings, Anthropic-style or similar) and a clear view on when a skill, a tool, or both is the right fit.
  • Experience with MCP: building servers, designing tools, and thinking through auth, scoping, and observability for agentic systems.
  • Strong evals practice: golden sets, LLM-as-judge, regression detection.
  • SQL proficiency — you can write ClickHouse queries directly.
  • Comfort with Docker and Kubernetes.

Nice to have

  • Built or operated production agents in observability, incident response, or SRE.
  • Strong opinions on agent observability — tracing, cost attribution, eval pipelines, OpenTelemetry for agents — and ideas on how to improve it.
  • Experience with prompt caching, context compaction, or other techniques relevant to running agents on production telemetry volumes.
  • Experience with columnar databases and event ingestion pipelines.
  • Contributed to or maintained an open source AI/agent project.
  • Familiarity with Go, Rust, or other systems languages for integrations and high-throughput infra.

What the JD emphasized

  • building agents
  • writing skills
  • agent stack end-to-end
  • agent observability
  • multi-step tool use
  • planning
  • memory
  • error recovery
  • shipped them and dealt with the failure modes
  • building servers
  • designing tools
  • thinking through auth, scoping, and observability for agentic systems
  • strong evals practice
  • golden sets
  • LLM-as-judge
  • regression detection

Other signals

  • building agentic capabilities
  • focus on developer experience
  • building agents that investigate incidents
  • writing reusable skills
  • owning the agent stack end-to-end
  • making ClickStack a great place to run AI workloads
  • tackling hard parts like latency, cost, context window limits, eval coverage, hallucinations