Staff AI Engineer - Grafana Ops, AI/ML | Canada | Remote

Grafana Labs · Data AI · Canada, United States · Remote · R&D: Ops

Staff AI Engineer role focused on building and delivering AI solutions for observability data, including LLM- or agent-powered workflows for incident management and automated analysis. The role emphasizes rapid experimentation, cross-functional collaboration, and shipping scalable, maintainable AI features into production.

What you'd actually do

  1. Build and deliver AI solutions: Take ownership of developing high-performance AI features to help users detect, triage, and resolve incidents using observability data and tools.
  2. Rapid experimentation and iteration: Work in a highly iterative loop, quickly prototyping, testing, and validating ideas with real users, including shipping and evolving LLM- or agent-powered workflows for incident lifecycle management and automated analysis tasks.
  3. Collaborate cross-functionally: Work with data analysts, product managers, and designers to shape AI-driven product features, including integrating agentic components with internal tools, alerting systems, runbooks, and developer workflows.
  4. Utilize AI tools effectively: Use AI and automation tools to enhance both product functionality and your own development workflows.
  5. Ownership and impact: Take full ownership of the AI solutions you develop, ensuring they are not only innovative but also scalable, maintainable, and aligned with real user workflows.

Skills

Required

  • LLMs
  • prompt engineering
  • building applications powered by GenAI
  • production software systems (backend and/or full-stack)
  • cloud-native environments (e.g., AWS, GCP, Azure)

Nice to have

  • AI technologies and frameworks
  • quick iteration and experimentation
  • dealing with ambiguity
  • defining scope where things are loosely defined

What the JD emphasized

  • building production software systems
  • delivering software that made it into production and is in active use
  • experience with LLMs, prompt engineering, and building applications powered by GenAI

Other signals

  • building AI-powered features
  • shipping and scaling impactful features
  • integrating agentic components
  • building applications powered by GenAI