Senior AI Engineer - Grafana Ops, AI/ML | Canada | Remote

Grafana Labs · Data AI · Canada, United States · Remote · R&D: Ops

Senior AI Engineer focused on building and shipping AI-powered features for observability tools, including LLM- or agent-powered workflows for incident management and automated analysis. The role emphasizes rapid experimentation, cross-functional collaboration, and delivering production-ready AI solutions.

What you'd actually do

  1. Build and deliver AI solutions: Take ownership of developing high-performance AI features to help users detect, triage, and resolve incidents using observability data and tools.
  2. Rapid experimentation and iteration: Work in a highly iterative process where you quickly prototype, test, and validate with real users, shipping and evolving LLM- or agent-powered workflows for incident lifecycle management and automated analysis tasks.
  3. Collaborate cross-functionally: Work with data analysts, product managers, and designers to shape AI-driven product features, including integration of agentic components with internal tools, alerting systems, runbooks, and developer workflows.
  4. Utilize AI tools effectively: Use AI and automation tools to enhance both product functionality and your own development workflows.
  5. Ownership and impact: Take full ownership of the AI solutions you develop, ensuring they are not only innovative but also scalable, maintainable, and aligned with real user workflows.

Skills

Required

  • Experience with LLMs, prompt engineering, and building applications powered by GenAI.
  • Proven track record of delivering software that made it into production and is actively used by users.
  • Exposure to working in cloud-native environments (e.g., AWS, GCP, Azure).
  • Strong engineering skills: Solid experience building production software systems (backend and/or full stack).
  • AI experience with a practical mindset.
  • Quick iteration and experimentation.
  • Proven initiative.
  • Collaborative attitude.

Nice to have

  • Experience with agentic components.
  • Experience with observability data and tools.

What the JD emphasized

  • shipping and scaling impactful features
  • shipping and evolving LLM- or agent-powered workflows
  • Proven track record of delivering software that made it into production and is actively used by users.

Other signals

  • AI-driven features
  • LLM-powered workflows
  • agentic components
  • AI agents