Staff AI Engineer | Canada | Remote

Grafana Labs · Data AI · Canada, United States · Remote · Marketing

The Staff AI Engineer owns the AI agent infrastructure and automation platform for Marketing Operations. The role involves building multi-agent architectures, LLM integrations, and backend services that connect AI models to internal and third-party data platforms, and shipping them as production systems. The engineer will define technical direction, partner with other teams, and build scalable, self-service automation.

What you'd actually do

  1. Own end-to-end development of multi-agent AI systems, from architecture and implementation through testing, deployment, and ongoing operation
  2. Build modular, composable agentic systems using orchestration frameworks (LangChain, CrewAI, Anthropic MCP, or similar) that operate 24/7 across teams
  3. Build MCP servers, APIs, CLIs, and microservices connecting AI models to business systems (BigQuery, Slack, CRMs, email, calendars, analytics tools)
  4. Partner with RevOps, Demand Generation, Regional Marketing, and SDR teams to scope high-impact automation problems, identify bottlenecks, and build solutions with measurable business outcomes
  5. Establish governance and compliance standards for AI workflows including access controls, audit trails, PII handling, and human-in-the-loop escalation paths

Skills

Required

  • Python
  • JavaScript/Node.js
  • Git-based workflows
  • code review practices
  • testing discipline
  • prompt engineering
  • RAG
  • function calling/tool use
  • structured output parsing
  • evaluation
  • multi-agent systems
  • agent decomposition
  • orchestration patterns
  • state management
  • production monitoring
  • Google Cloud Platform
  • BigQuery
  • serverless/containerized services (Cloud Functions, Cloud Run)
  • LLM failure modes
  • production mitigations
  • confidence thresholds
  • fallback logic
  • human escalation
  • cost/latency management
  • AI-assisted development tools

Nice to have

  • LangChain
  • CrewAI
  • Anthropic MCP
  • n8n
  • Workato
  • Grafana's cloud infrastructure

What the JD emphasized

  • own the AI agent infrastructure and automation platform
  • build multi-agent architectures
  • ship production systems
  • own the technical direction
  • end-to-end development of multi-agent AI systems
  • build modular, composable agentic systems
  • build MCP servers, APIs, CLIs, and microservices
  • Architect data flows for retrieval-augmented generation (RAG)
  • Build systems designed for self-service
  • 8+ years of software engineering experience
  • 2+ years hands-on experience applying LLMs/AI to production workflows
  • Experience building and operating multi-agent systems at scale
  • Proven ability to identify high-leverage problems
  • Fluent with AI-assisted development tools

Other signals

  • multi-agent architectures
  • LLM integrations
  • production systems
  • technical direction
  • automation platform