Staff AI Engineer | US | Remote

Grafana Labs · Data AI · Canada, United States · Remote · Marketing

Grafana Labs is hiring a Staff AI Engineer to own the AI agent infrastructure and automation platform for Marketing Operations. The role involves building multi-agent architectures, LLM integrations, and backend services; connecting AI models to internal and third-party data platforms; and shipping production systems. The engineer will define the technical direction for the automation platform, partner with other teams, and build scalable, self-service automation.

What you'd actually do

  1. Own end-to-end development of multi-agent AI systems, from architecture and implementation through testing, deployment, and ongoing operation
  2. Build modular, composable agentic systems using orchestration frameworks (LangChain, CrewAI, Anthropic MCP, or similar) that operate 24/7 across teams
  3. Build MCP servers, APIs, CLIs, and microservices connecting AI models to business systems (BigQuery, Slack, CRMs, email, calendars, analytics tools)
  4. Architect data flows for retrieval-augmented generation (RAG), connecting LLMs to internal knowledge bases, customer data, and real-time business context
  5. Partner with RevOps, Demand Generation, Regional Marketing, and SDR teams to scope high-impact automation problems, identify bottlenecks, and build solutions with measurable business outcomes

Skills

Required

  • Python
  • JavaScript/Node.js
  • Git-based workflows
  • code review practices
  • testing discipline
  • LLM frameworks and patterns
  • prompt engineering
  • RAG
  • function calling/tool use
  • structured output parsing
  • evaluation
  • multi-agent systems
  • agent decomposition
  • orchestration patterns
  • state management
  • production monitoring
  • Google Cloud Platform
  • BigQuery
  • serverless/containerized services
  • Cloud Functions
  • Cloud Run
  • LLM failure modes and production mitigations
  • confidence thresholds
  • fallback logic
  • human escalation
  • cost/latency management
  • identify high-leverage problems
  • push back on low-impact requests
  • deliver end-to-end with minimal direction
  • AI-assisted development tools

Nice to have

  • vector databases
  • retrieval

What the JD emphasized

  • own the AI agent infrastructure and automation platform
  • build multi-agent architectures
  • LLM integrations
  • ship production systems
  • own and define the technical direction for the automation platform
  • 8+ years of software engineering experience
  • 2+ years of hands-on experience applying LLMs/AI to production workflows
  • experience building and operating multi-agent systems at scale
