Senior Software Engineer, Backend (AI Agent Runtime)

Cresta · Vertical AI · AB, Canada · Remote · Engineering

The Senior Software Engineer will build and operate the real-time AI agent runtime infrastructure, focusing on LLM streaming, conversation state management, and a serverless-style function execution platform for custom logic. The role also involves building developer-facing tooling and ensuring production quality through observability and incident response.

What you'd actually do

  1. Build real-time AI agent infrastructure: Design and operate the stateful, low-latency runtime that powers voice and chat AI agents — from LLM streaming and conversation state management to graceful recovery and multi-channel support.
  2. Solve distributed systems problems: Own session management across scaled-out workers — including affinity, checkpointing, crash recovery, and consistency under concurrent access.
  3. Build a function execution platform: Own a serverless-style runtime where customers deploy custom logic — build orchestration, container lifecycle, autoscaling, and versioned rollouts.
  4. Own developer experience and test infrastructure: Build CLI tools, local development environments, and test execution frameworks that let engineers iterate quickly and ship with confidence.
  5. Raise the bar on production quality: Drive observability, incident response, and engineering best practices across the team.

Skills

Required

  • 5+ years of software engineering experience, with meaningful time spent on infrastructure, platform, or systems work.
  • Strong Python and Go skills.
  • Deep understanding of distributed systems: consistency, fault tolerance, state management, concurrency.
  • Experience with Kubernetes and cloud-native infrastructure.
  • Experience building developer-facing tooling — CLIs, SDKs, local dev environments, or internal platforms.
  • Strong communicator who can drive technical decisions, write clear design docs, and mentor others.
  • High bar for code quality — thorough testing, thoughtful code review, and sustainable engineering practices.
  • Comfort operating what you build — on-call, incident response, and production ownership.

Nice to have

  • Experience with real-time voice or streaming media systems.
  • Hands-on with LLM integration — streaming inference, prompt orchestration, retrieval-augmented generation.
  • Experience building serverless or function-as-a-service platforms.
  • Workflow engines (Temporal, Argo, Airflow) for durable, long-running processes.
  • Experience in conversational AI or speech domains.
  • Infrastructure-as-code and GitOps workflows.

What the JD emphasized

  • Build real-time AI agent infrastructure
  • Build a function execution platform
  • AI-native workflow
  • Comfort operating what you build

Other signals

  • AI agent runtime
  • LLM streaming
  • conversation state management
  • function execution platform
  • developer tooling for AI