Staff Software Engineer, Agent Orchestration

Decagon · Vertical AI · San Francisco, CA · Engineering

Decagon is hiring a Staff Software Engineer to design and build the runtime and model orchestration layer for its conversational AI platform. The role centers on the agent harness, which handles routing, execution logic, tool orchestration, and the control-plane systems that keep agent behavior reliable and low-latency in production. The work involves building distributed systems, optimizing for performance and reliability, and iterating based on real-world failures and experimentation.

What you'd actually do

  1. Design and evolve agent harnesses that power different product experiences
  2. Build core runtime systems, including AOP execution and multi-model orchestration
  3. Develop control-plane logic for routing, planning, and tool invocation with strong safety guarantees
  4. Optimize agent systems for latency, reliability, and production correctness
  5. Analyze real-world failures and use data to drive iterative improvements

Skills

Required

  • Strong experience building distributed systems or backend platforms in production environments
  • Comfort working in ambiguous, fast-moving environments with rapid iteration cycles
  • Experience owning systems end-to-end, from design through production and iteration
  • Familiarity with experimentation, evaluation, or data-driven product improvement loops
  • A track record of improving system reliability, performance, and observability
  • Ability to debug complex systems and identify root causes of failures

Nice to have

  • You’ve built or worked on agent harnesses, orchestration layers, or execution frameworks
  • You think in terms of control planes, feedback loops, and system-level optimization, not just features
  • You’re excited about diagnosing failure modes and iterating toward measurable improvements
  • You care deeply about production quality—not just making systems work, but making them reliable, safe, and scalable
  • You’re motivated by pushing the frontier of how intelligent systems behave in the real world

What the JD emphasized

  • runtime and model orchestration layer
  • agent harness
  • routing, execution logic, tool orchestration, and control-plane systems
  • full execution lifecycle of each conversation
  • orchestrating multiple models
  • coordinating tool calls, enforcing safety constraints
  • real-time systems (e.g., voice interactions with strict latency requirements)
  • longer-horizon execution (supporting more complex reasoning and workflows)
  • agent's task execution reliability increasingly depends on the orchestration layer
  • highly experimental, frontier-style engineering
  • continuously analyzes real-world failures, builds feedback loops through offline evaluation and online experimentation
  • rethink system design to push agent performance forward in production
  • design and build the systems that govern how Decagon agents operate in real-world environments
  • own complex, distributed systems that sit at the heart of the agent runtime: execution frameworks, model orchestration logic, and experimentation platforms
  • agents are fast, reliable, and continuously improving
  • work will directly impact how agents reason, take actions, and deliver outcomes across millions of interactions
  • fast-moving, ambiguous space with tight feedback loops
  • move fluidly between diagnosing production issues, designing new system abstractions, and running experiments to improve agent behavior
  • collaborate closely with Research, Infra, and Product teams to ship improvements safely and at scale

Other signals

  • agent orchestration
  • multi-model orchestration
  • runtime systems
  • control-plane logic
  • experimentation