Staff Software Engineer, Agent Orchestration

Decagon · Vertical AI · New York, NY · Engineering

Decagon is looking for a Staff Software Engineer to design and build the runtime and model orchestration layer for its conversational AI agents. The role centers on the agent harness: routing, execution logic, tool orchestration, and the control-plane systems that keep user experiences reliable, low-latency, and delightful across channels. The work includes optimizing for latency, reliability, and production correctness, analyzing failures, building feedback loops, and adapting systems as model capabilities evolve.

What you'd actually do

  1. Design and evolve agent harnesses that power different product experiences
  2. Build core runtime systems, including AOP execution and multi-model orchestration
  3. Develop control-plane logic for routing, planning, and tool invocation with strong safety guarantees
  4. Optimize agent systems for latency, reliability, and production correctness
  5. Analyze real-world failures and use data to drive iterative improvements

Skills

Required

  • Strong experience building distributed systems or backend platforms in production environments
  • Comfort working in ambiguous, fast-moving environments with rapid iteration cycles
  • Experience owning systems end-to-end, from design through production and iteration
  • Familiarity with experimentation, evaluation, or data-driven product improvement loops
  • A track record of improving system reliability, performance, and observability
  • Ability to debug complex systems and identify root causes of failures

Nice to have

  • You’ve built or worked on agent harnesses, orchestration layers, or execution frameworks
  • You think in terms of control planes, feedback loops, and system-level optimization, not just features
  • You’re excited about diagnosing failure modes and iterating toward measurable improvements
  • You care deeply about production quality—not just making systems work, but making them reliable, safe, and scalable
  • You’re motivated by pushing the frontier of how intelligent systems behave in the real world

What the JD emphasized

  • design and build the systems that govern how Decagon agents operate in real-world environments
  • own complex, distributed systems that sit at the heart of the agent runtime
  • fast, reliable, and continuously improving
  • move fluidly between diagnosing production issues, designing new system abstractions, and running experiments
  • ship improvements safely and at scale
  • real-time systems (e.g., voice interactions with strict latency requirements)
  • agent's task execution reliability increasingly depends on the orchestration layer
  • highly experimental, frontier-style engineering
  • continuously analyzes real-world failures
  • builds feedback loops through offline evaluation and online experimentation
  • iterates quickly to improve quality, reliability, and capability
  • regularly rethinks system design to push agent performance forward in production

Other signals

  • agent orchestration
  • multi-model orchestration
  • tool use
  • safety constraints
  • real-time systems
  • experimentation
  • evaluation