Senior Software Engineer, Agent Orchestration

Decagon · Vertical AI · San Francisco, CA · Engineering

The Agent Orchestration team builds the runtime and model orchestration layer that powers Decagon's agents in production. This role focuses on designing and building the systems that govern how those agents operate in real-world environments: execution frameworks, model orchestration logic, and experimentation platforms that keep agents fast, reliable, and continuously improving. The work involves optimizing for latency, reliability, and production correctness; analyzing failures; and improving observability and testing systems.

What you'd actually do

  1. Design and evolve agent harnesses that power different product experiences
  2. Build core runtime systems, including AOP (Agent Operating Procedure) execution and multi-model orchestration
  3. Develop control-plane logic for routing, planning, and tool invocation with strong safety guarantees
  4. Optimize agent systems for latency, reliability, and production correctness
  5. Analyze real-world failures and use data to drive iterative improvements

Skills

Required

  • Strong experience building distributed systems or backend platforms in production environments
  • Comfort working in ambiguous, fast-moving environments with rapid iteration cycles
  • Experience owning systems end-to-end, from design through production and iteration
  • Familiarity with experimentation, evaluation, or data-driven product improvement loops
  • A track record of improving system reliability, performance, and observability
  • Ability to debug complex systems and identify root causes of failures

Nice to have

  • You’ve built or worked on agent harnesses, orchestration layers, or execution frameworks
  • You think in terms of control planes, feedback loops, and system-level optimization, not just features
  • You’re excited about diagnosing failure modes and iterating toward measurable improvements
  • You care deeply about production quality—not just making systems work, but making them reliable, safe, and scalable
  • You’re motivated by pushing the frontier of how intelligent systems behave in the real world

What the JD emphasized

  • runtime and model orchestration layer
  • agent harness
  • routing, execution logic, tool orchestration, and control-plane systems
  • multi-model orchestration
  • tool calls
  • safety constraints
  • real-time systems
  • offline evaluation
  • online experimentation
  • agent performance
  • complex, distributed systems
  • fast, reliable, and continuously improving
  • millions of interactions
  • fast-moving, ambiguous space
  • tight feedback loops
  • diagnosing production issues
  • designing new system abstractions
  • running experiments
  • collaborate closely with Research, Infra, and Product teams
  • ship improvements safely and at scale
