What you'd actually do

Design and evolve agent harnesses that power different product experiences

Build core runtime systems, including AOP execution and multi-model orchestration

Develop control-plane logic for routing, planning, and tool invocation with strong safety guarantees

Optimize agent systems for latency, reliability, and production correctness

Analyze real-world failures and use data to drive iterative improvements

Skills

Required

Strong experience building distributed systems or backend platforms in production environments
Comfort working in ambiguous, fast-moving environments with rapid iteration cycles
Experience owning systems end-to-end, from design through production and iteration
Familiarity with experimentation, evaluation, or data-driven product improvement loops
A track record of improving system reliability, performance, and observability
Ability to debug complex systems and identify root causes of failures

Nice to have

You’ve built or worked on agent harnesses, orchestration layers, or execution frameworks
You think in terms of control planes, feedback loops, and system-level optimization, not just features
You’re excited about diagnosing failure modes and iterating toward measurable improvements
You care deeply about production quality—not just making systems work, but making them reliable, safe, and scalable
You’re motivated by pushing the frontier of how intelligent systems behave in the real world

What the JD emphasized

design and build the systems that govern how Decagon agents operate in real-world environments

own complex, distributed systems that sit at the heart of the agent runtime

fast, reliable, and continuously improving

move fluidly between diagnosing production issues, designing new system abstractions, and running experiments

ship improvements safely and at scale

real-time systems (e.g., voice interactions with strict latency requirements)

agent's task execution reliability increasingly depends on the orchestration layer

highly experimental, frontier-style engineering

continuously analyzes real-world failures

builds feedback loops through offline evaluation and online experimentation

iterates quickly to improve quality, reliability, and capability

regularly rethinks system design to push agent performance forward in production

About Decagon

Decagon is the leading conversational AI platform empowering every brand to deliver concierge customer experiences.

Our technology enables industry-defining enterprises like Avis Budget Group, Block’s Cash App and Square, Chime, Oura Health, and Hunter Douglas to deploy AI agents that power personalized, deeply satisfying interactions across voice, chat, email, SMS, and every other channel.

We’re building a future where customer experience shifts from support tickets and hold music to faster resolutions, richer conversations, and deeper relationships. We’re proud to be backed by world-class investors who share that vision, including a16z, Accel, Bain Capital Ventures, Coatue, and Index Ventures, along with many others.

We’re an in-office company, driven by a shared commitment to excellence and velocity. Our values — Just Get It Done, Invent What Customers Want, Winner’s Mindset, and The Polymath Principle — shape how we work and grow as a team.

About the Team

The Agent Orchestration team builds the runtime and model orchestration layer that powers Decagon’s agents in production. This layer turns workflows, tools, and guardrails into a reliable, low-latency, and delightful experience for end users.

At the core of this work is the agent harness: the routing, execution logic, tool orchestration, and control-plane systems that determine how an agent behaves in a live conversation. The team owns the full execution lifecycle of each conversation—from selecting workflows and orchestrating multiple models (e.g., router/planner/supervisor patterns), to coordinating tool calls, enforcing safety constraints, and communicating back to the user.

The team operates across both real-time systems (e.g., voice interactions with strict latency requirements) and longer-horizon execution (supporting more complex reasoning and workflows). Our research shows that an agent’s task execution reliability increasingly depends on the orchestration layer that wraps around it.

This is highly experimental, frontier-style engineering. The team continuously analyzes real-world failures, builds feedback loops through offline evaluation and online experimentation, and iterates quickly to improve quality, reliability, and capability. As model capabilities evolve, the team regularly rethinks system design to push agent performance forward in production.

About the Role

As a Staff Software Engineer on the Agent Orchestration team, you will design and build the systems that govern how Decagon agents operate in real-world environments.

You will own complex, distributed systems that sit at the heart of the agent runtime: execution frameworks, model orchestration logic, and experimentation platforms that ensure agents are fast, reliable, and continuously improving. Your work will directly impact how agents reason, take actions, and deliver outcomes across millions of interactions.

This role operates in a fast-moving, ambiguous space with tight feedback loops. You’ll move fluidly between diagnosing production issues, designing new system abstractions, and running experiments to improve agent behavior. You’ll collaborate closely with Research, Infra, and Product teams to ship improvements safely and at scale.

In this role, you will

Design and evolve agent harnesses that power different product experiences
Build core runtime systems, including AOP execution and multi-model orchestration
Develop control-plane logic for routing, planning, and tool invocation with strong safety guarantees
Optimize agent systems for latency, reliability, and production correctness
Analyze real-world failures and use data to drive iterative improvements
Build and operate online experimentation (A/B testing) and contribute to offline evaluation frameworks
Improve observability, testing, and simulation systems to ensure safe, measurable progress
Contribute to voice and real-time systems (e.g., transcription pipelines, turn-taking, latency improvements)
Continuously adapt orchestration systems as model capabilities evolve

Your background looks something like this

Strong experience building distributed systems or backend platforms in production environments
Comfort working in ambiguous, fast-moving environments with rapid iteration cycles
Experience owning systems end-to-end, from design through production and iteration
Familiarity with experimentation, evaluation, or data-driven product improvement loops
A track record of improving system reliability, performance, and observability
Ability to debug complex systems and identify root causes of failures

Even better

You’ve built or worked on agent harnesses, orchestration layers, or execution frameworks
You think in terms of control planes, feedback loops, and system-level optimization, not just features
You’re excited about diagnosing failure modes and iterating toward measurable improvements
You care deeply about production quality—not just making systems work, but making them reliable, safe, and scalable
You’re motivated by pushing the frontier of how intelligent systems behave in the real world

Compensation

$200K – $400K + equity

This range reflects the expected compensation for this role. Compensation within the range is determined based on experience, skills, and the scope of responsibilities, with flexibility for candidates who demonstrate exceptional impact.

In addition to base salary, we offer competitive equity. Final compensation may vary based on location within the United States.

Benefits

We proudly offer the following benefits for our full-time employees:

Take-what-you-need vacation policy (subject to local requirements; UK employees receive 25 days of statutory leave)
Medical, Dental, and Vision benefits for you and your family
Life Insurance and Disability Benefits
Retirement Plan (e.g., 401(k), pension)
Parental Leave
Fertility and family building benefits through Carrot
Daily lunches and snacks in the office to keep you at your best

These benefits are described in more detail in Decagon’s policies, may vary by location, and can change at any time according to applicable compensation and benefits plans.