Using state machines to give structure and observability to your agentic experiences

The problem

Diagram showing an AI agent with various components like 'Prompt Manager', 'Action Executor', and 'State Machine' connected together.

I truly believe that agentic experiences can be a big win for teams trying to give end users a more powerful and customized experience.

It's really hard to build agentic experiences that take users through a complex flow without the experience falling apart. When building these experiences myself, I've seen agents randomly hand off to another agent, forget what they were doing, or get stuck in loops repeating the same action over and over.

When building AI agents that need to perform multi-step tasks, it's easy for the agent to get lost, repeat steps, or make inconsistent decisions. This leads to a frustrating user experience.

One way to address this is to give your agent a "spine": a structured framework that guides its behavior and limits its available actions at any given time.
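To make the "spine" idea concrete, here is a minimal TypeScript sketch built around a hypothetical booking agent. Each state lists the tools the agent may use, only those tools are offered to the model, and any call outside that list is rejected. The state names, tool names, `allowedTools`, `toolsFor`, and `isAllowed` are all illustrative assumptions, not part of any library.

```typescript
// Hypothetical states for an agent that books something on the user's behalf.
type AgentState = "gathering_info" | "confirming" | "executing" | "done"

// Hypothetical tool names; a real agent would map these to actual tool definitions.
type ToolName = "ask_question" | "search" | "summarize_plan" | "book" | "send_receipt"

// The "spine": the only actions available in each state.
const allowedTools: Record<AgentState, ToolName[]> = {
  gathering_info: ["ask_question", "search"],
  confirming: ["summarize_plan", "ask_question"],
  executing: ["book"],
  done: ["send_receipt"],
}

// Offer the model only the tools that make sense right now.
function toolsFor(state: AgentState): ToolName[] {
  return allowedTools[state]
}

// Guard against the model calling a tool outside the current state's list.
function isAllowed(state: AgentState, tool: ToolName): boolean {
  return allowedTools[state].includes(tool)
}

console.log(toolsFor("executing")) // [ 'book' ]
console.log(isAllowed("gathering_info", "book")) // false
```

Because `book` is never offered while the agent is still gathering information, the model can't skip ahead, however the conversation drifts.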

State Machines

First off, there are a couple of really interesting approaches people are exploring to fix this issue. Mastra is a really cool project that aims to provide a framework for building agentic experiences around predefined flows. But I decided to explore a more general-purpose approach using state machines.

If you haven't heard of state machines before, they are a way to model the behavior of a system by defining a set of states and the transitions between them. At any given time, the system is in one state, and can only transition to certain other states based on defined rules.

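For example, consider a state machine for a traffic light: it has three states (green, yellow, and red), and it can only move from green to yellow, from yellow to red, and from red back to green.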
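As a minimal, library-free sketch, the traffic light can be written as a transition table in which a single hypothetical `TIMER` event moves the light to its next color. The names `transitions`, `transition`, and `TIMER` are illustrative assumptions.

```typescript
// The three states a (simplified) traffic light can be in.
type LightState = "green" | "yellow" | "red"

// The only event this light reacts to: a timer firing.
type LightEvent = { type: "TIMER" }

// Transition table: for each state, the state each event leads to.
const transitions: Record<LightState, Record<LightEvent["type"], LightState>> = {
  green: { TIMER: "yellow" },
  yellow: { TIMER: "red" },
  red: { TIMER: "green" },
}

// Look up the next state; there is no way to reach a state the table doesn't list.
function transition(state: LightState, event: LightEvent): LightState {
  return transitions[state][event.type]
}

// Fire the timer four times, starting from green.
let light: LightState = "green"
const sequence: LightState[] = [light]
for (let i = 0; i < 4; i++) {
  light = transition(light, { type: "TIMER" })
  sequence.push(light)
}
console.log(sequence.join(" -> ")) // green -> yellow -> red -> green -> yellow
```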

Now let's look at a conversation I had with ChatGPT where I asked it to pretend to be a traffic light:

Chat Session


you are in charge of operating a single traffic light, your job is to say "red", "yellow" and "green" in the correct order

You can see that the agent didn't quite understand the order of operations here. It got confused when I added extra context about the cars waiting at the light (it also didn't really understand how a traffic light works to begin with, since a green light has to pass through yellow before turning red). This is because the agent doesn't have a defined structure to follow; it's just vibing it out based on what feels best.

We can actually help the agent out by giving it a state machine structure to follow.

The basic agent loop

Before we tackle the traffic light example, let's start with a really simple agent that just listens for messages and responds to them. We can model this flow with a simple state machine.


This is a very simple agent that just has two states: "listening" and "responding".

Here's what that looks like in TypeScript, using XState:

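```typescript
import { setup, assign } from "xstate"
import { Message } from "./types"

// The two events the agent understands; both carry the message text.
type AgentEvent =
  | { type: "MESSAGE_RECEIVED"; message: string }
  | { type: "RESPOND"; message: string }

export const agenticMachine = setup({
  // In XState v5, context and event types are declared under `types`.
  types: {
    context: {} as { messages: Message[] },
    events: {} as AgentEvent,
  },
  actions: {
    // Append the agent's reply to the conversation history.
    addAssistantMessage: assign({
      messages: ({ context, event }) => [
        ...context.messages,
        { role: "assistant", message: event.message },
      ],
    }),
    // Append the user's message to the conversation history.
    addUserMessage: assign({
      messages: ({ context, event }) => [
        ...context.messages,
        { role: "user", message: event.message },
      ],
    }),
  },
}).createMachine({
  id: "simpleAgent",
  // Initial context: an empty conversation.
  context: {
    messages: [],
  },
  initial: "listening",
  states: {
    listening: {
      on: {
        // A user message moves the agent into "responding".
        MESSAGE_RECEIVED: {
          target: "responding",
          actions: "addUserMessage",
        },
      },
    },
    responding: {
      on: {
        // Sending the reply returns the agent to "listening".
        RESPOND: {
          target: "listening",
          actions: "addAssistantMessage",
        },
      },
    },
  },
})
```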

This simple state machine ensures that the agent is always in exactly one of two states and can only move between them in a defined way: a MESSAGE_RECEIVED event takes it from "listening" to "responding", and a RESPOND event takes it back to "listening".
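To see the loop in action without installing anything, here is a minimal, dependency-free TypeScript sketch that mirrors the XState machine. `ChatMessage` is an assumed stand-in for the `Message` type imported from `./types` (its definition isn't shown), and `loop`, `send`, and `Snapshot` are hypothetical helpers, not XState APIs. As XState does by default, the sketch leaves the machine unchanged when it receives an event the current state doesn't handle.

```typescript
type Role = "user" | "assistant"
// Assumed shape of a chat message (stand-in for the Message type in ./types).
interface ChatMessage { role: Role; message: string }

type LoopState = "listening" | "responding"
type LoopEvent =
  | { type: "MESSAGE_RECEIVED"; message: string }
  | { type: "RESPOND"; message: string }

interface Snapshot { value: LoopState; messages: ChatMessage[] }

// Each state accepts exactly one event, moves to one target, and records one role.
const loop: Record<LoopState, { accepts: LoopEvent["type"]; target: LoopState; role: Role }> = {
  listening: { accepts: "MESSAGE_RECEIVED", target: "responding", role: "user" },
  responding: { accepts: "RESPOND", target: "listening", role: "assistant" },
}

function send(snapshot: Snapshot, event: LoopEvent): Snapshot {
  const rule = loop[snapshot.value]
  // Events the current state doesn't handle are ignored: same state, same messages.
  if (event.type !== rule.accepts) return snapshot
  return {
    value: rule.target,
    messages: [...snapshot.messages, { role: rule.role, message: event.message }],
  }
}

let snap: Snapshot = { value: "listening", messages: [] }
snap = send(snap, { type: "RESPOND", message: "Hello!" }) // ignored: not responding yet
snap = send(snap, { type: "MESSAGE_RECEIVED", message: "Hi there" }) // listening -> responding
snap = send(snap, { type: "RESPOND", message: "Hello! How can I help?" }) // responding -> listening

console.log(snap.value) // listening
console.log(snap.messages.length) // 2
```

With XState itself, the equivalent is to create an actor with `createActor(agenticMachine)`, call `.start()` on it, and send events with `.send(...)`.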

Bryn Shanahan

Giving your Agent a Spine: Part 1 - Agent Anatomy
