Implementing State Machines in PostgreSQL


Finite-state machine (FSM) is a wonderful model of computation that has many practical applications. One of my favorites is turning business logic into simple FSMs. For example consider the following requirements for an order management system:

Informally, turning such a list of requirements into an FSM is simply the process of coming up with: a set of states an order can be in, a set of events, a transition function that maps the combination of each state and event to a new state, and an initial state.

In practice this is usually done by drawing up a directed graph that shows how the different states are connected to each other via events:

Finite-state machine for an order management system in PostgreSQL

Besides guiding our implementation, these graphs can be very useful for discussions and analysis. For example you can immediately see how there is no way for a customer to return an order after it has shipped. Additionally the process of naming the involved states and events automatically creates a precise language that can be adopted by all stakeholders.

You might have noticed that the graph above does not show all possible combinations of all states and all events. The reason is that the missing combinations implicitly lead to an error state. The complete FSM can be shown as a graph as well, but I don’t think it’s very useful in practice.

Anyway, as promised in the title of this article we’re going to implement this FSM in PostgreSQL. This may not be everybody’s cup of tea, but you might like how this approach us gives advanced analytical powers as a free by-product. Embedding this kind of logic into the database can also help protect against race conditions, but this will perhaps be the topic of a future post 1.

Let’s begin by creating an order_events table which keeps track of all events for a given order_id.

CREATE TABLE order_events (
  id serial PRIMARY KEY,
  order_id int NOT NULL,
  event text NOT NULL,
  time timestamp DEFAULT now() NOT NULL

Next, let’s implement the transition function, which is the heart of every FSM:

CREATE FUNCTION order_events_transition(state text, event text) RETURNS text
    WHEN 'start' THEN
      CASE event
        WHEN 'create' THEN 'awaiting_payment'
        ELSE 'error'
    WHEN 'awaiting_payment' THEN
      CASE event
        WHEN 'pay' THEN 'awaiting_shipment'
        WHEN 'cancel' THEN 'canceled'
        ELSE 'error'
    WHEN 'awaiting_shipment' THEN
      CASE event
        WHEN 'cancel' THEN 'awaiting_refund'
        WHEN 'ship' THEN 'shipped'
        ELSE 'error'
    WHEN 'awaiting_refund' THEN
      CASE event
        WHEN 'refund' THEN 'canceled'
        ELSE 'error'
    ELSE 'error'

And before we proceed, let’s test with a few examples to make sure the function is working:

SELECT state, event, order_events_transition(state, event)
  ('start', 'create'),
  ('awaiting_payment', 'pay'),
  ('awaiting_payment', 'cancel'),
  ('awaiting_payment', 'ship')
) AS examples(state, event);
      state       | event  | order_events_transition
 start            | create | awaiting_payment
 awaiting_payment | pay    | awaiting_shipment
 awaiting_payment | cancel | canceled
 awaiting_payment | ship   | error

The above looks correct, but it’s not immediately clear how we could use this function to enforce our FSM on all rows for the same order_id in the order_events table.

The first thing we need is a good way to take a list of events and call our transition function on them recursively to determine the resulting state. There are multiple ways to accomplish this in PostgreSQL, but perhaps the most elegant is a user-defined aggregate.

In a nutshell, a user defined aggregate is a function that has an internal value (state) that is updated for each input value that is being passed into it using a state transition function. This means it fits our FSM model like a glove:

CREATE AGGREGATE order_events_fsm(text) (
  SFUNC = order_events_transition,
  STYPE = text,
  INITCOND = 'start'

The above defines a new aggregate called order_events_fsm which takes a text input (one of our events) and calls the order_events_transition state transition function (FSUNC) for each input along with the current state (STYPE) which is also of type text. The initial state is start (INITCOND).

A quick test shows that it works as expected:

SELECT order_events_fsm(event ORDER BY id)
  (1, 'create'),
  (2, 'pay'),
  (3, 'cancel')
) examples(id, event);

Now let’s use our order_events_fsm to create a BEFORE INSERT trigger for our order_events table that makes sure all events of a given order_id are valid and don’t lead to an error state. We do this using a simple plpgsql function that executes our order_events_fsm against all existing events of the current order_id plus the new event. If the final state is error, we raise an exception which causes the current transaction to roll back. It looks like this:

CREATE FUNCTION order_events_tigger_func() RETURNS trigger
LANGUAGE plpgsql AS $$
  new_state text;
  SELECT order_events_fsm(event ORDER BY id)
  FROM (
    SELECT id, event FROM order_events WHERE order_id = new.order_id
    SELECT, new.event
  ) s
  INTO new_state;
  IF new_state = 'error' THEN
    RAISE EXCEPTION 'invalid event';

  RETURN new;

CREATE TRIGGER order_events_trigger BEFORE INSERT ON order_events
FOR EACH ROW EXECUTE PROCEDURE order_events_tigger_func();

We can verify that this works by inserting a valid sequence of order events:

INSERT INTO order_events (order_id, event) VALUES
  (1, 'create'),
  (1, 'pay'),
  (1, 'ship');

As well as an invalid sequence of events:

INSERT INTO order_events (order_id, event) VALUES
  (2, 'create'),
  (2, 'ship');
psql:orders.sql:95: ERROR:  invalid event
CONTEXT:  PL/pgSQL function order_events_tigger_func() line 14 at RAISE

As expected, only the first series of inserts made it into our table:

SELECT id, order_id, event FROM order_events;
 id | order_id | event
  1 |        1 | create
  2 |        1 | pay
  3 |        1 | ship

If you’re still on the fence about embedding this kind of logic into your database, let’s see how our approach gives us advanced analytical powers as a free by-product. Let’s consider a new data set of 3 orders:

TRUNCATE order_events;
INSERT INTO order_events (order_id, event, time) VALUES
  (1, 'create', '2017-07-23 00:00:00'),
  (1, 'pay', '2017-07-23 12:00:00'),
  (1, 'ship', '2017-07-24 00:00:00'),

  (2, 'create', '2017-07-23 00:00:00'),
  (2, 'cancel', '2017-07-24 00:00:00'),

  (3, 'create', '2017-07-23 00:00:00'),
  (3, 'pay', '2017-07-24 00:00:00'),
  (3, 'cancel', '2017-07-25 00:00:00'),
  (3, 'refund', '2017-07-26 00:00:00');

Using our order_events_fsm as a window function, we can easily get the state history of a given order:

SELECT time, order_events_fsm(event) OVER (ORDER BY id)
FROM order_events
WHERE order_id = 3;
        time         | order_events_fsm
 2017-07-23 00:00:00 | awaiting_payment
 2017-07-24 00:00:00 | awaiting_shipment
 2017-07-25 00:00:00 | awaiting_refund
 2017-07-26 00:00:00 | canceled

But we can go even further and apply our state machines to multiple orders, e.g. by using the generate_series function and a Lateral sub-query to break down the number of orders per state for each day of a given date range:

SELECT date::date, state, count(1)
  generate_series('2017-07-23'::date, '2017-07-26', '1 day') date,
    SELECT order_id, order_events_fsm(event ORDER BY id) AS state
    FROM order_events
    WHERE time < date + '1 day'::interval
    GROUP BY 1
  ) orders
ORDER BY 1, 2;
    date    |       state       | count
 2017-07-23 | awaiting_payment  |     2
 2017-07-23 | awaiting_shipment |     1
 2017-07-24 | awaiting_shipment |     1
 2017-07-24 | canceled          |     1
 2017-07-24 | shipped           |     1
 2017-07-25 | awaiting_refund   |     1
 2017-07-25 | canceled          |     1
 2017-07-25 | shipped           |     1
 2017-07-26 | canceled          |     2
 2017-07-26 | shipped           |     1

There you have it, a FSM implemented as a user defined aggregate in PostgreSQL providing both data integrity and advanced analytics.

That being said, your milage may vary, and embedding your business logic into your database is always a tradeoff. But if you want some reassurance: I’ve had great success in applying this approach in combination with eager materialization to implement a realtime analytics dashboard for an application with over a billion event rows.

Anyway, I’m really looking forward to feedback on this, and am more than happy to answer any questions, so please comment.

Join me at Apple: If the article above has you excited, come and join my team at Apple. We’re hiring Go and PostgreSQL developers in Shanghai, China right now. Relocation is possible, just send me an e-mail to find out more about this role.

Thanks to Thorsten Ball and Johannes Boyne for reviewing.

  1. The code in this article is immune to concurrency anomalies when using the SERIALIZABLE transaction isolation level. Alternative you could modify the trigger to aquire an exclusive lock on the order_events table. But as mentioned, this topic deserves a separate post. ↩︎

-- Felix Geisendörfer
Subscribe to this blog via RSS or E-Mail or get small updates from me via Twitter.