GraphQL cursors

Contrarian view on cursor-based pagination.

graphql
Author

A. Coady

Published

June 9, 2024

GraphQL documentation recommends cursor-based pagination, and it has subsequently become a popular standard.

In general, we’ve found that cursor-based pagination is the most powerful of those designed. Especially if the cursors are opaque, either offset or ID-based pagination can be implemented using cursor-based pagination (by making the cursor the offset or the ID), and using cursors gives additional flexibility if the pagination model changes in the future. As a reminder that the cursors are opaque and that their format should not be relied upon, we suggest base64 encoding them. …

{
  hero {
    name
    friends(first: 2) {
      totalCount
      edges {
        node {
          name
        }
        cursor
      }
      pageInfo {
        endCursor
        hasNextPage
      }
    }
  }
}

There are several oversights with this well-intentioned advice.

Cursors and state

Cursors imply state, or at least they used to. A database cursor is used for iterating over a result set, meaning it has transactional integrity to pick up where it left off.

The vast majority of GraphQL APIs are inherently stateless. The “cursor” is being decoded as input to a new request, and offers no guarantees. From this observation, the advice falls apart.

The problem with stateless pagination is inconsistency: items may shift, appear, or disappear between requests, which gives the client the perception of missing or duplicate items. This happens regardless of whether the pagination is offset or ID based; it is arguably worse in the case of IDs, since the referenced item can move arbitrarily or be gone entirely.

Cursors don’t solve the consistency problem; they give the client the false impression of solving the problem.

Opaqueness and compatibility

The claim is that an opaque cursor is compatible across changes. Changed to do what, exactly, is the more relevant question.

Taking a step back, what is the problem being solved here? We assume there is a list of items, with an inherent ordering, and too many to return to the client with acceptable performance.

Given those assumptions, the first obvious step is an optional size limit. That is not in dispute; the disagreement is over the “offset”. A simple and versatile solution is a range filter over whatever field(s) are relevant to ordering. This is not even remotely controversial when the field in question has a name like date. In other words, “pagination” is not necessarily the problem that needs solving.

Range filters with a size limit are sufficient to implement pagination, and new optional filters are always backwards compatible. They also offer the flexibility of search, whereas cursors can only be used iteratively. And what if the client does not want visibility into the range filters? That is exactly what offset is for; offset is a range filter over an implied index field.
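As a sketch, assuming a hypothetical messages field ordered by a sent timestamp, a size limit plus a range filter covers the same iteration without any cursor:

{
  messages(limit: 10, sentAfter: "2024-06-01T00:00:00Z") {
    sent
    body
  }
}

The next request passes the last sent value seen as sentAfter, and the same argument doubles as a search filter.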

There is a reason why the recommendation does not offer a useful example of this supposed compatibility; there isn’t one. The advice is equivocating on the ambiguity of an after: $ID filter. Is the ID field relevant to the ordering?

  • If yes, then it is just another range filter
  • If no, then it is just another placeholder for index

There is no third case. There is no future secret field that relates to ordering, is relevant to the client, yet is somehow still opaque to the client.

Stateless pagination is a combination of range filters and size limits, no matter what the input fields are called. A true stateful cursor is opaque precisely because it does not represent any known field.
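To make the two cases concrete, here is a sketch using the documentation’s friends example, with hypothetical idAfter and offset arguments. The opaque cursor below merely base64-encodes an ID; decoded, it is equivalent to one of the explicit inputs:

{
  hero {
    # the recommended opaque cursor (base64 of "cursor:1002")
    opaque: friends(first: 2, after: "Y3Vyc29yOjEwMDI=") {
      edges { node { name } }
    }
    # if IDs order the list, the cursor is just a range filter
    byId: friends(first: 2, idAfter: "1002") {
      edges { node { name } }
    }
    # if not, the cursor is just a stand-in for an offset
    byOffset: friends(first: 2, offset: 2) {
      edges { node { name } }
    }
  }
}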

Next optimization

The “next” piece of advice is that the cursor implementation should indicate whether another request is worthwhile. Again, in a stateless API, the server can make no such guarantee.

If the server can provide a total count, by all means do so. It solves the “next” problem, and is more generally useful.

If it is not feasible for the server to provide a total count, how is it going to determine whether there are more items? At the data layer, it is going to stop processing at N + 1 items instead of the requested N. The client could do that too. Instead of requesting the next 10, it could go to 11.
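A sketch of that, reusing the documentation’s example: the client wants 10 friends, requests 11, and treats the presence of an 11th item as the signal that another request may be worthwhile.

{
  hero {
    friends(first: 11) {
      edges {
        node {
          name
        }
      }
    }
  }
}

If 11 items come back, display 10 and there are more; if fewer come back, there is nothing left to request.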

Better yet, why stop at the server optimizing for N + 0? If it knows there is just 1 more item, why not go ahead and include that last one too? N + 2, anyone? Obsessing over the last “next” is a pointless micro-optimization, all the more so because it is irrelevant whenever the total count is not coincidentally a multiple of N. If N is arbitrary, then optimizing for a particular residue mod N is clearly arbitrary.

API design

Not only is there no good reason to blindly add opaque cursors, there is also no reason to add range filters before they are needed. A size limit alone solves the first order of magnitude of performance issues. If a client requests the first 10 items and then needs the next 10, pressure test whether it is actually unreasonable to request the first 20. The advantage is that the client then has a consistent snapshot of the first 20 regardless of changes, which could provide a better user experience.

A simple strategy for pagination: start with none, then proceed through the following steps as performance warrants (a schema sketch follows the list).

  1. size limit
  2. range filter on known field(s)
  3. offset
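A sketch of that progression in schema form, with hypothetical field and argument names; each step adds only an optional argument, so each is backwards compatible:

scalar DateTime

type Message {
  sent: DateTime!
  body: String!
}

type Query {
  messages(
    limit: Int           # step 1: size limit
    sentAfter: DateTime  # step 2: range filter on the known ordering field
    offset: Int          # step 3: offset, if the client wants no visibility into the range
  ): [Message!]!
}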

In the unlikely event your API is stateful, you didn’t need this advice because you already had a cursor. Otherwise, cursors are an overly complicated, useless abstraction.