GraphQL coroutines

GraphQL resolvers should have been coroutines.

graphql
Author: A. Coady
Published: February 20, 2023


This is how the GraphQL documentation introduces execution, as a hierarchy of resolvers:

You can think of each field in a GraphQL query as a function or method of the previous type which returns the next type. In fact, this is exactly how GraphQL works. Each field on each type is backed by a function called the resolver which is provided by the GraphQL server developer. When a field is executed, the corresponding resolver is called to produce the next value.

If a field produces a scalar value like a string or number, then the execution completes. However if a field produces an object value then the query will contain another selection of fields which apply to that object. This continues until scalar values are reached. GraphQL queries always end at scalar values.

There is a subtlety in that summary which is left as an exercise for the reader. The previous type must prepare the next type correctly so that its resolvers succeed. Specifically, all the trivial fields of the next type must be populated, not only because they may be requested, but because the next type’s resolvers invariably rely on that data.

It is as if parent types have a pre (or start) hook to set up child types. Framed that way, it becomes obvious that there is no post (or end) hook for a parent to finalize its result. Consider how unusual that is:

  • test frameworks have fixture setUp and tearDown
  • web frameworks have hooks around both sides of a stage in a request flow
  • inherited methods which support super allow code before and after the super call

GraphQL resolvers, by contrast, can only provide context to child fields, with no visibility into the result. It would be like only allowing super as the last line of a method.
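The super analogy in plain Python, as an illustration only: the `Repository` classes and the `hidden` set are hypothetical. The first override behaves like a resolver, able to set up context but forced to hand off as its last step. The second is an ordinary override that also inspects and filters what the parent returned.

```python
class Repository:
    def posts(self, user):
        # Base behavior: every post id, regardless of user.
        return [1, 2, 3]


class SetupOnly(Repository):
    """Resolver-style override: super() may only be the last step."""

    def posts(self, user):
        self.user = user  # setup is possible...
        return super().posts(user)  # ...but the result passes through untouched


class SetupAndFinalize(Repository):
    """Ordinary override: code runs both before and after super()."""

    hidden = {2}  # hypothetical: ids this user may not see

    def posts(self, user):
        self.user = user  # before super: set up
        result = super().posts(user)
        # After super: finalize the result by dropping hidden ids.
        return [post_id for post_id in result if post_id not in self.hidden]
```

Only the second shape can remove an item after the fact, which is exactly what a list of posts needs.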

Example

The result is an entire class of common problems that should be trivial: typically, though not exclusively, lists containing objects that have since been deleted or that the user is not authorized to see. The problem can be seen in the best practices example on authorization.

Authorization is a type of business logic that describes whether a given user/session/context has permission to perform an action or see a piece of data. For example:

“Only authors can see their drafts”

var { GraphQLObjectType, GraphQLString } = require('graphql');
// Authorization logic lives inside postRepository
var postRepository = require('postRepository');

var postType = new GraphQLObjectType({
  name: 'Post',
  fields: {
    body: {
      type: GraphQLString,
      resolve: (post, args, context, { rootValue }) => {
        return postRepository.getBody(context.user, post);
      }
    }
  }
});

The incomplete example does not implement “only authors can see their drafts”; it implements “only authors can see the body field of a draft”. The user would still learn that the post exists and see all of its unprotected metadata. And since “drafts” is plural, the example is also missing a parent field that returns a list of posts. Surely the preferred solution would be not to return the unauthorized post in that list at all.

The same problem exists if the post should be hidden for any other reason, including deletion. It cannot be overstated how common this problem is in real-world APIs, and GraphQL offers no solution.

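Or rather, no solution as the example is constructed. The workaround is to abandon the premise of the example and push the authorization logic up to the posts field. But even that fails to address race conditions, which are particularly relevant for deletions: a post can be deleted after the posts resolver has built the list but before that post’s own field resolvers run.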

Interestingly, it would also be more efficient if the query that determined the list of posts were “correct” in the first place. This relates to a previous article, GraphQL is the new ORM, which focused on performance. Ultimately this is the same issue: the elegance of single-purpose resolvers breaks down when context has been lost, and that failure is especially common and noticeable when lists are involved.

Solution

A general solution would be to allow resolvers to be coroutines, or equivalently to allow a finalize resolver that receives the completed result. Here is an example implementation as a patch to graphql-core:

--- a/src/graphql/execution/execute.py
+++ b/src/graphql/execution/execute.py
@@ -546,6 +546,8 @@ class ExecutionContext:
             completed = self.complete_value(
                 return_type, field_nodes, info, path, result
             )
+            if field_def.finalize is not None:
+                completed = field_def.finalize(completed, info, **args)
             if self.is_awaitable(completed):
                 # noinspection PyShadowingNames
                 async def await_completed() -> Any:
diff --git a/src/graphql/type/definition.py b/src/graphql/type/definition.py
--- a/src/graphql/type/definition.py
+++ b/src/graphql/type/definition.py
@@ -471,6 +471,7 @@ class GraphQLField:
     deprecation_reason: Optional[str]
     extensions: Dict[str, Any]
     ast_node: Optional[FieldDefinitionNode]
+    finalize: Optional[GraphQLFieldResolver]
 
     def __init__(
         self,
@@ -482,6 +483,7 @@ class GraphQLField:
         deprecation_reason: Optional[str] = None,
         extensions: Optional[Dict[str, Any]] = None,
         ast_node: Optional[FieldDefinitionNode] = None,
+        finalize: Optional[GraphQLFieldResolver] = None,
     ) -> None:
         if args:
             args = {
@@ -500,6 +502,7 @@ class GraphQLField:
         self.deprecation_reason = deprecation_reason
         self.extensions = extensions or {}
         self.ast_node = ast_node
+        self.finalize = finalize
 
     def __repr__(self) -> str:
         return f"<{self.__class__.__name__} {self.type!r}>"

With that minor extension, fields can supply a finalize resolver in addition to the usual one.

from dataclasses import dataclass
from typing import Optional

from graphql import (
    GraphQLField,
    GraphQLInt,
    GraphQLList,
    GraphQLNonNull,
    GraphQLObjectType,
    GraphQLSchema,
    GraphQLString,
    graphql_sync,
    print_schema,
)

post_data = {1: "first"}


@dataclass
class Post:
    id: int

    def body(self, info) -> Optional[str]:
        # None for posts without data, e.g. deleted or hidden ones.
        return post_data.get(self.id)


postType = GraphQLObjectType(
    name="Post",
    fields={
        "id": GraphQLNonNull(GraphQLInt),
        "body": GraphQLField(GraphQLString, resolve=Post.body),
    },
)

postsField = GraphQLField(
    GraphQLNonNull(GraphQLList(GraphQLNonNull(postType))),
    resolve=lambda *_: [Post(1), Post(2)],
    finalize=lambda objs, _: [obj for obj in objs if obj["body"] is not None],
)

schema = GraphQLSchema(
    query=GraphQLObjectType(name="Query", fields={"posts": postsField})
)
print(print_schema(schema))
# type Query {
#   posts: [Post!]!
# }
#
# type Post {
#   id: Int!
#   body: String
# }

source = "{ posts { id body } }"
graphql_sync(schema, source)
# ExecutionResult(data={'posts': [{'id': 1, 'body': 'first'}]}, errors=None)

Addendum

In Python, this could be implemented as a generator-based coroutine. Like contextlib.contextmanager or pytest fixtures, the generator would yield the domain objects, receive the completed result maps, and then yield one final result.

def resolve(*_):
    objs = yield [Post(1), Post(2)]
    yield [obj for obj in objs if obj["body"] is not None]

But there are some obstacles to that approach. The result data is in map (dict) form, not the domain types, so the code is not necessarily more readable. Additionally, generators can already be used to implement list types (whether intentional or not); graphql-core simply iterates the result. There would need to be another mechanism to distinguish a true coroutine from a regular generator.
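A sketch of how an executor might drive such a resolver, written without graphql-core so it runs on its own. The `finalizing` decorator, its `is_finalizing` attribute, `run_resolver`, and `complete` are all hypothetical: the decorator is one possible answer to telling a coroutine apart from a list-producing generator, and `complete` stands in for graphql-core's value completion.

```python
import inspect
from dataclasses import dataclass

post_data = {1: "first"}


@dataclass
class Post:
    id: int


def finalizing(func):
    # Hypothetical marker: flags a resolver as a finalizing coroutine, as opposed
    # to an ordinary generator that merely produces list items.
    func.is_finalizing = True
    return func


def complete(objs):
    # Stand-in for value completion: domain objects -> result maps.
    return [{"id": obj.id, "body": post_data.get(obj.id)} for obj in objs]


def run_resolver(resolver, *args):
    if getattr(resolver, "is_finalizing", False):
        gen = resolver(*args)
        objs = next(gen)  # first yield: the objects to complete
        final = gen.send(complete(objs))  # send result maps, receive final result
        gen.close()
        return final
    # Anything else, including a plain generator, is iterated and completed.
    return complete(resolver(*args))


@finalizing
def resolve_posts(*_):
    objs = yield [Post(1), Post(2)]
    yield [obj for obj in objs if obj["body"] is not None]


def list_posts(*_):
    # An ordinary generator used as a list.
    yield Post(1)
    yield Post(2)


# Introspection alone cannot tell the two apart; both are generator functions.
print(inspect.isgeneratorfunction(resolve_posts), inspect.isgeneratorfunction(list_posts))
# True True
print(run_resolver(resolve_posts))
# [{'id': 1, 'body': 'first'}]
print(run_resolver(list_posts))
# [{'id': 1, 'body': 'first'}, {'id': 2, 'body': None}]
```

The explicit marker is the design choice doing the work here; without it, both resolvers look identical to the executor.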

Speaking of domain types, notice how convenient and readable using a dataclass is, and how redundant the GraphQL type definition is. In strawberry-graphql, the schema is automatically derived from the domain types. The example would be simply:

from typing import Optional

import strawberry


@strawberry.type
class Post:
    id: int

    @strawberry.field
    def body(self) -> Optional[str]:
        return post_data.get(self.id)