graphique

A. Coady

Version 2

When this project started, there was no out-of-core execution engine with performance comparable to PyArrow, so graphique effectively included one, built on PyArrow datasets and Acero.

Since then the ecosystem has grown considerably, with DuckDB, DataFusion, and Ibis. As of version 2, graphique is based on Ibis, which provides a common dataframe API for multiple backends. This also gives graphique a default but configurable backend.

As a major version upgrade, version 2 includes incompatible changes from version 1. However, the overall API remains largely the same.

Usage

There is an example app which reads a parquet dataset.

env PARQUET_PATH=... uvicorn graphique.service:app

Open http://localhost:8000/ to try out the API in GraphiQL. There is a test fixture at ./tests/fixtures/zipcodes.parquet.

To output the GraphQL schema:

env PARQUET_PATH=... strawberry export-schema graphique.service:app.schema
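
Besides GraphiQL, the running example app can be queried from any HTTP client. The following is a minimal sketch using only the standard library. It assumes the service is listening at http://localhost:8000/ as above and that the table is the root query type, so `count` is available at the top level. The helper names `build_request` and `execute` are illustrative and not part of graphique.

```python
import json
import urllib.request

URL = "http://localhost:8000/"  # assumed address of the example app


def build_request(query: str, url: str = URL) -> urllib.request.Request:
    """Build a GraphQL-over-HTTP POST request without sending it."""
    body = json.dumps({"query": query}).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def execute(query: str, url: str = URL) -> dict:
    """Send the query and return the decoded JSON response."""
    with urllib.request.urlopen(build_request(query, url)) as response:
        return json.load(response)


# With the service running, the row count could be fetched with:
# execute("{ count }")
```

Separating request construction from sending keeps the payload easy to inspect before any network call is made.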

Configuration

The example app uses Starlette’s config, which reads settings from environment variables or a .env file.

PARQUET_PATH: path to the parquet directory or file
NAME = '': name of the GraphQL field on Query; when empty, the table itself is the root query type
COLUMNS = None: list of names, or mapping of aliases, of columns to select

Configuration options exist to provide a convenient no-code solution, but are subject to change in the future. Using a custom app is recommended for production usage.
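
To make the two accepted shapes of COLUMNS concrete, here is a small sketch of how such a setting could be normalized. It assumes the value is supplied as a JSON string and that a mapping goes from alias to column name. Both are assumptions for illustration, and `parse_columns` is a hypothetical helper, not graphique's own code.

```python
import json


def parse_columns(raw):
    """Normalize a COLUMNS setting into an {alias: column name} mapping.

    Returns None when the setting is absent, meaning all columns are selected.
    """
    if raw is None:
        return None
    value = json.loads(raw)  # assumption: the setting is JSON-encoded
    if isinstance(value, dict):
        return dict(value)  # already a mapping of aliases to names
    return {name: name for name in value}  # plain list: each name is its own alias


as_list = parse_columns('["state", "county"]')
as_mapping = parse_columns('{"zip": "zipcode"}')
```

A custom app avoids this indirection entirely, since projections can be applied to the source table in Python before it is passed to GraphQL.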

App

For more options create a custom ASGI app. Call graphique’s GraphQL on an ibis Table or arrow Dataset. Use a Query type with dataset attributes for multiple roots, and to enable federation.

import ibis
from graphique import GraphQL, typed

source = ibis.read_*(...)  # or `ibis.connect(...).table(...)` or `pyarrow.dataset.dataset(...)`
# apply initial projections or filters to `source`
app = GraphQL(source)  # Table is root query type

# multiple named fields, with optional federation keys
class Query:
    name = source  # or `typed(source, name, keys=...)`
app = GraphQL(Query)

Start like any ASGI app.

uvicorn <module>:app

API

types

Dataset: interface for an ibis table or arrow dataset.
Table: implements the Dataset interface. Adds typed row, columns, and filter fields from introspecting the schema.
Column: interface for an ibis column. Each data type has a corresponding column implementation: Boolean, Int, BigInt, Float, Decimal, Date, Datetime, Time, Duration, Base64, String, Array, Struct. All columns have a values field for their list of scalars. Additional fields vary by type.
Row: scalar fields. Tables are column-oriented, and graphique encourages that usage for performance. A single row field is provided for convenience, but a field for a list of rows is not. Requesting parallel columns is far more efficient.
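
As an illustration of the column-oriented usage described above, a client can request parallel columns and assemble rows locally if it needs them. The sketch below assumes a table with `state` and `county` columns; the column names and the response payload are made up for the example. `rows_from_columns` is a hypothetical client-side helper.

```python
# Query requesting two parallel columns through the `columns` field.
QUERY = """
{
  columns {
    state { values }
    county { values }
  }
}
"""


def rows_from_columns(columns: dict) -> list:
    """Zip parallel column values into a list of row dictionaries."""
    names = list(columns)
    arrays = [columns[name]["values"] for name in names]
    return [dict(zip(names, row)) for row in zip(*arrays)]


# Illustrative response data shaped like the query above, not real output.
data = {
    "columns": {
        "state": {"values": ["CA", "NY"]},
        "county": {"values": ["Los Angeles", "Kings"]},
    }
}
rows = rows_from_columns(data["columns"])
```

The server only serializes one list per column, and the row-wise view is built on the client, where it is cheap.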

selection

slice: contiguous selection of rows
filter: select rows by predicates
join, asofJoin, crossJoin: join tables by key columns
difference, intersect, union: set operations on tables
take: rows by index
dropNull: remove rows with nulls

projection

project: project columns with expressions
columns: provides a field for every Column in the schema
column: access a column of any type by name
row: provides a field for each scalar of a single row
cast: cast column types
unpack: project struct fields
fillNull: fill null values

aggregation

group: group by given columns, and aggregate the others
distinct: group with all columns
runs: provisionally group by adjacency
unnest: unnest an array column
count, any: number of rows, or whether a minimum number of rows exist

ordering

order: sort table by given columns
first: provisionally sort and filter by rank

reflection

type: type of data source
schema: field names and types
optional: nullable field that stops error propagation, enabling partial results
toSql: compiles SQL query

Performance

Performance depends on the ibis backend, which defaults to DuckDB. There are no internal Python loops, and scalars do not become Python types until serialized. Table fields are lazily evaluated until scalars are reached, and results are automatically cached as needed when multiple fields use them.
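
The evaluation model can be pictured with a toy sketch: table operations only record a plan, the plan runs when a scalar is requested, and the result is cached for further scalar fields. This is purely conceptual and not graphique's implementation. In graphique the work is delegated to the backend rather than Python loops, and `LazyTable` is a hypothetical class.

```python
from functools import cached_property


class LazyTable:
    """Toy stand-in for a lazily evaluated table: operations are recorded, not run."""

    def __init__(self, rows, steps=()):
        self._rows = rows
        self._steps = tuple(steps)
        self.executions = 0  # counts how often the plan actually runs

    def filter(self, predicate):
        # Returns a new plan; no data is touched yet.
        return LazyTable(self._rows, self._steps + (predicate,))

    @cached_property
    def _result(self):
        # Runs the recorded plan once; later scalar fields reuse the cached result.
        self.executions += 1
        return [row for row in self._rows if all(step(row) for step in self._steps)]

    def count(self):
        return len(self._result)

    def values(self, name):
        return [row[name] for row in self._result]


table = LazyTable([{"state": "CA"}, {"state": "NY"}, {"state": "CA"}])
selected = table.filter(lambda row: row["state"] == "CA")  # nothing evaluated yet
```

Calling `selected.count()` and then `selected.values("state")` runs the plan a single time, which mirrors how several scalar fields on the same table can share one evaluation.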

PyArrow is also used for partitioned dataset optimizations. The graphique[cli] extra provides a command-line script, python -m graphique.partition, for out-of-core partitioning.

Installation

pip install graphique[server,cli]

Dependencies

ibis-framework (with duckdb or other backend)
strawberry-graphql[asgi,cli]
pyarrow
isodate
uvicorn (or other ASGI server)

Tests

100% branch coverage.

pytest [--cov]