Dataset

Dataset(source: ibis.expr.types.relations.Table | pyarrow._dataset.Dataset)

Attributes

Name Description
table The source, as an ibis table.

table

The source, as an ibis table.

table: ibis.Table

Methods

Name Description
any() Whether there are at least limit rows.
asof_join() As-of join on nearest key rather than equal keys.
cast() Cast the columns of a table.
column() Column of any type by name.
columns() Fields for each column.
cross_join() Cross join with one or more tables.
difference() Set difference of tables.
distinct() Remove duplicate rows from table.
drop_null() Drop rows with null values.
fill_null() Fill null values.
filter() Filter rows by predicates.
first() Provisionally sort and filter by rank.
group() Group table by columns.
intersect() Set intersection of tables.
join() Join two tables.
optional() Nullable field to stop error propagation, enabling partial query results.
order() Sort table by columns.
project() Mutate columns by expressions.
resolve() Cache the table if it will be reused.
resolve_reference() Return table filtered by federated keys.
row() Scalar values at index.
runs() Provisionally group table by adjacent values in columns.
select() Return minimal schema needed to continue.
slice() Limit row selection.
take() Take rows by index.
to_sql() Compile to a formatted SQL string.
union() Set union of tables.
unnest() Unnest an array column from a table.
unpack() Unpack the struct fields of each column.

any()

Whether there are at least limit rows.

any(limit=BigInt(1))

This may be significantly faster than count for out-of-core data, because the scan can stop as soon as limit rows have been found.
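The early-exit idea behind any() can be illustrated with a plain-Python sketch. The helper has_at_least is hypothetical and works on any iterable of rows, not on a graphique Dataset; it only shows why checking for limit rows can stop reading long before a full count would.

```python
from itertools import islice


def has_at_least(rows, limit=1):
    """Return True if `rows` yields at least `limit` items (hypothetical helper).

    islice stops pulling from the iterator after `limit` items, so the cost
    is bounded by `limit` rather than by the total number of rows.
    """
    return sum(1 for _ in islice(rows, limit)) >= limit
```

For example, has_at_least(iter(range(10**12)), 5) returns True after reading only five items.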


asof_join()

As-of join on nearest key rather than equal keys.

asof_join(
    info,
    right,
    on,
    keys=[],
    rkeys=[],
    tolerance=None,
    scalar={},
    lname="",
    rname="{name}_right"
)
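As a rough illustration of matching on the nearest key, here is a plain-Python sketch of a backward as-of join over lists of dict rows. It assumes the common backward convention (the default of pandas' merge_asof): each left row matches the right row with the greatest on value not exceeding its own, among right rows whose equality keys agree. The helper and its by parameter are hypothetical and are not graphique's implementation.

```python
from bisect import bisect_right


def asof_join(left, right, on, by=(), tolerance=None):
    """Backward as-of join of two lists of dict rows (hypothetical helper).

    Each left row is matched to the right row with the largest right[on]
    that is <= left[on], among right rows with equal `by` values.
    """
    # Bucket right rows by their equality keys, each bucket sorted on `on`.
    buckets = {}
    for row in sorted(right, key=lambda r: r[on]):
        buckets.setdefault(tuple(row[k] for k in by), []).append(row)
    result = []
    for row in left:
        bucket = buckets.get(tuple(row[k] for k in by), [])
        # Index of the last right row whose `on` value is <= the left value.
        i = bisect_right([r[on] for r in bucket], row[on]) - 1
        match = bucket[i] if i >= 0 else None
        # Discard matches that are further away than the tolerance.
        if match is not None and tolerance is not None and row[on] - match[on] > tolerance:
            match = None
        merged = dict(row)
        for key, value in (match or {}).items():
            if key == on or key in by:
                continue
            # Suffix colliding names, mirroring the rname="{name}_right" default.
            merged[f"{key}_right" if key in row else key] = value
        result.append(merged)
    return result
```

Unmatched left rows are kept without right-hand columns, i.e. left-join semantics; a real table would fill those columns with nulls instead.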

cast()

Cast the columns of a table.

cast(info, schema, try_=False)

column()

Column of any type by name.

column(name, cast="", try_=False, index=[])

If the column is in the schema, the columns field can be used instead.


columns()

Fields for each column.

columns(info)

cross_join()

Cross join with one or more tables.

cross_join(info, right, lname="", rname="{name}_right")

difference()

Set difference of tables.

difference(info, table, distinct=True)
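The distinct flag likely follows the usual SQL distinction between EXCEPT (set difference) and EXCEPT ALL (multiset difference); that reading is an assumption here. A plain-Python sketch over rows represented as tuples, with a hypothetical helper name:

```python
from collections import Counter


def difference(left, right, distinct=True):
    """Rows of `left` that are not in `right`; rows are tuples (hypothetical).

    distinct=True behaves like SQL EXCEPT (a set difference, deduplicated);
    distinct=False behaves like EXCEPT ALL (multiplicities are subtracted).
    """
    if distinct:
        excluded = set(right)
        # dict.fromkeys deduplicates while keeping first-seen order.
        return list(dict.fromkeys(row for row in left if row not in excluded))
    # Counter subtraction keeps only positive counts, e.g. 3 copies - 1 = 2.
    return list((Counter(left) - Counter(right)).elements())
```

With left = [(1,), (1,), (1,), (2,), (3,)] and right = [(1,), (3,)], the distinct form returns [(2,)] while distinct=False returns [(1,), (1,), (2,)].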

distinct()

Remove duplicate rows from table.

distinct(info, on=None, keep="first", counts="", order="")

Unlike group, distinct keeps all columns, and by default it uses all columns as keys.
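To make the contrast with group concrete, here is a plain-Python sketch of keep-first deduplication over dict rows. It keeps every column of the first row seen for each key and, loosely mirroring the counts parameter, can record how many rows shared that key. The helper is hypothetical and models only keep="first".

```python
def distinct(rows, on=None, counts=""):
    """Keep the first dict row for each key (hypothetical helper).

    Keys are the `on` columns, or all columns when `on` is None.
    If `counts` is non-empty, add a column with that name holding group sizes.
    """
    firsts = {}
    sizes = {}
    for row in rows:
        columns = on if on is not None else sorted(row)
        key = tuple(row[c] for c in columns)
        # setdefault keeps the first row seen; later duplicates only bump the count.
        firsts.setdefault(key, row)
        sizes[key] = sizes.get(key, 0) + 1
    result = []
    for key, row in firsts.items():
        out = dict(row)
        if counts:
            out[counts] = sizes[key]
        result.append(out)
    return result
```

Note that the non-key column v survives in the output, whereas a grouping would need an explicit aggregate for it.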


drop_null()

Drop rows with null values.

drop_null(info, subset=None, how="any")
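The subset and how parameters appear to follow the familiar dataframe convention: how="any" drops a row if any checked column is null, and how="all" drops it only if every checked column is. A plain-Python sketch under that assumption, with None standing in for null and a hypothetical helper name:

```python
def drop_null(rows, subset=None, how="any"):
    """Drop dict rows with None values (hypothetical helper).

    Only the `subset` columns are checked, or every column when subset is None.
    how="any" drops a row if any checked value is None; how="all" only if all are.
    """
    if how not in ("any", "all"):
        raise ValueError(f"how must be 'any' or 'all', not {how!r}")
    check = any if how == "any" else all
    return [
        row
        for row in rows
        # Iterating a dict yields its keys, i.e. all column names.
        if not check(row[c] is None for c in (row if subset is None else subset))
    ]
```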

fill_null()

Fill null values.

fill_null(info, name=None, value=UNSET, scalar={})

filter()

Filter rows by predicates.

filter(info, where=None, **queries)

Schema-derived fields provide syntax for simple queries; the where argument supports complex queries.


first()

Provisionally sort and filter by rank.

first(info, by, rank=1, dense=False)
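One way to read rank and dense is by analogy with SQL's RANK and DENSE_RANK window functions: keep every row whose rank on the by columns is at most rank, so rows tied at the boundary are all kept. The sketch below assumes that reading; it is plain Python over dict rows, preserves input order rather than sorting the output, and the helper is hypothetical.

```python
from bisect import bisect_left


def first(rows, by, rank=1, dense=False):
    """Keep dict rows whose rank on the `by` columns is <= `rank` (hypothetical).

    dense=False uses min ranking (like SQL RANK); dense=True uses DENSE_RANK.
    The output keeps the input order of the surviving rows.
    """
    keys = [tuple(row[c] for c in by) for row in rows]
    # Sorted copies are used only to compute each key's rank.
    all_keys = sorted(keys)
    unique_keys = sorted(set(keys))
    result = []
    for row, key in zip(rows, keys):
        if dense:
            # 1 + number of distinct keys smaller than this one.
            position = bisect_left(unique_keys, key) + 1
        else:
            # 1 + number of rows with a smaller key; ties share a rank.
            position = bisect_left(all_keys, key) + 1
        if position <= rank:
            result.append(row)
    return result
```

With x values 3, 1, 1, 2 and rank=2, min ranking keeps only the two 1s (the next rank is 3), while dense ranking also keeps the 2.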

group()

Group table by columns.

group(info, by=[], counts="", order="", aggregate={})

intersect()

Set intersection of tables.

intersect(info, table, distinct=True)

join()

Join two tables.

join(info, right, keys, rkeys=[], how="inner", lname="", rname="{name}_right")

optional()

Nullable field to stop error propagation, enabling partial query results.

optional(info)

order()

Sort table by columns.

order(info, by, limit=None, dense=False)

project()

Mutate columns by expressions.

project(info, columns)

Renamed from ibis's mutate so that it is not confused with a GraphQL mutation.


resolve()

Cache the table if it will be reused.

resolve(info, source)

resolve_reference()

Return table filtered by federated keys.

resolve_reference(info, **keys)

row()

Scalar values at index.

row(info, index=0)

runs()

Provisionally group table by adjacent values in columns.

runs(info, by=[], split=[], counts="", order="_", aggregate={})
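Unlike group, runs only merges rows whose by values are adjacent. Python's itertools.groupby has exactly that behavior, so it makes a compact sketch. This assumes non-key columns are collected into per-run lists and leaves out split, order, and aggregate; the helper is hypothetical.

```python
from itertools import groupby


def runs(rows, by, counts=""):
    """Group dict rows into runs of adjacent equal `by` values (hypothetical).

    Non-key columns become lists with one entry per row in the run;
    if `counts` is non-empty, a column with that name holds each run's length.
    """
    result = []
    # groupby only merges consecutive rows with equal keys, which is what
    # distinguishes a run from an ordinary group.
    for key, run in groupby(rows, key=lambda row: tuple(row[c] for c in by)):
        run = list(run)
        group = dict(zip(by, key))
        for column in run[0]:
            if column not in by:
                group[column] = [row[column] for row in run]
        if counts:
            group[counts] = len(run)
        result.append(group)
    return result
```

For s values a, a, b, a this yields three runs (a, b, a), whereas grouping by s would yield two groups.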

select()

Return minimal schema needed to continue.

select(info, source)

slice()

Limit row selection.

slice(info, offset=BigInt(0), limit=None)

take()

Take rows by index.

take(info, indices)

to_sql()

Compile to a formatted SQL string.

to_sql(dialect=None, pretty=True)

union()

Set union of tables.

union(info, table, distinct=False)
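Note that union defaults to distinct=False, whereas difference and intersect default to distinct=True. Assuming the flag carries its usual SQL meaning, the default union keeps duplicates (UNION ALL). A plain-Python sketch over tuple rows, with a hypothetical helper name:

```python
def union(*tables, distinct=False):
    """Concatenate tables of tuple rows (hypothetical helper).

    distinct=False keeps duplicates (SQL UNION ALL); distinct=True removes
    them (SQL UNION), keeping the first occurrence of each row.
    """
    rows = [row for table in tables for row in table]
    # dict.fromkeys deduplicates while preserving first-seen order.
    return list(dict.fromkeys(rows)) if distinct else rows
```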

unnest()

Unnest an array column from a table.

unnest(info, name, offset="", keep_empty=False, order="")
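A plain-Python sketch of the row expansion, assuming offset names an optional column holding each element's position and keep_empty keeps rows whose array is empty or null (with a null element), which are the conventions of ibis's Table.unnest. The order parameter is left out and the helper is hypothetical.

```python
def unnest(rows, name, offset="", keep_empty=False):
    """Expand list column `name` into one dict row per element (hypothetical).

    If `offset` is non-empty, a column with that name holds each element's
    index. With keep_empty=True, rows whose list is empty or None are kept
    with None in place of the element (and of the offset).
    """
    result = []
    for row in rows:
        values = row[name] or []
        if not values:
            if keep_empty:
                out = {**row, name: None}
                if offset:
                    out[offset] = None
                result.append(out)
            continue
        for index, value in enumerate(values):
            out = {**row, name: value}
            if offset:
                out[offset] = index
            result.append(out)
    return result
```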

unpack()

Unpack the struct fields of each column.

unpack(info, names)
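Unpacking flattens struct columns into top-level columns. A plain-Python sketch with structs modeled as nested dicts; the helper is hypothetical, and name collisions between struct fields and existing columns are not handled (later values simply overwrite earlier ones).

```python
def unpack(rows, names):
    """Replace each struct (dict) column in `names` with its fields (hypothetical)."""
    result = []
    for row in rows:
        out = {}
        for column, value in row.items():
            if column in names:
                # Each struct field becomes a top-level column.
                out.update(value)
            else:
                out[column] = value
        result.append(out)
    return result
```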