Dataset
Dataset(source: ibis.expr.types.relations.Table | pyarrow._dataset.Dataset)
Usage
Dataset()
Attributes
| Name | Description |
|---|---|
| table | source as ibis table |
table
source as ibis table
table: ibis.Table
Methods
| Name | Description |
|---|---|
| any() | Whether there are at least limit rows. |
| asof_join() | As-of join on nearest key rather than equal keys. |
| cast() | Cast the columns of a table. |
| column() | Column of any type by name. |
| columns() | Fields for each column. |
| cross_join() | Cross join with one or more tables. |
| difference() | Set difference of tables. |
| distinct() | Remove duplicate rows from table. |
| drop_null() | Drop rows with null values. |
| fill_null() | Fill null values. |
| filter() | Filter rows by predicates. |
| first() | Provisionally sort and filter by rank. |
| group() | Group table by columns. |
| intersect() | Set intersection of tables. |
| join() | Join two tables. |
| optional() | Nullable field to stop error propagation, enabling partial query results. |
| order() | Sort table by columns. |
| project() | Mutate columns by expressions. |
| resolve() | Cache the table if it will be reused. |
| resolve_reference() | Return table filtered by federated keys. |
| row() | Scalar values at index. |
| runs() | Provisionally group table by adjacent values in columns. |
| select() | Return minimal schema needed to continue. |
| slice() | Limit row selection. |
| take() | Take rows by index. |
| to_sql() | Compile to a formatted SQL string. |
| union() | Set union of tables. |
| unnest() | Unnest an array column from a table. |
| unpack() | Unpack the struct fields of each column. |
any()
Whether there are at least limit rows.
Usage
any(limit=BigInt(1))
May be significantly faster than count for out-of-core data.
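Why an early-exit check can beat a full count is easiest to see in plain Python. The sketch below is only an illustration of the idea, not the library's implementation; `has_at_least` and `big` are hypothetical names.

```python
from itertools import islice


def has_at_least(rows, limit=1):
    """Return True if `rows` yields at least `limit` items.

    Hypothetical helper: reads at most `limit` items, never the whole source.
    """
    return sum(1 for _ in islice(rows, limit)) >= limit


# A lazy generator stands in for a large, out-of-core source.
big = (i for i in range(10**9))
enough = has_at_least(big, limit=3)  # stops after three items
```

A full count would have to consume every row before answering, whereas this check stops as soon as the limit is reached.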
asof_join()
As-of join on nearest key rather than equal keys.
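As a rough illustration of what matching on the nearest key means, here is a minimal pure-Python sketch. It assumes rows are dicts, that the right side is sorted by the key, and that matching looks backward (the largest right key not exceeding the left key). The helper name and the blanket `_right` suffix on every right column are assumptions for the example, not the library's behavior.

```python
from bisect import bisect_right


def asof_join(left, right, on, tolerance=None):
    """Attach to each left row the right row with the nearest key <= the left key.

    Hypothetical sketch: `right` must already be sorted by `on`.
    """
    right_keys = [row[on] for row in right]
    joined = []
    for row in left:
        i = bisect_right(right_keys, row[on]) - 1  # last right key <= left key
        match = right[i] if i >= 0 else None
        # Discard matches that are further away than the tolerance allows.
        if match is not None and tolerance is not None:
            if row[on] - match[on] > tolerance:
                match = None
        out = dict(row)
        if match is not None:
            out.update({f"{k}_right": v for k, v in match.items()})
        joined.append(out)
    return joined


trades = [{"time": 3, "qty": 10}, {"time": 7, "qty": 5}]
quotes = [{"time": 1, "price": 100}, {"time": 5, "price": 101}]
result = asof_join(trades, quotes, on="time")
```

An equi-join on `time` would match nothing here, since no keys are equal. The as-of join pairs each trade with the most recent quote instead.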
Usage
asof_join(
info,
right,
on,
keys=[],
rkeys=[],
tolerance=None,
scalar={},
lname="",
rname="{name}_right"
)
cast()
Cast the columns of a table.
Usage
cast(info, schema, try_=False)
column()
Column of any type by name.
Usage
column(name, cast="", try_=False, index=[])
If the column is in the schema, columns can be used instead.
columns()
Fields for each column.
Usage
columns(info)
cross_join()
Cross join with one or more tables.
Usage
cross_join(info, right, lname="", rname="{name}_right")
difference()
Set difference of tables.
Usage
difference(info, table, distinct=True)
distinct()
Remove duplicate rows from table.
Usage
distinct(info, on=None, keep="first", counts="", order="")
Differs from group in that it keeps all columns and defaults to all keys.
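The contrast with grouping can be sketched in plain Python: duplicates are detected on the key columns only, but the surviving row keeps every column. This is an illustrative sketch with a hypothetical helper over lists of dicts, not the library's implementation, and it models only the `on` and `keep` parameters.

```python
def distinct(rows, on=None, keep="first"):
    """Drop duplicate rows, comparing only the `on` columns (all columns if None).

    Hypothetical sketch: every column of the surviving row is retained.
    """
    if keep not in ("first", "last"):
        raise ValueError("this sketch supports only keep='first' or 'last'")
    seen = {}
    for row in rows:
        key = tuple(row[name] for name in (on or sorted(row)))
        if keep == "last" or key not in seen:
            seen[key] = row
    return list(seen.values())


rows = [{"a": 1, "b": "x"}, {"a": 1, "b": "y"}, {"a": 2, "b": "x"}]
all_keys = distinct(rows)            # no full-row duplicates, nothing dropped
by_a = distinct(rows, on=["a"])      # one row per value of "a", "b" still present
```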
drop_null()
Drop rows with null values.
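The difference between the two `how` modes can be sketched in plain Python, treating `None` as null. The helper below is a hypothetical illustration over lists of dicts, assuming `how` accepts "any" and "all"; it is not the library's implementation.

```python
def drop_null(rows, subset=None, how="any"):
    """Drop rows containing nulls (None) in the checked columns.

    how="any" drops a row if any checked column is null;
    how="all" drops it only if every checked column is null.
    """
    test = any if how == "any" else all
    return [
        row
        for row in rows
        if not test(row[name] is None for name in (subset or row))
    ]


rows = [{"a": 1, "b": None}, {"a": None, "b": None}, {"a": 1, "b": 2}]
strict = drop_null(rows)              # keeps only fully populated rows
lenient = drop_null(rows, how="all")  # drops only the all-null row
```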
Usage
drop_null(info, subset=None, how="any")
fill_null()
Fill null values.
Usage
fill_null(info, name=None, value=UNSET, scalar={})
filter()
Filter rows by predicates.
Usage
filter(info, where=None, **queries)
Schema-derived fields provide syntax for simple queries; where supports complex queries.
first()
Provisionally sort and filter by rank.
Usage
first(info, by, rank=1, dense=False)
group()
Group table by columns.
Usage
group(info, by=[], counts="", order="", aggregate={})
intersect()
Set intersection of tables.
Usage
intersect(info, table, distinct=True)
join()
Join two tables.
Usage
join(info, right, keys, rkeys=[], how="inner", lname="", rname="{name}_right")
optional()
Nullable field to stop error propagation, enabling partial query results.
Usage
optional(info)
order()
Sort table by columns.
Usage
order(info, by, limit=None, dense=False)
project()
Mutate columns by expressions.
Usage
project(info, columns)
Renamed to avoid confusion with a mutation.
resolve()
Cache the table if it will be reused.
Usage
resolve(info, source)
resolve_reference()
Return table filtered by federated keys.
Usage
resolve_reference(info, **keys)
row()
Scalar values at index.
Usage
row(info, index=0)
runs()
Provisionally group table by adjacent values in columns.
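Grouping by adjacent values differs from an ordinary group-by in that a value which reappears later starts a new group. A minimal sketch using the standard library's `itertools.groupby` shows the idea. The helper and sample rows are hypothetical, and only the `by` parameter is modelled.

```python
from itertools import groupby


def runs(rows, by):
    """Group adjacent rows that share the same values in the `by` columns.

    Non-adjacent repeats of a key form separate groups.
    """
    def key(row):
        return tuple(row[name] for name in by)

    return [(k, list(group)) for k, group in groupby(rows, key=key)]


rows = [
    {"state": "CA", "n": 1},
    {"state": "CA", "n": 2},
    {"state": "NY", "n": 3},
    {"state": "CA", "n": 4},
]
result = runs(rows, by=["state"])
keys = [k for k, _ in result]        # "CA" appears twice, once per run
sizes = [len(g) for _, g in result]
```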
Usage
runs(info, by=[], split=[], counts="", order="_", aggregate={})
select()
Return minimal schema needed to continue.
Usage
select(info, source)
slice()
Limit row selection.
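The offset and limit semantics can be sketched with `itertools.islice`, assuming `offset` skips leading rows and `limit` caps how many follow. `slice_rows` is a hypothetical helper for illustration, not the library's implementation.

```python
from itertools import islice


def slice_rows(rows, offset=0, limit=None):
    """Skip `offset` rows, then return at most `limit` rows (all if None)."""
    stop = None if limit is None else offset + limit
    return list(islice(rows, offset, stop))


page = slice_rows(range(10), offset=2, limit=3)
rest = slice_rows(range(5), offset=3)
```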
Usage
slice(info, offset=BigInt(0), limit=None)
take()
Take rows by index.
Usage
take(info, indices)
to_sql()
Compile to a formatted SQL string.
Usage
to_sql(dialect=None, pretty=True)
union()
Set union of tables.
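The effect of the `distinct` flag can be sketched in plain Python, assuming it behaves like SQL's UNION ALL when false and UNION when true. The helper below is a hypothetical illustration over lists of dicts.

```python
def union(left, right, distinct=False):
    """Concatenate two row lists, optionally removing duplicate rows."""
    rows = list(left) + list(right)
    if not distinct:
        return rows  # duplicates retained, like UNION ALL
    seen, unique = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


a = [{"x": 1}]
b = [{"x": 1}, {"x": 2}]
with_duplicates = union(a, b)
deduplicated = union(a, b, distinct=True)
```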
Usage
union(info, table, distinct=False)
unnest()
Unnest an array column from a table.
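Unnesting turns each array element into its own row, repeating the other columns. The sketch below illustrates that in plain Python. It assumes `offset` names an extra zero-based index column and that `keep_empty` emits one null row for an empty or null array. The helper is hypothetical and the `order` parameter is not modelled.

```python
def unnest(rows, name, offset="", keep_empty=False):
    """Expand the array column `name` into one row per element."""
    out = []
    for row in rows:
        values = row[name] or []
        if not values and keep_empty:
            new = {**row, name: None}
            if offset:
                new[offset] = None
            out.append(new)
        for index, value in enumerate(values):
            new = {**row, name: value}
            if offset:
                new[offset] = index  # position of the element in its array
            out.append(new)
    return out


rows = [{"id": 1, "tags": ["a", "b"]}, {"id": 2, "tags": []}]
flat = unnest(rows, "tags")
indexed = unnest(rows, "tags", offset="i", keep_empty=True)
```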
Usage
unnest(info, name, offset="", keep_empty=False, order="")
unpack()
Unpack the struct fields of each column.
Usage
unpack(info, names)
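Unpacking promotes the fields of a struct column to top-level columns. A minimal sketch over dicts follows, assuming the struct column is replaced by its fields and that structs are non-null. The helper and sample data are hypothetical, not the library's implementation.

```python
def unpack(rows, names):
    """Replace each struct column in `names` with its fields as columns."""
    out = []
    for row in rows:
        new = {k: v for k, v in row.items() if k not in names}
        for name in names:
            new.update(row[name])  # struct fields become top-level columns
        out.append(new)
    return out


rows = [{"id": 1, "point": {"x": 2, "y": 3}}]
flat = unpack(rows, ["point"])
```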