# Split an Iterable

Split an iterable into equal sized chunks.

interviews
Author

Published

December 28, 2019

# Split an iterable into equal sized chunks.

A common task and interview question, with many variants. It’s frequently asked and answered in a way that’s suboptimal and only handles one specific case. The goal here is to present definitive, general, and efficient solutions.

The first variant is whether or not the chunks will overlap. Although this could be generalized into a `step` parameter, it’s nearly always the case that `step in (1, size)`.

The second variant is whether the tail of the data should be returned, if it’s not of equal size. Clearly when `step == 1` that’s unlikely to be desirable. However, when `step == size`, that would seem to be the natural choice. And again when `1 < step < size`, the desired behavior isn’t clear at all.

The third variant is whether to slice sequences, or support any iterable. Obviously working for any iterable would be ideal, but it’s also likely a user would expect slices given a sequence, particularly in the case of strings.

So this author feels it’s best to split the problem in 2 distinct cases: a sliding `window` for overlapping sequences, and `chunks` for discrete sequences. In each case, supporting iterables and using advanced iterator algebra for a minimal and efficient solution.

## Window

``````import itertools

def window(iterable, size=2):
"""Generate a sliding window of values."""
its = itertools.tee(iterable, size)
return zip(*(itertools.islice(it, index, None) for index, it in enumerate(its)))

list(window('abcde'))``````
``[('a', 'b'), ('b', 'c'), ('c', 'd'), ('d', 'e')]``

That’s simple, and close to optimal. There is slight overhead in iterating an `islice` object, so a minor variant would be to force the step-wise iteration in advance.

``````import collections

def window(iterable, size=2):
"""Generate a sliding window of values."""
its = itertools.tee(iterable, size)
for index, it in enumerate(its):
collections.deque(itertools.islice(it, index), 0)  # exhaust iterator
return zip(*its)

list(window('abcde'))``````
``[('a', 'b'), ('b', 'c'), ('c', 'd'), ('d', 'e')]``

## Chunks

A lesser-known and under-utilized feature of `iter` is that in can take a callable (of no arguments) and a sentinel to create an iterator. A perfect use case of the “loop and a half” idiom.

``````def chunks(iterable, size):
it = iter(iterable)
return iter(lambda: tuple(itertools.islice(it, size)), ())

list(chunks('abcde', 3))``````
``[('a', 'b', 'c'), ('d', 'e')]``

This should also be optimal, for reasonable sizes.

## Sequences with dispatch

Rather than explicitly check `isinstance`, this is a perfect use case for `functools.singledispatch`.

``````import functools
from collections.abc import Sequence

window = functools.singledispatch(window)

@window.register
def _(seq: Sequence, size=2):
for index in range(len(seq) - size + 1):
yield seq[index:index + size]

list(window('abcde'))``````
``['ab', 'bc', 'cd', 'de']``
``list(window(iter('abcde')))``
``[('a', 'b'), ('b', 'c'), ('c', 'd'), ('d', 'e')]``
``````chunks = functools.singledispatch(chunks)

@chunks.register
def _(seq: Sequence, size):
for index in range(0, len(seq), size):
yield seq[index:index + size]

list(chunks('abcde', 3))``````
``['abc', 'de']``
``list(chunks(iter('abcde'), 3))``
``[('a', 'b', 'c'), ('d', 'e')]``