A common task and interview question, with many variants. It’s frequently asked and answered in a way that’s suboptimal and only handles one specific case. The goal here is to present definitive, general, and efficient solutions.
The first variant is whether or not the chunks will overlap. Although this could be generalized into a step parameter, it’s nearly always the case that step in (1, size).
The second variant is whether the tail of the data should be returned, if it’s not of equal size. Clearly when step == 1 that’s unlikely to be desirable. However, when step == size, that would seem to be the natural choice. And again when 1 < step < size, the desired behavior isn’t clear at all.
The third variant is whether to slice sequences, or support any iterable. Obviously working for any iterable would be ideal, but it’s also likely a user would expect slices given a sequence, particularly in the case of strings.
So this author feels it’s best to split the problem in 2 distinct cases: a sliding window for overlapping sequences, and chunks for discrete sequences. In each case, supporting iterables and using advanced iterator algebra for a minimal and efficient solution.
Window
import itertoolsdef window(iterable, size=2):"""Generate a sliding window of values.""" its = itertools.tee(iterable, size)returnzip(*(itertools.islice(it, index, None) for index, it inenumerate(its)))list(window('abcde'))
[('a', 'b'), ('b', 'c'), ('c', 'd'), ('d', 'e')]
That’s simple, and close to optimal. There is slight overhead in iterating an islice object, so a minor variant would be to force the step-wise iteration in advance.
import collectionsdef window(iterable, size=2):"""Generate a sliding window of values.""" its = itertools.tee(iterable, size)for index, it inenumerate(its): collections.deque(itertools.islice(it, index), 0) # exhaust iteratorreturnzip(*its)list(window('abcde'))
[('a', 'b'), ('b', 'c'), ('c', 'd'), ('d', 'e')]
Chunks
A lesser-known and under-utilized feature of iter is that in can take a callable (of no arguments) and a sentinel to create an iterator. A perfect use case of the “loop and a half” idiom.
def chunks(iterable, size):"""Generate adjacent chunks of data""" it =iter(iterable)returniter(lambda: tuple(itertools.islice(it, size)), ())list(chunks('abcde', 3))
[('a', 'b', 'c'), ('d', 'e')]
This should also be optimal, for reasonable sizes.
Sequences with dispatch
Rather than explicitly check isinstance, this is a perfect use case for functools.singledispatch.