# Random Selection

Random selection utilities used to be common in interviews. Less so in Python circles because of the builtin random module. Still advanced examples may come up. First is a generalization of shuffle and sample.

In [1]:
import itertools
import random

def shuffled(iterable):
"""Generate values in random order for any iterable.

Faster than random.shuffle if not all values are required.
More flexible than random.sample if the desired number is unknown a priori.
"""
values = list(iterable)
while values:
index = random.randrange(0, len(values))
values[index], values[-1] = values[-1], values[index]
yield values.pop()

list(itertools.islice(shuffled(range(10)), 5))

Out[1]:
[9, 7, 5, 4, 6]

Next up is a random sample in a single pass, e.g., if the data is being read from a large file. The solution requires mathematical induction:

• each Nth element has a fair chance of being selected
• each previously selected element has a fair chance of being removed
In [2]:
def sample(iterable, k):
"""Return a random sample from any iterable in a single pass.

More memory efficient than random.sample.
"""
it = iter(iterable)
selection = list(itertools.islice(it, k))
# error handling and shuffling are consistent with random.sample
if not 0 <= k <= len(selection):
raise ValueError("sample larger than population")
random.shuffle(selection)
for count, value in enumerate(it, k + 1):
index = random.randrange(0, count)
if index < len(selection):
selection[index] = value
return selection

sample(iter(range(10)), 5)

Out[2]:
[1, 2, 6, 3, 8]