Random Selection
Random selection utilities used to be common in interviews. Less so in Python circles because of the builtin random
module. Still advanced examples may come up. First is a generalization of shuffle
and sample
.
In [1]:
import itertools
import random
def shuffled(iterable):
"""Generate values in random order for any iterable.
Faster than `random.shuffle` if not all values are required.
More flexible than `random.sample` if the desired number is unknown a priori.
"""
values = list(iterable)
while values:
index = random.randrange(0, len(values))
values[index], values[-1] = values[-1], values[index]
yield values.pop()
list(itertools.islice(shuffled(range(10)), 5))
Out[1]:
Next up is a random sample in a single pass, e.g., if the data is being read from a large file. The solution requires mathematical induction:
- each Nth element has a fair chance of being selected
- each previously selected element has a fair chance of being removed
In [2]:
def sample(iterable, k):
"""Return a random sample from any iterable in a single pass.
More memory efficient than `random.sample`.
"""
it = iter(iterable)
selection = list(itertools.islice(it, k))
# error handling and shuffling are consistent with random.sample
if not 0 <= k <= len(selection):
raise ValueError("sample larger than population")
random.shuffle(selection)
for count, value in enumerate(it, k + 1):
index = random.randrange(0, count)
if index < len(selection):
selection[index] = value
return selection
sample(iter(range(10)), 5)
Out[2]: