Random Selection

Random selection utilities used to be common in interviews. Less so in Python circles because of the builtin random module. Still advanced examples may come up. First is a generalization of shuffle and sample.

In [1]:
import itertools
import random

def shuffled(iterable):
    """Generate values in random order for any iterable.
    
    Faster than `random.shuffle` if not all values are required.
    More flexible than `random.sample` if the desired number is unknown a priori.
    """
    values = list(iterable)
    while values:
        index = random.randrange(0, len(values))
        values[index], values[-1] = values[-1], values[index]
        yield values.pop()

list(itertools.islice(shuffled(range(10)), 5))
Out[1]:
[9, 7, 5, 4, 6]

Next up is a random sample in a single pass, e.g., if the data is being read from a large file. The solution requires mathematical induction:

  • each Nth element has a fair chance of being selected
  • each previously selected element has a fair chance of being removed
In [2]:
def sample(iterable, k):
    """Return a random sample from any iterable in a single pass.
    
    More memory efficient than `random.sample`.
    """
    it = iter(iterable)
    selection = list(itertools.islice(it, k))
    # error handling and shuffling are consistent with random.sample
    if not 0 <= k <= len(selection):
        raise ValueError("sample larger than population")
    random.shuffle(selection)
    for count, value in enumerate(it, k + 1):
        index = random.randrange(0, count)
        if index < len(selection):
            selection[index] = value
    return selection

sample(iter(range(10)), 5)
Out[2]:
[1, 2, 6, 3, 8]