Random Selection

Generate random numbers efficiently.

interviews
Author

A. Coady

Published

January 2, 2023

Random selection utilities used to be common in interviews. Less so in Python circles because of the builtin random module. Still advanced examples may come up. First is a generalization of shuffle and sample.

import itertools
import random

def shuffled(iterable):
    """Generate values in random order for any iterable.
    
    Faster than `random.shuffle` if not all values are required.
    More flexible than `random.sample` if the desired number is unknown a priori.
    """
    values = list(iterable)
    while values:
        index = random.randrange(0, len(values))
        values[index], values[-1] = values[-1], values[index]
        yield values.pop()

list(itertools.islice(shuffled(range(10)), 5))
[9, 7, 5, 4, 6]

Next up is a random sample in a single pass, e.g., if the data is being read from a large file. The solution requires mathematical induction:

def sample(iterable, k):
    """Return a random sample from any iterable in a single pass.
    
    More memory efficient than `random.sample`.
    """
    it = iter(iterable)
    selection = list(itertools.islice(it, k))
    # error handling and shuffling are consistent with random.sample
    if not 0 <= k <= len(selection):
        raise ValueError("sample larger than population")
    random.shuffle(selection)
    for count, value in enumerate(it, k + 1):
        index = random.randrange(0, count)
        if index < len(selection):
            selection[index] = value
    return selection

sample(iter(range(10)), 5)
[1, 2, 6, 3, 8]