Random selection utilities used to be common in interviews. Less so in Python circles because of the builtin random module. Still advanced examples may come up. First is a generalization of shuffle and sample.
import itertoolsimport randomdef shuffled(iterable):"""Generate values in random order for any iterable. Faster than `random.shuffle` if not all values are required. More flexible than `random.sample` if the desired number is unknown a priori. """ values =list(iterable)while values: index = random.randrange(0, len(values)) values[index], values[-1] = values[-1], values[index]yield values.pop()list(itertools.islice(shuffled(range(10)), 5))
[9, 7, 5, 4, 6]
Next up is a random sample in a single pass, e.g., if the data is being read from a large file. The solution requires mathematical induction:
each Nth element has a fair chance of being selected
each previously selected element has a fair chance of being removed
def sample(iterable, k):"""Return a random sample from any iterable in a single pass. More memory efficient than `random.sample`. """ it =iter(iterable) selection =list(itertools.islice(it, k))# error handling and shuffling are consistent with random.sampleifnot0<= k <=len(selection):raiseValueError("sample larger than population") random.shuffle(selection)for count, value inenumerate(it, k +1): index = random.randrange(0, count)if index <len(selection): selection[index] = valuereturn selectionsample(iter(range(10)), 5)