# Primes

Generate prime numbers.

interviews
Author

Published

July 10, 2021

An old interview challenge is to generate prime numbers or check if a number is prime. No advanced mathematics needed, just variants on the Sieve of Eratosthenes. Starting with a basic prime checker.

``````def isprime(n):
divs = range(2, int(n ** 0.5) + 1)
return all(n % d for d in divs)

%time isprime(1_000_003)``````
``````CPU times: user 83 µs, sys: 1e+03 ns, total: 84 µs
Wall time: 85.1 µs``````
``True``

A common optimization is to skip even numbers.

``````def isprime(n):
divs = range(3, int(n ** 0.5) + 1, 2)
return n == 2 or all(n % d for d in divs)

%time isprime(1_000_003)``````
``````CPU times: user 43 µs, sys: 0 ns, total: 43 µs
Wall time: 46.3 µs``````
``True``

Brief digression on that optimization. There’s nothing special about removing multiples of 2; removing multiples is the whole point. The `step` scalar could instead be thought of as a cycle: `itertools.accumulate(itertools.repeat(2))`. So removing multiples of 3 would remove every third step: `itertools.accumulate(itertools.cycle([2, 4]))`.

Or the equivalent could be done with slicing.

``````import itertools

def isprime(n):
divs = range(5, int(n ** 0.5) + 1, 2)
return n in (2, 3) or all(n % d for d in itertools.chain(divs[::3], divs[1::3]))

%time isprime(1_000_003)``````
``````CPU times: user 42 µs, sys: 1 µs, total: 43 µs
Wall time: 44.1 µs``````
``True``

The catch is the cycles grow exponentially with diminishing returns on each successive number.

Onto prime generation, while keeping the odds-only optimization. Typically it’s requested to generate the first `N` primes, or up to some value. But that’s easily generalized with `itertools.islice` and `itertools.takewhile`. A more Pythonic approach is an unbounded generator.

``````def primes():
yield 2
ints = itertools.count(3, 2)
while True:
prime = next(ints)
yield prime
ints = (n for n in ints if n % prime)

list(itertools.islice(primes(), 10))``````
``[2, 3, 5, 7, 9, 11, 13, 15, 17, 19]``

Elegant, but doesn’t work. The problem is the scoping of `prime`, which is being used in the generator expression but also modified in the loop. Instead it can be replaced with a `filter` on a partially bound function, but unfortunately `functools.partial` only binds left arguments and `rmod` is needed here. One alternative is to use bound methods as a first-class function, even dunder methods.

``````def primes():
yield 2
ints = itertools.count(3, 2)
while True:
prime = next(ints)
yield prime
ints = filter(prime.__rmod__, ints)

%time next(itertools.islice(primes(), 1000, None))``````
``````CPU times: user 30.7 ms, sys: 1.82 ms, total: 32.5 ms
Wall time: 32 ms``````
``7927``

Elegant, but slow and could overflow the stack. A more traditional approach would use the same checking logic as `isprime`, but also cache the primes so as to not duplicate divisors.

``````def primes():
yield 2
primes = []
for n in itertools.count(3, 2):
if all(n % p for p in itertools.takewhile(int(n ** 0.5).__ge__, primes)):
primes.append(n)
yield n

%time next(itertools.islice(primes(), 1000, None))``````
``````CPU times: user 5.49 ms, sys: 423 µs, total: 5.92 ms
Wall time: 5.8 ms``````
``7927``

Onto interface design. The primes are being stored anyway, so it would be nice if they were re-iterable. A generator can be written as a class with `__iter__` and `__next__`, but an under-appreciated feature is that `__iter__` itself can be a generator. And now that it’s a class, `isprime` can be expressed as `in` while also benefiting from the cache.

``````class Primes:
def __init__(self):
self.ints = itertools.count(3, 2)
self.cache = 

def __iter__(self):
yield from self.cache
for n in self.ints:
if n in self:
self.cache.append(n)
yield n

def __contains__(self, n):
return all(n % p for p in itertools.takewhile(int(n ** 0.5).__ge__, self))

primes = Primes()
%time next(itertools.islice(primes, 1000, None))``````
``````CPU times: user 7.89 ms, sys: 483 µs, total: 8.37 ms
Wall time: 8 ms``````
``7927``
``%time 1_000_003 in primes``
``````CPU times: user 34 µs, sys: 0 ns, total: 34 µs
Wall time: 37 µs``````
``True``

There’s a hybrid approach though, that’s faster and nearly as simple as the above sieves. Instead of doing repeated divisions, keep track of each found prime along with the next multiple that it would eliminate. The inner loop is then optimized because it only needs to account for collisions.

``````def primes():
multiples = {}
for n in itertools.count(2):
prime = multiples.pop(n, 0)
if not prime:
prime = n
yield n
key = n + prime
while key in multiples:
key += prime
multiples[key] = prime

%time next(itertools.islice(primes(), 1000, None))``````
``````CPU times: user 2.59 ms, sys: 103 µs, total: 2.69 ms
Wall time: 2.7 ms``````
``7927``

Now to add back the odds-only optimization, the step scalar needs to be double the prime number. Another way to reduce collisions is to recognize that each new prime is irrelevant until its square value is reached.

``````def primes():
yield 2
multiples = {}
for n in itertools.count(3, 2):
step = multiples.pop(n, 0)
if step:  # composite
key = n + step
while key in multiples:
key += step
multiples[key] = step
else:  # prime
multiples[n ** 2] = n * 2
yield n

%time next(itertools.islice(primes(), 1000, None))``````
``````CPU times: user 1.37 ms, sys: 5 µs, total: 1.38 ms
Wall time: 1.38 ms``````
``7927``

And finally let’s add back the caching. Yielding a clean interface, an efficient implementation for all use cases, and still relatively simple.

``````class Primes:
def __init__(self):
self.ints = itertools.count(3, 2)
self.cache = 
self.multiples = {}

def __iter__(self):
yield from self.cache
for n in self.ints:
step = self.multiples.pop(n, 0)
if step:  # composite
key = n + step
while key in self.multiples:
key += step
self.multiples[key] = step
else:  # prime
self.multiples[n ** 2] = n * 2
self.cache.append(n)
yield n

def __contains__(self, n):
return all(n % p for p in itertools.takewhile(int(n ** 0.5).__ge__, self))

primes = Primes()
%time 1_000_003 in primes``````
``````CPU times: user 242 µs, sys: 0 ns, total: 242 µs
Wall time: 245 µs``````
``True``
``%time 1_000_003 in primes``
``````CPU times: user 40 µs, sys: 0 ns, total: 40 µs
Wall time: 43.2 µs``````
``True``