Today we’re going to look at generating random numbers as a product of its prime factors.


Table of Contents


Prime factored numbers

I came across this one page gem in an algorithms class a while back and I thought it was so cool, especially because it generates numbers uniformly.

As you might know, simulating random numbers on a computer is not actually random - there is a deterministic algorithm under the hood.

Now, it is trivial to run np.random.randint() in Python to generate a number but in this case we want to know it’s prime factors as well.

One could generate a random number and try to prime factorise it, but it turns out that to be a NP-hard problem, i.e. no polynomial time algorithm is known.

However, there does exist an efficient algorithm to test whether a number is prime.

Take care to note the difference.

We cannot generate a number and then factorise it in a reasonable time, so instead we will generate numbers, test their primality and take product of the ones that are prime to achieve our goal.

Let’s look at the algorithm. It is deceptively simple.

The algorithm

Let N be the upper bound positive integer from which to sample random numbers siN+.

For each iteration:

  1. Generate a random sequence si,i=1,2,k starting with si{1,2,,N}, continuing with si+1{1,2,,si} until we sample sk=1.

  2. We take the product of primes s=sSGsi, where SGS is the subset of generated prime numbers.

  3. Output s with probability sN if sN.

Notice that Ns1s2sk=1.

Probabilistic justification

Each prime si can be chosen with probability 1si. This is because once you “miss” that number, you can never go back to it again.

Suppose you now filter out the chosen numbers to only include primes, p. Since we’re multiplying them together and their probabilities are independent, the probability of generating r is

1s=1p11p21pk

Since we are rejecting with sN it should be pretty clear why its uniform sampling - notice that we reject anything larger than N!

Time complexity

To understand the time complexity, we look at 2 components. First we need to test each si for primality. Then we need to know how often this algorithm is going to restart.

Suppose we somehow ended up choosing every number in the sequence, then the probability of choosing each number for the first time is 1si. Hence the number of primality tests needed is

O(logN)=1+12+13+...+1N

Now, we look at the expected number of restarts. The probability of accepting one of the numbers N is

P[accept|s]=(sN)(1s)sSG(11si)=1NsSG(11si)

Since we can accept any number from 1 to N, the probability of accepting is then sSG(11si)

It turns out that the expected number of rejections (and therefore restarts) is

E[# of rejections]=1sSG(11si)=O(logN)

by Merten’s 3rd Theorem.

Hence the running time is

O(log2Nprimality test)

Since we’re assuming that the primality test can be executed in polynomial time, we are done.


Aftermath

I stumbled across the Prime Number Theorem (PNT) which describes the distribution of prime numbers asymptotically as N approaches infinity.

I found this Khan academy video intuitive for understanding the PNT.

Now I’m curious as to how np.random.randint() is implemented =). That might a discussion for another post.