Today we’re going to look at generating random numbers as a product of its prime factors.
Table of Contents
Prime factored numbers
I came across this one page gem in an algorithms class a while back and I thought it was so cool, especially because it generates numbers uniformly.
As you might know, simulating random numbers on a computer is not actually random - there is a deterministic algorithm under the hood.
Now, it is trivial to run np.random.randint()
in Python to generate a number but in this case we want to know it’s prime factors as well.
One could generate a random number and try to prime factorise it, but it turns out that to be a NP-hard problem, i.e. no polynomial time algorithm is known.
However, there does exist an efficient algorithm to test whether a number is prime.
Take care to note the difference.
We cannot generate a number and then factorise it in a reasonable time, so instead we will generate numbers, test their primality and take product of the ones that are prime to achieve our goal.
Let’s look at the algorithm. It is deceptively simple.
The algorithm
Let $N$ be the upper bound positive integer from which to sample random numbers $s_{i} \in \mathbb{N^{+}}$.
For each iteration:
-
Generate a random sequence $s_{i}, i = 1, 2, … k$ starting with $s_{i} \in \left\{1, 2, …, N \right\}$, continuing with $s_{i+1} \in \left\{1, 2, …, s_{i} \right\}$ until we sample $s_{k} = 1$.
-
We take the product of primes $s^{*} = \prod_{s \in S_{G}}s_{i}$, where $S_{G} \subseteq S$ is the subset of generated prime numbers.
-
Output $s^{*}$ with probability $\frac{s^{*}}{N}$ if $s^{*} \leq N$.
Notice that $N \geq s_{1} \geq s_{2} \cdots \geq s_{k} = 1$.
Probabilistic justification
Each prime $s_{i}$ can be chosen with probability $\frac{1}{s_{i}}$. This is because once you “miss” that number, you can never go back to it again.
Suppose you now filter out the chosen numbers to only include primes, $p$. Since we’re multiplying them together and their probabilities are independent, the probability of generating $r$ is
\[\begin{align} \frac{1}{s^*} & = \frac{1}{p_{1}}\cdot\frac{1}{p_{2}}\cdot\dots\cdot\frac{1}{p_{k}} \end{align}\]Since we are rejecting with $\frac{s^{*}}{N}$ it should be pretty clear why its uniform sampling - notice that we reject anything larger than $N$!
Time complexity
To understand the time complexity, we look at 2 components. First we need to test each $s_{i}$ for primality. Then we need to know how often this algorithm is going to restart.
Suppose we somehow ended up choosing every number in the sequence, then the probability of choosing each number for the first time is $\frac{1}{s_{i}}$. Hence the number of primality tests needed is
\[\begin{align} O(\log N) & = 1 + \frac{1}{2} + \frac{1}{3} + ... + \frac{1}{N} \end{align}\]Now, we look at the expected number of restarts. The probability of accepting one of the numbers $\leq N$ is
\[\begin{align} P\left[\text{accept} \;|\; s^{*}\right] & = \left(\frac{s^{*}}{N}\right)\left(\frac{1}{s^{*}}\right)\prod_{s\in S_{G}}{\left(1-\frac{1}{s_{i}}\right)} \\ & = \frac{1}{N}\prod_{s\in S_{G}}{\left(1-\frac{1}{s_{i}}\right)} \\ \end{align}\]Since we can accept any number from $1$ to $N$, the probability of accepting is then $\prod_{s\in S_{G}}{\left(1-\frac{1}{s_{i}}\right)}$
It turns out that the expected number of rejections (and therefore restarts) is
\[\begin{align} E\left[\text{# of rejections}\right] & = \frac{1}{\prod_{s\in S_{G}}{\left(1-\frac{1}{s_{i}}\right)}} \\\\ & = O(\log N) \end{align}\]Hence the running time is
\[O(\log^{2}N\cdot\text{primality test})\]Since we’re assuming that the primality test can be executed in polynomial time, we are done.
Aftermath
I stumbled across the Prime Number Theorem (PNT) which describes the distribution of prime numbers asymptotically as $N$ approaches infinity.
I found this Khan academy video intuitive for understanding the PNT.
Now I’m curious as to how np.random.randint()
is implemented =). That might a discussion for another post.