Tuesday, March 19, 2019

Quantum factorization intuition and how RSA is broken with examples from Fourier analysis

In this post I give an idea of how a quantum computer may factor an integer. This post is not formal and is intended for non experts, even non-mathematicians (in fact I wrote it for colleagues which are computer engineers). This is not exactly Shor's algorithm but I will explain the main essence of it. Therefore, some gaps may be here. I will explain how the Fourier Transform is used to factor an integer. Calculating the Fourier transform in a classical computer is expensive, however, for a quantum computer due to its physical nature, takes just 1 cycle using quantum gates. I won't delve into details about quantum gates in this post, just in the mathematical part but I will try to be as smooth as possible without any fancy words and language. This is intended for you to finally grasp how a quantum computer is supposed to break RSA and I will provide you some code to play with.

What you need to know first? 

a) What is GCD of two integers? :   If a and b are two integers, the Greatest common divisor (GCD) of a and b is the biggest integer g such that  a/g and b/g is “exact” (leaves no residue, this g is always a positive integer).   This GCD is very easy to calculate even that a and b are thousands of bits (this is how RSA public keys are being compared on the internet to find repeated factors in them and vulnerate its corresponding private keys)
The way of calculating the GCD number is really straightforward, for instance in python recursively as easy as:
        def gcd(a,b):
                       return a
                       return gcd(b,a%b)

Where % is the modulo operation in python, this is explained in (c).  GCD(a,b)=1 means that a and b share no factors and 1 is the biggest divisor of both.

b) A fact from highschool: The polynomial  $latex x^{2n}-1$   factors as  $latex (x^n + 1)(x^n -1)$. This is something you all know from highschool as difference of squares.

c) Modular arithmetic: $latex A=B \bmod N$   means “B is the residue of dividing A/N” and this residue of course is always strictly less than N.

d) Periodicity of modulo operation: Take any integer a, and build the sequence $latex a^0, a^1, a^2, a^3,\ldots, a^k\bmod N$ (this is a sequence “reduced” modulo N, which means dividing by N each element in the sequence and inserting the residue).

This list will be periodic eventually since N is finite. (patterns in the sequence will appear)
For example take N=14 the sequence $latex \lbrace 3^0, 3^1, 3^2, 3^3, 3^4, 3^5, 3^6, 3^7, 3^8,\ldots\rbrace\bmod 14$ is reduced to $latex \lbrace 1,3,9,13,11,5,1,3,9,\ldots \rbrace$.

This “signal $latex 3^x \bmod 14$” has period 6 since $latex 3^{x+6} \bmod 14 = 3^x \bmod 14$  for all x (in highschool you see for example that $latex\sin(x)$ has period $latex 2\pi$ since $latex \sin(x)=\sin(x+2\pi)$

e) Fourier transform of periodic function: If you have a periodic signal and you feed it to a Fourier transform, this will output a function where in the x-axis you will have the frequencies of the original data. In the y-axis the magnitude (size) of the periodic data. 
To find the longest period of a signal, since Period = 1/Frequency (highschool) you just need to find all the highest amplitudes (y values) and check its corresponding x-coordinate (frequency)  and transform it to period. This is 1/x to get the period of the original high amplitude y value (you are interested in the highest period). For example, if you have a high Y value (amplitude), and its x-coordinate X is 0.1428 HZ then 1/X=1/0.1428 = 7 ( and the period is P=7).

Factorization Algorithm using periods mod N 
INPUT (An integer N to Factorize), OUTPUT (A factor)

  1. Pick a random $latex x$ between $latex 2$ and $latex N$
  2. If $latex g=\gcd(x,N)\geq 2$ then return $latex x$ (since $latex g$ is greater than $latex 1$ and $latex \gcd(x,N)$ is a common divisor of $latex N$, we found accidentally a factor $latex g$ which is improbable)
  3. Calculate the sequence $latex S:=\lbrace x^0, x^1, x^2, \ldots, x^N\rbrace \bmod N$ (this is a quantum step which is done in a quantum register in one cycle)
  4. Calculate the period $latex P$ of $latex S$ using quantum fourier transform (this is the same as the fourier transform but using quantum parallelism which for now you see as a blackbox since this is the fast part that is calculated in 1 Cycle on a quantum processor, since it is done using the physical nature of the processor)
  5. If $latex p$ is not even, goto  step 1  ($latex P$ will be even with high probability, this is a deep mathematical number theoretical argument)
  6. Since $latex P$ is the period, we have that $latex x^P \equiv 1 \bmod N$. Further since $latex P$ is even, put “1” in the other side of the previous equality and using (b) above we can factor       $latex (x^{P/2}+1)(x^{P/2}-1) \equiv 0 \bmod N$  .
  7. Since $latex (x^{P/2}+1)(x^{P/2}-1) \equiv 0 \bmod N$  this means that the left hand side of the equation leaves 0 residue when divided by $latex N$ (see c) above). This means that at least one of both factors shares a factor with N. This factor is one (or both) of $latex \gcd((x^{P/2}\pm 1),N)$.
Therefore, $latex f=\gcd(x^{P/2}+1) , N)$  OR $latex f=\gcd(x^{P/2}-1),N)$ is a factor of $latex N$ and Voila.

Example, factorizing N=15 using the previous algorithm:

We will factor the number 15 by hand using the previous algorithm.
  1. We pick randomly $latex x=7$
  2. We have that $latex \gcd(15,7) = 1$ therefore, we can stay with our choice of $latex x$ since this $latex x$ does not share factors with $latex N$, we verify this quickly (even that $latex N$ or $latex x$ have thousands of bits) as:
>>> N=15
>>> x=7
>>> from math import gcd
>>> gcd(N,x)

  1. We calculate the sequence $latex S=\lbrace 7^0, 7^1, \ldots , 7^N\rbrace \bmod N=15$
>>> N=15
>>> x=7
>>> S=[]
>>> for k in range(0,N+1):
...     S.append((x**k)%N)
>>> S
[1, 7, 4, 13, 1, 7, 4, 13, 1, 7, 4, 13, 1, 7, 4, 13]

  1. We use the Fourier transform to calculate the period $latex P$ of $latex S$ looking at the frequencies (x coordinates) of the highest amplitudes (y coordinates). We can see this visually below (ignore the high frequency at 0.0 which is infinity and it is a technicality).
Note that the period of $latex S$ seems to be 4 just by looking the sequence $latex S$, but we do not know that, and in real life sequences will be much larger, and this is a toy example.
>>> import numpy as np
>>> S = np.array(S)
>>> fourierS=np.abs(np.fft.fft(S))
>>> freqdomain = np.fft.fftfreq(S.size,d=1)
>>> import matplotlib.pyplot as plt
>>> plt.subplot(2,1,1)
>>> plt.plot(np.arange(S.size), S, 'r--')
>>> plt.subplot(2,1,2)
>>> plt.plot(freqdomain,np.abs(fourierS),'ro')
>>> plt.show()

Look that the frequency coordinate (in the x axis) of one of the highest amplitudes (in the y axis). In the Fourier spectrum, the marked high amplitude has frequency 0.250737 (x coordinate).  Using the highschool formula 1/Freq = Period we have that 1/0.250737  is 3.99  (which is rounded to 4) and this is the highest period $latex P$ of the original signal (we already knew it was 4 but we are literally looking at the frequency through Fourier analysis).   

  1. We saw that the highest amplitude (interpret it as highest length of a period of S, since there are a lot of small periods within our signa, for example 1,7 repeats also or 7,4,13 too with some gaps) has frequency 0.2507 Hz in the Fourier spectrum, and the period P was obtained by 1/T=P as 1/0.2507 which is 3.99 ~= 4 so the highest period P=4.
  1. We calculate  $latex x^{P/2} + 1$ and $latex x^{P/2} - 1$, namely $latex 7^2 + 1$ and $latex 7^2 - 1$  which are 50 and 49 respectively. Further, we check for common factors with $latex N$ (recall by b) and 7. above that $latex x^P \equiv 1 \bmod N$ since the signal has period $latex P$) and therefore $latex x^P-1 = (x^{P/2}+1)(x^{P/2}-1) \equiv 0 \bmod N$.
>>> P=4
>>> from math import gcd
>>> gcd(x**(int(P/2))+1,N)
>>> gcd(x**(int(P/2))-1,N)

  1. Voila, we have the factors 5 and 3.

There are some technicalities to explain when the signal is cut suddenly at the end without finishing the “wave” , the period needs to be approximated since it wont be exact, but it is a matter of rounding correctly.

A more serious calculation using the fourier spectrum is for a 4 digit number

Using another technique to calculate periods via convolution:

Source code and more information

Source code that finds the periods using Fourier and other implementation with discrete Convolution (faster). Screenshots for more complex factorizations are in http://ff2.nl/quantum-factorization/

Eduardo Ruiz Duarte (beck)
twitter: @toorandom