MAT 338 Day 21 2011

2392 days ago by kcrisman

Today we will have our last crack at issues directly related to cryptography.  (That doesn't mean that other stuff we do is unrelated - oh no!  But I will not make direct connections with them.)   We will focus on the main attack on the RSA algorithm, namely factorization.  

Let's look at one more toy RSA problem to get a sense of what is going on.  First, I choose a modulus $n$.  I will also verify it has two prime factors without telling you what they are.

n=899 html("There are $%s$ prime factors and their powers are $%s$ and $%s$."%(len(n.factor()),n.factor()[0][1],n.factor()[1][1])) 
       
There are 2 prime factors and their powers are 1 and 1.
There are 2 prime factors and their powers are 1 and 1.

Then I choose an exponent to raise my secret message to...

e=13 html("We choose $n=%s$ and exponent $e=%s$, and verify that $gcd(e,\phi(n))=1$: %s"%(n,e,1==gcd(e,euler_phi(n)))) 
       
We choose n=899 and exponent e=13, and verify that gcd(e,\phi(n))=1: True
We choose n=899 and exponent e=13, and verify that gcd(e,\phi(n))=1: True

I haven't told you $\phi(n)$, but this guarantees it is coprime to my (public) encryption key $e=13$.  Now we can encode our message, $x=11$:

x=11 message=mod(x,n)^e message 
       
21
21

Now, how could we hope to crack this sinister message?  (Assume that euler_phi(899) is just too huge for Sage to do.)  Well, we do know $n=899$ and that $e=13$.  That could help.  

In fact, if we knew $p$ and $q$, we could easily calculate $\phi(n)$ without even using Sage.   Can you quickly now factor $n$ without using Sage?  Be smart about it - think strategically; how should I choose a public modulus $n$ to make this hard to do?

 
       
 
       
 
       
 
       

Hopefully we figured it out.  Then we just need to find an inverse modulo $\phi(n)=(p-1)(q-1)$ to get our decryption key.

p=29 q=31 f=inverse_mod(e,(p-1)*(q-1)) f 
       
517
517

And we decrypt:

message^f 
       
11
11

This gives us the original message $x=11$ again.  

I hope this makes it clear why factorization, not just looking for primes, might be important.  To be truthful, many researchers in factorization simply do it to stay one step ahead of the other side, who is presumably also researching factorization - so to some extent it is an arms race.  

But of course it is also inherently interesting mathematically!  Here is an interesting factoid, for instance.

Fact: $$\text{If I know }\phi(n)\text{ and }n\text{, and know that }n\text{ is a product of exactly two distinct primes, I can easily compute them}\; .$$  

Of course, if we know $\phi(n)$, we already can crack the code, but who cares - maybe we are given $\phi(n)$ and $n$ and want the factorization.  

  • Suppose the (as yet unknown) primes are $p$ and $q$.
  • Expand our formula $$\phi(n)=(p-1)(q-1)=pq-p-q+1=n-(p+q)+1$$ 
  • We now can represent both $p+q$ and $pq$ as formulas in $n$ and $\phi(n)$:
    • $p+q=n-\phi(n)+1$
    • $pq=n$
  • Where might we have a formula with $p+q$ and $pq$?  That should seem familiar... $$(x-p)(x-q)=x^2-(p+q)x+pq$$
  • So we can simply use the quadratic formula on this expression to get the values for $p$ and $q$! $$p,q=\frac{(p+q)\pm\sqrt{(p+q)^2-4pq}}{2}=\frac{n-\phi(n)+1}{2}\pm\frac{\sqrt{(n-\phi(n)+1)^2-4n}}{2}$$

Example:

Continuing the example above, $$x^2-(899-840+1)x+899=x^2-60x+899=0$$ gives $$x=\frac{60\pm \sqrt{60^2-4(1)(899)}}{2(1)}=30\pm\frac{\sqrt{3600-3596}}{2}=30\pm 1=29,31$$

Okay, that was fun - but back to business.  The first, and oldest, method of factoring is one you already know, and maybe used a few minutes ago - trial factorization.  It is the method we used with the Sieve of Eratosthenes; you just try each prime number, one by one.  

Does anyone remember what the highest number you would have to try is in order to factor $n$?  (Can you prove it?)

The following algorithm does this very naively (and slowly, even for trial division).  Let's try to talk through what each step does.

def TrialDivFactor(n): p=next_prime(1) top=ceil(math.sqrt(n)) while p<top: if mod(n,p)==0: break p=next_prime(p) if n==1: print "1 is not prime" elif p==n: print n,"is prime" elif mod(n,p)==0: print n,"factors as",p,"times",n/p else: print n,"is prime" 
       

Now let me verify it works on easy examples.  Remember, we are just looking for factors at this point, not complete factorizations.

for z in range(1,18): TrialDivFactor(z) 
       
1 is not prime
2 is prime
3 is prime
4 factors as 2 times 2
5 is prime
6 factors as 2 times 3
7 is prime
8 factors as 2 times 4
9 factors as 3 times 3
10 factors as 2 times 5
11 is prime
12 factors as 2 times 6
13 is prime
14 factors as 2 times 7
15 factors as 3 times 5
16 factors as 2 times 8
17 is prime
1 is not prime
2 is prime
3 is prime
4 factors as 2 times 2
5 is prime
6 factors as 2 times 3
7 is prime
8 factors as 2 times 4
9 factors as 3 times 3
10 factors as 2 times 5
11 is prime
12 factors as 2 times 6
13 is prime
14 factors as 2 times 7
15 factors as 3 times 5
16 factors as 2 times 8
17 is prime

Okay, so this seems reasonable.  But it's a little more problematic when you try to do large numbers, where large means "bigger than you can do by hand, but nowhere close to the size we looked at in the last class."  I'll actually time how long it takes.

@interact def _(n=6739815371): TrialDivFactor(n) timeit('TrialDivFactor(n)') 
       

Click to the left again to hide and once more to show the dynamic interactive window

Sage actually has this implemented in a much faster way, primarily by using optimized integers and a special version of Python that allows turning it into much faster code in the C language (Cython).  Notice that it just returns a single factor - another slight speedup.

@interact def _(n=6739815371): print n.trial_division() timeit('n.trial_division()') 
       

Click to the left again to hide and once more to show the dynamic interactive window

That's roughly one thousand times faster for the initial example!  Naturally, it's possible to speed up even more.  But note that getting the full factorization slows us back down; after all, one has to check that the remaining factor is prime (or factor it, if it isn't).

@interact def _(n=6739815371): print n.factor() timeit('n.factor()') 
       

Click to the left again to hide and once more to show the dynamic interactive window

Even for this smaller number it takes some actual time - here is where one sees the difference between different implementations of the same algorithm.

timeit('TrialDivFactor(997*991)') 
       
125 loops, best of 3: 3.41 ms per loop
125 loops, best of 3: 3.41 ms per loop
timeit('(997*991).trial_division()') 
       
625 loops, best of 3: 2.79 µs per loop
625 loops, best of 3: 2.79 µs per loop
timeit('(997*991).factor()') 
       
625 loops, best of 3: 20.7 µs per loop
625 loops, best of 3: 20.7 µs per loop

So much for trial division!  But we have other tools at our disposal.

Some of you might have tried something a little different with $n=899$ from our earlier RSA problem.  Since we know that someone is trying to protect a secret, they probably are not going to pick a number with primes like $3$ and $5$ in it - that would be too easy to factor.  

In fact, it stands to reason that the primes $p$ and $q$ should be relatively large compared to $n$ - so why not start in the middle?

This was Fermat's idea for factoring larger numbers.  However, he didn't just start with primes in the middle; for one thing, if your number is even somewhat big and you don't have a computer or huge list of primes, how would you know where to start?  So Fermat became clever, as always, and used an algebraic identity to help himself along.

  • Write $n=ab$, with $a>b$, and assume $n$ is odd (for hopefully obvious reasons!).  
  • Then we can write $n$ as a difference of two square numbers;
    • Namely, the squares of $s=\frac{a+b}{2}$ and $t=\frac{a-b}{2}$: $$s^2-t^2=\left(\frac{a+b}{2}\right)^2-\left(\frac{a-b}{2}\right)^2=\frac{a^2}{2}-\frac{a^2}{2}+\frac{b^2}{2}-\frac{b^2}{2}+\frac{2ab}{4}+\frac{2ab}{4}=ab=n$$ 

This may seem like an obscure identity to us, but at the time (and even until the last century) such identities were the bread and butter of algebra, before we had tools like computers to help us along.  

So what Fermat did is to try this identity backwards.  Here is his strategy.

Fermat factorization:

  • Look for perfect square numbers $s^2$ a little bigger than $n$ (where 'little' might get big, but starts as close to $n$ as possible).
  • Then check whether $s^2-n$ is itself a perfect square!  
  • That means we turned $$s^2-t^2=n\text{ around into }s^2-n=t^2\text{, essentially.}$$  Presumably it would be pointless to check every possible $s^2-n$, but at least checking quite a few of them might lead to results in factoring.
  • If you find $s$ and $t$, you are essentially done.
    • Now, $s$ and $t$ are NOT the factors of $n$.  
    • But we can solve for them; $$a=s+t\text{ and }b=s-t\, .$$

Here is an implementation - again, assuredly slow, but at least verbose in its explanation - of this strategy.  We simply start with the next $s$ above the square root of $n$, and just keep trying $s^2-n$ again and again for bigger and bigger $s$.

def FermatFactor(n,verbose=False): if n%2==0: raise TypeError,"Input must be odd!" s=ceil(math.sqrt(n)) top=(n+1)/2 while is_square(s^2-n)==0: if verbose: print s,"squared minus",n,"is",s^2-n,"which is not a perfect square" s=s+1 t=sqrt(s^2-n) print "Fermat found that",s,"squared minus",t,"squared equals",n if s^2==n: print "So",n,"was already a perfect square,",s,"times",s elif s<top: print "So",s+t,"times",s-t,"equals",(s-t)*(s+t),"which is",n elif s==top: print "So Fermat did not find a factor, which means",n,"is prime!" 
       

Before we move on, let's try to factor 143 and 93 using this algorithm.  Remember, we start with $s^2-n$, where $s$ is the next integer above $\sqrt{n}$, and see if it is a perfect square; then we increase $s$ by one each time.

After we do them by hand, we can see what Sage does with them to check.

 
       
 
       
 
       
 
       
FermatFactor(143,verbose=True) 
       
Fermat found that 12 squared minus 1 squared equals 143
So 13 times 11 equals 143 which is 143
Fermat found that 12 squared minus 1 squared equals 143
So 13 times 11 equals 143 which is 143

Well, we struck gold on the first try here!  That happens if your number is the product of two primes which are two apart.  (Such primes are known as twin primes, and have some interesting stories.  Among other things, calculating with them helped find a bug in the Pentium computer chip in 1995.)

FermatFactor(93,verbose=True) 
       
10 squared minus 93 is 7 which is not a perfect square
11 squared minus 93 is 28 which is not a perfect square
12 squared minus 93 is 51 which is not a perfect square
13 squared minus 93 is 76 which is not a perfect square
14 squared minus 93 is 103 which is not a perfect square
15 squared minus 93 is 132 which is not a perfect square
16 squared minus 93 is 163 which is not a perfect square
Fermat found that 17 squared minus 14 squared equals 93
So 31 times 3 equals 93 which is 93
10 squared minus 93 is 7 which is not a perfect square
11 squared minus 93 is 28 which is not a perfect square
12 squared minus 93 is 51 which is not a perfect square
13 squared minus 93 is 76 which is not a perfect square
14 squared minus 93 is 103 which is not a perfect square
15 squared minus 93 is 132 which is not a perfect square
16 squared minus 93 is 163 which is not a perfect square
Fermat found that 17 squared minus 14 squared equals 93
So 31 times 3 equals 93 which is 93

As you can see, we probably would have been better off with trial division for $n=93$ - it's obvious that it's divisible by 3, but that takes a long time to reach from the middle.

Now, these methods are the beginnings of how people really factor big numbers.  Typically, one does trial division up to a certain size (maybe the first few hundred or thousand primes), then perhaps some modification of Fermat to make sure that there aren't any factors close to the square root if you are attacking something like RSA where that would otherwise be advantageous.  

Then what? 

There are many answers, some of which involve cool things called continued fractions or number fields.  We won't really touch on those topics, but we did talk about something very useful last time - namely, probabilistic/random methods!  That's right, we are going to try to find a factor randomly!

Here is the essence of this approach.

  • Pick some polynomial that will be easy to compute mod $(n)$.
  • Plug in an essentially random seed value.  Often the seed is $2$ or $3$.
  • Compute the polynomial's value at the seed, and check if that has a non-one gcd with $n$.
  • If not, plug the value back into the polynomial, and repeat.

There is a modification I will discuss below that this algorithm uses.

def PollardRhoFactor(n,kstop=50,seed=2): d=1 a,b=seed,seed k=1 def f(x): return (x^2+1)%n while (d==1 or d==n): a = f(a) b = f(f(b)) d=gcd(a-b,n) k=k+1 if k>kstop: html("Pollard Rho breaking off after $%s$ rounds"%k) break if d>1: html("Pollard Rho took $%s$ rounds"%k) html("The number it tried in the last round was $%s$, which shares factor $%s$"%(a-b,d)) html("And $%s$ is a factor of $%s$ since $%s\cdot %s=%s$"%(d,n,d,n/d,d*(n/d))) 
       

The essence of the method is that by plugging in the values of the polynomial modulo $n$, we are generating a 'pseudo-random' sequence of numbers.  And a 'pseudo-random' sequence might be better than the sequences we used for trial division or Fermat factorization, precisely because it will hit some small factors and some other large random factors.  It might also be good that it could give us numbers which, although not a factor of $n$, might at least share a factor with $n$.

Example:

Here are some details. Let $x_0=2$ and $f(x)=x^2+1$ (these could be different, but they are a typical first choice). We create the sequence

  • $x_0\equiv 2$
  • $x_1\equiv f(x_0)$
  • $x_2\equiv f(x_1)$
  • etc. - all mod ($n$).  

For instance, doing it with $n=8051$, we get

var('x') @interact def _(seed=2,n=8051,poly='x^2+1',trials=(10,[2..50])): f(x)=poly for i in range(trials): html("$x_%s=%s$"%(i,seed)) seed=f(seed)%n html("$x_{%s}=%s$"%(i+1,seed)) 
       

Click to the left again to hide and once more to show the dynamic interactive window

Now, these might be kind of random, but we will not actually try to find common divisors of these numbers with $n$.

Instead, we will try to see if all the differences $x_i-x_j$ share a common factor with $n$, using the (highly efficient to compute) gcd.   That is a lot more opportunities!  And hopefully just as - or more - 'random', and just as effective at finding factors.  However, that is too many to check quickly.  

So instead one modifies this one last time.  Remember, polynomials are well-defined in modular arithmetic, and the sequence will eventually repeat modulo any particular modulus $d$, since there are finitely many possible $x_i$.  

In particular, if $d$ is a divisor of $n$, then if $gcd(x_i-x_j,n)=d$ is a shared divisor found by the pair $x_i$ and $x_j$, then not only will it be the case that $$x_i\equiv x_j\text{ mod }(d)$$ but it will also be the case for any number of iterations of $f$ that $$f^n(x_i)\equiv f^n(x_j)\text{ mod }(d)$$ which means the sequence (modulo $d$, the common divisor) repeats itself $j-i$ terms.  

So if we choose $k=j-i$ (or the first multiple of this which is greater than $i$), then $$x_k\equiv x_{2k}\text{ mod }(d)$$  will have to be true as well, so all the $x_i,x_j$ pairs which come from the first one to share a divisor can be checked by checking just this one $gcd(x_{2k}-x_k,n)$.

So for each $k$ we check whether $$1<gcd(x_{2k}-x_k,n)=d<n\, .$$  It turns out that (probabilistically, just like with Miller-Rabin) it should succeed for $k$ in the neighborhood of the size of the square root of the smallest factor of $n$.  So if $n$ has a big, but not too big, divisor, this test should help us find that divisor.

Example continued:

In the example above, the numbers we would get for the first three rounds are

  • $x_2-x_1=26-5=21$
  • $x_4-x_2=7474-26=7448$
  • $x_6-x_3=871-677=194$

These factor as 

factor(21);factor(7448);factor(194) 
       
3 * 7
2^3 * 7^2 * 19
2 * 97
3 * 7
2^3 * 7^2 * 19
2 * 97

and have the following gcds with 8051:

gcd(8051,21);gcd(8051,7448);gcd(8051,194) 
       
1
1
97
1
1
97

So this would catch the four factors $3,7,19$, and $97$, which is not bad, and indeed the final one caught a common divisor with $8051$.

PollardRhoFactor(8051) 
       
Pollard Rho took 4 rounds
The number it tried in the last round was -194, which shares factor 97
And 97 is a factor of 8051 since 97\cdot 83=8051
Pollard Rho took 4 rounds
The number it tried in the last round was -194, which shares factor 97
And 97 is a factor of 8051 since 97\cdot 83=8051

This method is usually called the Pollard rho method, because it is due to John Pollard and the fact that the $x_i$ eventually repeat mod ($d$) (in this case, $d=97$) means you can draw a tail and then a loop, which to a very imaginative eye might look like a Greek $\rho$.  

In general, there are other such probabilistic algorithms, and they are quite successful with factoring numbers which might have reasonably sized but not gigantic factors.  According to Wikipedia, "The rho algorithm's most remarkable success has been the factorization of the eighth Fermat number by Pollard and Brent. They used Brent's variant of the algorithm, which found a previously unknown prime factor. The complete factorization of F8 took, in total, 2 hours on a UNIVAC 1100/42."  Well, this was in the 70s.  But still, it's not bad!

@interact def _(n=991*997,method=['trial','Fermat','Pollard Rho']): if method=='trial': TrialDivFactor(n) if method=='Fermat': FermatFactor(n) if method=='Pollard Rho': PollardRhoFactor(n) 
       

Click to the left again to hide and once more to show the dynamic interactive window

Notice that the rho method didn't come up with an answer yet (it took 50 rounds without success).  I could up this to a lot more.  So what do you do then - bring out the big guns?  Not at all - just as with primality testing, you just change your starting point to try again!

PollardRhoFactor(991*997,seed=3) 
       
Pollard Rho took 15 rounds
The number it tried in the last round was 74775, which shares factor 997
And 997 is a factor of 988027 since 997\cdot 991=988027
Pollard Rho took 15 rounds
The number it tried in the last round was 74775, which shares factor 997
And 997 is a factor of 988027 since 997\cdot 991=988027

Still, you can see how much more there is to say with the number Pollard and Brent factored, the Fermat number $F_8$.  

PollardRhoFactor(2^(2^8)+1,1000000) # one million rounds! 
       
Pollard Rho breaking off after 1000001 rounds
Pollard Rho breaking off after 1000001 rounds

Hmm, what now?  Let's change the seed.

PollardRhoFactor(2^(2^8)+1,1000000,seed=3) 
       
Pollard Rho breaking off after 1000001 rounds
Pollard Rho breaking off after 1000001 rounds

Luckily, we have bigger guns at our disposal in Sage (especially in the component program Pari), that polish thing off rather more quickly.

%time factor(2^(2^8)+1) 
       
1238926361552897 *
93461639715357977769163558199606896584051237541638188580280321
CPU time: 2.23 s,  Wall time: 2.27 s
1238926361552897 * 93461639715357977769163558199606896584051237541638188580280321
CPU time: 2.23 s,  Wall time: 2.27 s

A little better than two hours on a mainframe, or even on this computer, I hope you'll agree.

If you think this sort of thing is cool, the Cunningham Project is a place to explore.  I particularly like their Most Wanted lists.  The idea is that "The Cunningham Project seeks to factor the numbers $b^n \pm 1$ for $b=2, 3, 5, 6, 7, 10, 11, 12$, up to high powers $n$."