Today we will have our last crack at issues directly related to cryptography. (That doesn't mean that other stuff we do is unrelated  oh no! But I will not make direct connections with them.) We will focus on the main attack on the RSA algorithm, namely factorization.
Let's look at one more toy RSA problem to get a sense of what is going on. First, I choose a modulus $n$. I will also verify it has two prime factors without telling you what they are.
There are 2 prime factors and their powers are 1 and 1. There are 2 prime factors and their powers are 1 and 1. 
Then I choose an exponent to raise my secret message to...
We choose n=899 and exponent e=13, and verify that gcd(e,\phi(n))=1: True We choose n=899 and exponent e=13, and verify that gcd(e,\phi(n))=1: True 
I haven't told you $\phi(n)$, but this guarantees it is coprime to my (public) encryption key $e=13$. Now we can encode our message, $x=11$:
21 21 
Now, how could we hope to crack this sinister message? (Assume that euler_phi(899) is just too huge for Sage to do.) Well, we do know $n=899$ and that $e=13$. That could help.
In fact, if we knew $p$ and $q$, we could easily calculate $\phi(n)$ without even using Sage. Can you quickly now factor $n$ without using Sage? Be smart about it  think strategically; how should I choose a public modulus $n$ to make this hard to do?




Hopefully we figured it out. Then we just need to find an inverse modulo $\phi(n)=(p1)(q1)$ to get our decryption key.
517 517 
And we decrypt:
11 11 
This gives us the original message $x=11$ again.
I hope this makes it clear why factorization, not just looking for primes, might be important. To be truthful, many researchers in factorization simply do it to stay one step ahead of the other side, who is presumably also researching factorization  so to some extent it is an arms race.
But of course it is also inherently interesting mathematically! Here is an interesting factoid, for instance.
Fact: $$\text{If I know }\phi(n)\text{ and }n\text{, and know that }n\text{ is a product of exactly two distinct primes, I can easily compute them}\; .$$
Of course, if we know $\phi(n)$, we already can crack the code, but who cares  maybe we are given $\phi(n)$ and $n$ and want the factorization.
Example:
Continuing the example above, $$x^2(899840+1)x+899=x^260x+899=0$$ gives $$x=\frac{60\pm \sqrt{60^24(1)(899)}}{2(1)}=30\pm\frac{\sqrt{36003596}}{2}=30\pm 1=29,31$$
Okay, that was fun  but back to business. The first, and oldest, method of factoring is one you already know, and maybe used a few minutes ago  trial factorization. It is the method we used with the Sieve of Eratosthenes; you just try each prime number, one by one.
Does anyone remember what the highest number you would have to try is in order to factor $n$? (Can you prove it?)
The following algorithm does this very naively (and slowly, even for trial division). Let's try to talk through what each step does.

Now let me verify it works on easy examples. Remember, we are just looking for factors at this point, not complete factorizations.
1 is not prime 2 is prime 3 is prime 4 factors as 2 times 2 5 is prime 6 factors as 2 times 3 7 is prime 8 factors as 2 times 4 9 factors as 3 times 3 10 factors as 2 times 5 11 is prime 12 factors as 2 times 6 13 is prime 14 factors as 2 times 7 15 factors as 3 times 5 16 factors as 2 times 8 17 is prime 1 is not prime 2 is prime 3 is prime 4 factors as 2 times 2 5 is prime 6 factors as 2 times 3 7 is prime 8 factors as 2 times 4 9 factors as 3 times 3 10 factors as 2 times 5 11 is prime 12 factors as 2 times 6 13 is prime 14 factors as 2 times 7 15 factors as 3 times 5 16 factors as 2 times 8 17 is prime 
Okay, so this seems reasonable. But it's a little more problematic when you try to do large numbers, where large means "bigger than you can do by hand, but nowhere close to the size we looked at in the last class." I'll actually time how long it takes.
Click to the left again to hide and once more to show the dynamic interactive window 
Sage actually has this implemented in a much faster way, primarily by using optimized integers and a special version of Python that allows turning it into much faster code in the C language (Cython). Notice that it just returns a single factor  another slight speedup.
Click to the left again to hide and once more to show the dynamic interactive window 
That's roughly one thousand times faster for the initial example! Naturally, it's possible to speed up even more. But note that getting the full factorization slows us back down; after all, one has to check that the remaining factor is prime (or factor it, if it isn't).
Click to the left again to hide and once more to show the dynamic interactive window 
Even for this smaller number it takes some actual time  here is where one sees the difference between different implementations of the same algorithm.
125 loops, best of 3: 3.41 ms per loop 125 loops, best of 3: 3.41 ms per loop 
625 loops, best of 3: 2.79 µs per loop 625 loops, best of 3: 2.79 µs per loop 
625 loops, best of 3: 20.7 µs per loop 625 loops, best of 3: 20.7 µs per loop 
So much for trial division! But we have other tools at our disposal.
Some of you might have tried something a little different with $n=899$ from our earlier RSA problem. Since we know that someone is trying to protect a secret, they probably are not going to pick a number with primes like $3$ and $5$ in it  that would be too easy to factor.
In fact, it stands to reason that the primes $p$ and $q$ should be relatively large compared to $n$  so why not start in the middle?
This was Fermat's idea for factoring larger numbers. However, he didn't just start with primes in the middle; for one thing, if your number is even somewhat big and you don't have a computer or huge list of primes, how would you know where to start? So Fermat became clever, as always, and used an algebraic identity to help himself along.
This may seem like an obscure identity to us, but at the time (and even until the last century) such identities were the bread and butter of algebra, before we had tools like computers to help us along.
So what Fermat did is to try this identity backwards. Here is his strategy.
Fermat factorization:
Here is an implementation  again, assuredly slow, but at least verbose in its explanation  of this strategy. We simply start with the next $s$ above the square root of $n$, and just keep trying $s^2n$ again and again for bigger and bigger $s$.

Before we move on, let's try to factor 143 and 93 using this algorithm. Remember, we start with $s^2n$, where $s$ is the next integer above $\sqrt{n}$, and see if it is a perfect square; then we increase $s$ by one each time.
After we do them by hand, we can see what Sage does with them to check.




Fermat found that 12 squared minus 1 squared equals 143 So 13 times 11 equals 143 which is 143 Fermat found that 12 squared minus 1 squared equals 143 So 13 times 11 equals 143 which is 143 
Well, we struck gold on the first try here! That happens if your number is the product of two primes which are two apart. (Such primes are known as twin primes, and have some interesting stories. Among other things, calculating with them helped find a bug in the Pentium computer chip in 1995.)
10 squared minus 93 is 7 which is not a perfect square 11 squared minus 93 is 28 which is not a perfect square 12 squared minus 93 is 51 which is not a perfect square 13 squared minus 93 is 76 which is not a perfect square 14 squared minus 93 is 103 which is not a perfect square 15 squared minus 93 is 132 which is not a perfect square 16 squared minus 93 is 163 which is not a perfect square Fermat found that 17 squared minus 14 squared equals 93 So 31 times 3 equals 93 which is 93 10 squared minus 93 is 7 which is not a perfect square 11 squared minus 93 is 28 which is not a perfect square 12 squared minus 93 is 51 which is not a perfect square 13 squared minus 93 is 76 which is not a perfect square 14 squared minus 93 is 103 which is not a perfect square 15 squared minus 93 is 132 which is not a perfect square 16 squared minus 93 is 163 which is not a perfect square Fermat found that 17 squared minus 14 squared equals 93 So 31 times 3 equals 93 which is 93 
As you can see, we probably would have been better off with trial division for $n=93$  it's obvious that it's divisible by 3, but that takes a long time to reach from the middle.
Now, these methods are the beginnings of how people really factor big numbers. Typically, one does trial division up to a certain size (maybe the first few hundred or thousand primes), then perhaps some modification of Fermat to make sure that there aren't any factors close to the square root if you are attacking something like RSA where that would otherwise be advantageous.
Then what?
There are many answers, some of which involve cool things called continued fractions or number fields. We won't really touch on those topics, but we did talk about something very useful last time  namely, probabilistic/random methods! That's right, we are going to try to find a factor randomly!
Here is the essence of this approach.
There is a modification I will discuss below that this algorithm uses.

The essence of the method is that by plugging in the values of the polynomial modulo $n$, we are generating a 'pseudorandom' sequence of numbers. And a 'pseudorandom' sequence might be better than the sequences we used for trial division or Fermat factorization, precisely because it will hit some small factors and some other large random factors. It might also be good that it could give us numbers which, although not a factor of $n$, might at least share a factor with $n$.
Example:
Here are some details. Let $x_0=2$ and $f(x)=x^2+1$ (these could be different, but they are a typical first choice). We create the sequence
For instance, doing it with $n=8051$, we get
Click to the left again to hide and once more to show the dynamic interactive window 
Now, these might be kind of random, but we will not actually try to find common divisors of these numbers with $n$.
Instead, we will try to see if all the differences $x_ix_j$ share a common factor with $n$, using the (highly efficient to compute) gcd. That is a lot more opportunities! And hopefully just as  or more  'random', and just as effective at finding factors. However, that is too many to check quickly.
So instead one modifies this one last time. Remember, polynomials are welldefined in modular arithmetic, and the sequence will eventually repeat modulo any particular modulus $d$, since there are finitely many possible $x_i$.
In particular, if $d$ is a divisor of $n$, then if $gcd(x_ix_j,n)=d$ is a shared divisor found by the pair $x_i$ and $x_j$, then not only will it be the case that $$x_i\equiv x_j\text{ mod }(d)$$ but it will also be the case for any number of iterations of $f$ that $$f^n(x_i)\equiv f^n(x_j)\text{ mod }(d)$$ which means the sequence (modulo $d$, the common divisor) repeats itself $ji$ terms.
So if we choose $k=ji$ (or the first multiple of this which is greater than $i$), then $$x_k\equiv x_{2k}\text{ mod }(d)$$ will have to be true as well, so all the $x_i,x_j$ pairs which come from the first one to share a divisor can be checked by checking just this one $gcd(x_{2k}x_k,n)$.
So for each $k$ we check whether $$1<gcd(x_{2k}x_k,n)=d<n\, .$$ It turns out that (probabilistically, just like with MillerRabin) it should succeed for $k$ in the neighborhood of the size of the square root of the smallest factor of $n$. So if $n$ has a big, but not too big, divisor, this test should help us find that divisor.
Example continued:
In the example above, the numbers we would get for the first three rounds are
These factor as
3 * 7 2^3 * 7^2 * 19 2 * 97 3 * 7 2^3 * 7^2 * 19 2 * 97 
and have the following gcds with 8051:
1 1 97 1 1 97 
So this would catch the four factors $3,7,19$, and $97$, which is not bad, and indeed the final one caught a common divisor with $8051$.
Pollard Rho took 4 rounds The number it tried in the last round was 194, which shares factor 97 And 97 is a factor of 8051 since 97\cdot 83=8051 Pollard Rho took 4 rounds The number it tried in the last round was 194, which shares factor 97 And 97 is a factor of 8051 since 97\cdot 83=8051 
This method is usually called the Pollard rho method, because it is due to John Pollard and the fact that the $x_i$ eventually repeat mod ($d$) (in this case, $d=97$) means you can draw a tail and then a loop, which to a very imaginative eye might look like a Greek $\rho$.
In general, there are other such probabilistic algorithms, and they are quite successful with factoring numbers which might have reasonably sized but not gigantic factors. According to Wikipedia, "The rho algorithm's most remarkable success has been the factorization of the eighth Fermat number by Pollard and Brent. They used Brent's variant of the algorithm, which found a previously unknown prime factor. The complete factorization of F_{8} took, in total, 2 hours on a UNIVAC 1100/42." Well, this was in the 70s. But still, it's not bad!
Click to the left again to hide and once more to show the dynamic interactive window 
Notice that the rho method didn't come up with an answer yet (it took 50 rounds without success). I could up this to a lot more. So what do you do then  bring out the big guns? Not at all  just as with primality testing, you just change your starting point to try again!
Pollard Rho took 15 rounds The number it tried in the last round was 74775, which shares factor 997 And 997 is a factor of 988027 since 997\cdot 991=988027 Pollard Rho took 15 rounds The number it tried in the last round was 74775, which shares factor 997 And 997 is a factor of 988027 since 997\cdot 991=988027 
Still, you can see how much more there is to say with the number Pollard and Brent factored, the Fermat number $F_8$.
Pollard Rho breaking off after 1000001 rounds
Pollard Rho breaking off after 1000001 rounds

Hmm, what now? Let's change the seed.
Pollard Rho breaking off after 1000001 rounds
Pollard Rho breaking off after 1000001 rounds

Luckily, we have bigger guns at our disposal in Sage (especially in the component program Pari), that polish thing off rather more quickly.
1238926361552897 * 93461639715357977769163558199606896584051237541638188580280321 CPU time: 2.23 s, Wall time: 2.27 s 1238926361552897 * 93461639715357977769163558199606896584051237541638188580280321 CPU time: 2.23 s, Wall time: 2.27 s 
A little better than two hours on a mainframe, or even on this computer, I hope you'll agree.
If you think this sort of thing is cool, the Cunningham Project is a place to explore. I particularly like their Most Wanted lists. The idea is that "The Cunningham Project seeks to factor the numbers $b^n \pm 1$ for $b=2, 3, 5, 6, 7, 10, 11, 12$, up to high powers $n$."
