WEBVTT

00:00:00.872 --> 00:00:06.166
In this second half of module 4.9, we are going to look specifically

00:00:06.166 --> 00:00:11.928
at one example of a Divide and Conquer algorithm: the Cooley-Tukey Fast Fourier Transform.

00:00:11.928 --> 00:00:15.967
We will see a matrix view of this algorithm, as well as flow-graphs,

00:00:15.967 --> 00:00:21.047
and then conclude by looking at more general cases of Fast Fourier Transforms.

00:00:22.200 --> 00:00:25.577
So far, we have only had conceptual discussions

00:00:25.577 --> 00:00:28.596
of what the Divide and Conquer algorithm would be.

00:00:28.596 --> 00:00:34.542
We now need to be concrete and actually develop an algorithm for the Discrete Fourier Transform,

00:00:34.542 --> 00:00:36.583
based on the idea of Divide and Conquer.

00:00:36.583 --> 00:00:41.645
We'll do this by analyzing an algorithm called the decimation-in-time algorithm.

00:00:41.645 --> 00:00:45.966
It will take us a number of slides to get through all the details.

00:00:45.966 --> 00:00:47.293
But this is very important,

00:00:47.293 --> 00:00:53.864
because you're going to understand the mechanics of one of the most famous algorithms of all time.

00:00:53.864 --> 00:01:00.471
We start with the definition of the DFT: X[k] is the sum over n from 0 to N-1

00:01:00.472 --> 00:01:04.193
of x[n] multiplied by the root of unity raised to the power nk.

00:01:04.193 --> 00:01:09.203
Here k goes from 0 to N-1, and W is the usual Nth root of unity.

00:01:09.203 --> 00:01:15.209
Now, as is often the case, a good guess is half of the answer to a fast algorithm.

00:01:15.209 --> 00:01:20.533
The good guess is the following: break the input into even- and odd-indexed terms.

00:01:20.533 --> 00:01:22.996
That is why it is called decimation-in-time:

00:01:22.996 --> 00:01:28.782
it is like taking the sequence and throwing out half of the entries,

00:01:28.782 --> 00:01:31.298
which is decimation by a factor of 2.

00:01:31.298 --> 00:01:38.930
And so, x[n] is mapped into x[2n], the even-indexed samples, and x[2n+1], the odd-indexed samples,

00:01:38.930 --> 00:01:42.986
and now n goes from 0 to N/2 - 1.

00:01:42.986 --> 00:01:48.120
Recall that N is assumed to be a power of 2, so N/2 is again an integer.

00:01:48.120 --> 00:01:51.083
Then, we need to do something similar to the output.

00:01:51.083 --> 00:01:55.084
But there it is not even and odd, it is first half and second half.

00:01:55.084 --> 00:02:00.581
So X[k], with k going from 0 to N-1, is split into the first half, X[k],

00:02:00.581 --> 00:02:08.768
and the second half, X[k+N/2], where now k runs from 0 to N/2 - 1.

00:02:10.813 --> 00:02:13.554
Let us now look at the equations for the DFT,

00:02:13.554 --> 00:02:17.438
and use this separation of the input into even- and odd-indexed samples.

00:02:17.454 --> 00:02:24.266
So X[k] is now the sum of two sums of length N/2,

00:02:24.266 --> 00:02:29.997
over the even- and odd-indexed samples, with the respective powers of W.

00:02:29.997 --> 00:02:33.804
We separate these two sums, we simplify the exponents,

00:02:33.804 --> 00:02:36.926
and then we need to use a little trick here.

00:02:36.926 --> 00:02:38.755
The trick is the following:

00:02:38.771 --> 00:02:42.999
if you take the Nth root of unity and square it,

00:02:42.999 --> 00:02:52.944
then this is equal to e to the -j4π/N, which is equal to e to the -j2π divided by N/2.

00:02:52.944 --> 00:02:57.189
And this, of course, is the root of unity of order N/2.
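For reference, the derivation spoken so far can be collected in one place; this is only a restatement of the lecture's own steps, with W_N denoting the Nth root of unity e^{-j2π/N}:

```latex
X[k] = \sum_{n=0}^{N-1} x[n]\,W_N^{nk}
     = \sum_{n=0}^{N/2-1} x[2n]\,W_N^{2nk}
       + W_N^{k}\sum_{n=0}^{N/2-1} x[2n+1]\,W_N^{2nk},
\qquad
W_N^{2} = e^{-j4\pi/N} = e^{-j2\pi/(N/2)} = W_{N/2}.
```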
00:02:57.189 --> 00:03:02.188
Using this, we can write W to the power 2nk

00:03:02.188 --> 00:03:08.088
as the root of unity of order N/2, raised to the power nk.

00:03:08.088 --> 00:03:18.604
We use this, and we see now that we have two sums of length N/2 with respect to the root of unity of order N/2,

00:03:18.604 --> 00:03:24.612
and we write this as the sum of X'[k] and X''[k],

00:03:24.612 --> 00:03:28.844
where X''[k] is multiplied by W to the power k.

00:03:28.844 --> 00:03:31.782
Here W is still the Nth root of unity.

00:03:34.197 --> 00:03:37.598
We can now say this explicitly in the following way:

00:03:37.598 --> 00:03:42.775
we have two half-size DFTs, which we call X' and X''.

00:03:43.282 --> 00:03:46.916
We have to multiply the output of the second DFT by W to the power k.

00:03:46.916 --> 00:03:52.624
That requires N/2 complex multiplications, and then we add the results together.

00:03:52.624 --> 00:03:57.290
With this, we have found the first half of our result, namely X[k]

00:03:57.290 --> 00:04:01.573
for k going from 0 to N/2 - 1.

00:04:01.573 --> 00:04:06.896
Now we need to find out how to compute the second half of the DFT, and then we'll be done.

00:04:08.357 --> 00:04:12.356
We have the first half of the DFT; let us consider the second half,

00:04:12.356 --> 00:04:15.885
so that is X[k+N/2].

00:04:15.885 --> 00:04:21.940
This is made up of two sums over the even and the odd inputs.

00:04:21.940 --> 00:04:28.726
And the sum over the odd inputs is pre-multiplied by W to the power k + N/2.

00:04:28.726 --> 00:04:35.667
We again need a little trick, and this is to simplify W to the power N/2:

00:04:35.667 --> 00:04:46.152
this is equal to e to the -jπN/N, that is, e to the -jπ, which is equal to -1.

00:04:46.628 --> 00:04:49.022
With this, we see that

00:04:49.022 --> 00:04:56.035
the second sum is simply pre-multiplied by W to the power k together with a change of sign,

00:04:56.035 --> 00:05:05.566
so the net result is that the second half of the DFT is equal to X'[k] minus W to the power k times X''[k].

00:05:05.566 --> 00:05:08.564
So it is the same two parts we had before;

00:05:08.564 --> 00:05:12.425
we simply have a subtraction instead of a sum.

00:05:14.593 --> 00:05:18.112
In words: we have now computed both parts.

00:05:18.112 --> 00:05:21.168
So we divide the input into even and odd samples,

00:05:21.168 --> 00:05:24.509
and compute two DFTs of size N/2.

00:05:24.718 --> 00:05:33.080
We multiply the output of the second DFT by the root of unity of order N raised to the kth power.

00:05:33.080 --> 00:05:35.843
That uses N/2 multiplications.

00:05:36.381 --> 00:05:43.084
Then we recover the outputs by taking the sum and the difference of these two parts,

00:05:43.084 --> 00:05:47.507
and that gives us X[k] and X[k+N/2].

00:05:49.413 --> 00:05:55.700
It is easiest to see this in a diagram, and we do the diagram for a DFT of size 8.

00:05:55.700 --> 00:05:58.378
So we have x[0] to x[7].

00:05:58.378 --> 00:06:03.673
They are divided into the even part, the upper half, and the odd part, the lower half.

00:06:03.673 --> 00:06:07.273
And we compute two DFTs of size N/2; in this case,

00:06:07.273 --> 00:06:09.686
these are DFTs of size 4.

00:06:09.686 --> 00:06:16.584
Then we simply recombine the respective outputs by sums and differences,

00:06:16.584 --> 00:06:23.335
and we get X[0], which is the sum of the first outputs of the two DFTs.

00:06:23.335 --> 00:06:30.825
The difference of those same first outputs gives the X[4] output, and so on.

00:06:30.825 --> 00:06:33.950
And with this, we have the first half and the second half.
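To make the recursion concrete, here is a minimal Python sketch of the decimation-in-time computation just described. The function name dit_fft and the plain-list style are my own choices; the sketch assumes the length is a power of 2 and favors readability over speed.

```python
import cmath

def dit_fft(x):
    """Radix-2 decimation-in-time FFT sketch; len(x) must be a power of 2."""
    N = len(x)
    if N == 1:                          # a DFT of size 1 is just the sample itself
        return list(x)
    X_even = dit_fft(x[0::2])           # X'[k]:  half-size DFT of even-indexed samples
    X_odd = dit_fft(x[1::2])            # X''[k]: half-size DFT of odd-indexed samples
    X = [0j] * N
    for k in range(N // 2):
        W_k = cmath.exp(-2j * cmath.pi * k / N)     # twiddle factor W^k
        X[k] = X_even[k] + W_k * X_odd[k]           # first half:  X[k]
        X[k + N // 2] = X_even[k] - W_k * X_odd[k]  # second half: X[k+N/2]
    return X
```

Comparing dit_fft(x) with a direct evaluation of the DFT definition on a short power-of-2 input should agree up to rounding error.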
00:06:33.950 --> 00:06:40.462
In this very simple diagram, we see very explicitly this Divide and Conquer algorithm at work.

00:06:42.614 --> 00:06:44.579
So what is the complexity now?

00:06:44.579 --> 00:06:50.011
Splitting the DFT into two pieces of size N/2 was free:

00:06:50.011 --> 00:06:53.787
you simply have to select the even- and odd-indexed inputs.

00:06:53.787 --> 00:06:56.342
Then we compute two DFTs of size N/2.

00:06:56.342 --> 00:07:01.371
Recall that the complexity is quadratic, so together these cost about N²/2 operations.

00:07:01.371 --> 00:07:06.799
We merge the two results with multiplications: there are N/2 of them.

00:07:06.799 --> 00:07:12.386
And the total complexity, at this stage, is N²/2 + N/2,

00:07:12.386 --> 00:07:17.546
which, indeed, is smaller than the original complexity, which is quadratic in N.

00:07:17.546 --> 00:07:21.030
So, in general, we have about half the complexity of the initial problem.

00:07:21.030 --> 00:07:26.155
Now, this alone would not be a big deal: a factor-of-2 saving is nice to have.

00:07:26.155 --> 00:07:28.556
But we are aiming for much more than this.

00:07:31.170 --> 00:07:33.006
What about repeating this process?

00:07:33.006 --> 00:07:38.271
We know that, since N is a power of 2, we can divide by 2, 4, 8, et cetera.

00:07:38.655 --> 00:07:42.905
And we can go on until we end up with trivial DFTs of size 2.

00:07:42.905 --> 00:07:45.919
These simply amount to a sum and a difference.

00:07:45.919 --> 00:07:51.676
This will require log₂ N - 1 steps, because we stop at DFTs of size 2.

00:07:52.506 --> 00:08:00.175
Each step requires a merge, using on the order of N/2 multiplications and N additions,

00:08:00.175 --> 00:08:08.477
so the total will be (N/2)(log₂ N - 1) multiplications and N log₂ N additions.

00:08:08.754 --> 00:08:13.917
So the saving is of order log₂ N divided by N.

00:08:13.917 --> 00:08:19.038
The key result, the "take home" message here, is that a DFT of size N

00:08:19.038 --> 00:08:22.871
requires on the order of N log₂ N operations.

00:08:22.871 --> 00:08:27.041
Let us look at a few examples to see how big the saving is.

00:08:27.041 --> 00:08:31.921
When N is equal to 1024, which is 2 to the power of 10,

00:08:31.921 --> 00:08:35.393
the saving is of the order of a factor of 100.

00:08:35.839 --> 00:08:41.380
If N is 2 to the power of 20, which is roughly a million, a little bit more,

00:08:41.380 --> 00:08:46.207
then the saving is of the order of 50,000: so that is a big deal, indeed.

00:08:48.083 --> 00:08:54.318
We have seen a recursive formulation using sums over even- and odd-indexed inputs.

00:08:54.318 --> 00:08:57.395
We have seen a block diagram view of the DFT.

00:08:57.395 --> 00:09:00.629
Let us now look at a matrix factorization view.

00:09:00.629 --> 00:09:05.152
This is important because many implementations, for example in Matlab or FreeMat,

00:09:05.152 --> 00:09:08.288
will actually use numerical linear algebra,

00:09:08.304 --> 00:09:11.527
and there the matrix factorization is what matters.

00:09:11.527 --> 00:09:19.388
In this case we look at W4, the DFT of size 4. Recall that we first have to separate even and odd samples;

00:09:19.388 --> 00:09:21.101
that is the first matrix on the right.

00:09:21.101 --> 00:09:24.137
Then we compute two DFTs of size 2.

00:09:24.137 --> 00:09:27.729
These are sums and differences, arranged in a block diagonal matrix.
00:09:27.729 --> 00:09:32.131
Then we have to multiply the odd outputs by W to the power k:

00:09:32.516 --> 00:09:39.304
here, these are simply factors of j and minus j, plus sums and differences, which are included here,

00:09:39.304 --> 00:09:44.420
and we get the overall result with these three small matrices.

00:09:44.432 --> 00:09:47.858
This uses eight additions and no multiplications.

00:09:49.872 --> 00:09:52.224
Let us now move to the next bigger size.

00:09:52.224 --> 00:09:53.913
This is N=8.

00:09:53.913 --> 00:09:58.465
Now this is going to be big, because each matrix is of size 8 by 8

00:09:58.465 --> 00:10:01.173
and we have a factorization to take care of.

00:10:01.173 --> 00:10:06.246
The first matrix, W8 in its initial form, we have seen before.

00:10:08.368 --> 00:10:12.074
The first step is to separate the even- from the odd-indexed samples.

00:10:12.074 --> 00:10:15.781
Call this D8, for decimation of size 8.

00:10:15.781 --> 00:10:19.535
You see the matrix here, and you can recognize its structure:

00:10:19.535 --> 00:10:25.655
the first half picks out the even inputs, the second half picks out the odd inputs.

00:10:25.655 --> 00:10:30.757
No arithmetic operations so far, just a re-indexing of the inputs.

00:10:33.004 --> 00:10:38.913
Step 2 is to compute two DFTs of size N/2 on these even- and odd-indexed samples.

00:10:38.913 --> 00:10:45.609
So each submatrix is a DFT of size 4, and the result is a block diagonal matrix.

00:10:45.609 --> 00:10:51.649
On the diagonal you have the W4 matrices; off the diagonal you have zero matrices.

00:10:51.649 --> 00:10:55.905
This requires two DFTs of size 4, and according to what we have just seen,

00:10:55.905 --> 00:10:58.198
that is a total of 16 additions.

00:11:00.075 --> 00:11:05.798
Step 3 is to multiply the output of the second DFT by W to the power k.

00:11:05.798 --> 00:11:11.545
This is a diagonal matrix with an identity matrix in the first half of the diagonal,

00:11:11.545 --> 00:11:16.366
while the second half is made up of the successive powers of W.

00:11:16.366 --> 00:11:22.980
This requires only 2 multiplications, because W squared is actually -j, and multiplying by -j is free.

00:11:22.980 --> 00:11:25.916
So we can summarize this as a block matrix

00:11:25.916 --> 00:11:32.545
with I4 and Λ4 on the diagonal, and 04, the zero matrices, off the diagonal.

00:11:32.545 --> 00:11:38.550
And Λ4 is the one that contains the complex multiplications by the roots of unity.

00:11:40.796 --> 00:11:44.191
Step 4: we have to recombine the final outputs

00:11:44.191 --> 00:11:49.324
to obtain X[k] and X[k+N/2] by a sum and a difference.

00:11:49.324 --> 00:11:55.227
This matrix is also very structured. The first half computes the sums,

00:11:55.228 --> 00:12:00.458
so we can put this in block matrix form by having two identity matrices of size 4.

00:12:00.458 --> 00:12:02.805
And the second half computes the differences,

00:12:02.805 --> 00:12:08.617
so that is the identity matrix of size 4 and its negative; this requires 8 additions.

00:12:10.926 --> 00:12:15.517
So in total we have the product of four matrices: D8, the selector matrix,

00:12:15.517 --> 00:12:19.555
then the block diagonal one with the DFTs of size 4,

00:12:19.555 --> 00:12:25.614
then the block diagonal one with I4 and Λ4, and finally the sums and differences.

00:12:25.614 --> 00:12:30.398
So the total is 24 additions and 2 multiplications.

00:12:30.398 --> 00:12:34.747
Now this is much smaller than what we would expect from an N squared complexity.
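To make this factorization tangible, here is a small NumPy sketch that follows the four steps just listed and checks that the product of the four factors reproduces the full 8-point DFT matrix. The construction and the names D8, step2, step3 and step4 are my own illustration, not the exact slide notation.

```python
import numpy as np

N = 8
W = np.exp(-2j * np.pi / N)                  # 8th root of unity

def dft_matrix(n):
    """Plain DFT matrix of size n."""
    k = np.arange(n)
    return np.exp(-2j * np.pi * np.outer(k, k) / n)

# Step 1: D8 re-indexes the input, even-indexed samples first, then odd-indexed ones.
D8 = np.zeros((N, N))
for row, col in enumerate(list(range(0, N, 2)) + list(range(1, N, 2))):
    D8[row, col] = 1

# Step 2: block diagonal matrix with two DFTs of size 4 on the diagonal.
W4 = dft_matrix(N // 2)
Z4 = np.zeros((4, 4))
step2 = np.block([[W4, Z4], [Z4, W4]])

# Step 3: diagonal matrix diag(I4, Lambda4); Lambda4 holds the twiddle factors W^k.
Lambda4 = np.diag(W ** np.arange(N // 2))
step3 = np.block([[np.eye(4), Z4], [Z4, Lambda4]])

# Step 4: butterfly matrix, sums in the first half and differences in the second half.
step4 = np.block([[np.eye(4), np.eye(4)], [np.eye(4), -np.eye(4)]])

# The product of the four factors equals the 8-point DFT matrix.
assert np.allclose(step4 @ step3 @ step2 @ D8, dft_matrix(N))
```

Only the entries W and W cubed of Lambda4 are genuine complex multiplications, since W squared is -j; that is where the count of 2 multiplications quoted above comes from.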
00:12:34.747 --> 00:12:39.931
An N squared algorithm would require, in this case, on the order of 64 additions and 64 multiplications.

00:12:39.931 --> 00:12:43.488
So you see that the big saving here is in the multiplications.

00:12:43.488 --> 00:12:47.038
That is the case here because the DFT is very small,

00:12:47.038 --> 00:12:50.597
so it quickly reduces to trivial DFTs.

00:12:52.581 --> 00:12:56.706
We can write out what we just did in an explicit flow-graph.

00:12:56.706 --> 00:13:03.873
Such a flow-graph simply indicates how the inputs, x[0] to x[7], move through the graph,

00:13:03.873 --> 00:13:10.097
get multiplied by roots of unity where needed, and are combined by sums and differences.

00:13:10.097 --> 00:13:15.444
As you go from left to right, you end up with the outputs X[0] to X[7].

00:13:15.444 --> 00:13:21.836
What you can recognize visually are the butterflies, which are simply the sums and differences

00:13:21.836 --> 00:13:27.331
that appear again and again in the recursive computation of the DFT

00:13:27.331 --> 00:13:30.371
using the Fast Fourier Transform algorithm.

00:13:31.832 --> 00:13:33.680
Now, is this a big deal?

00:13:33.680 --> 00:13:39.036
In image processing, for example on your digital camera or on your mobile phone,

00:13:39.036 --> 00:13:43.476
the image is processed in small blocks of 8 by 8 pixels.

00:13:43.476 --> 00:13:47.167
One computes a transform called the Discrete Cosine Transform;

00:13:47.167 --> 00:13:49.070
we shall see this later in this class,

00:13:49.070 --> 00:13:51.542
when we study the JPEG compression algorithm.

00:13:51.542 --> 00:13:55.916
The DCT is a little bit similar to a DFT; it is like a real-valued DFT.

00:13:55.916 --> 00:14:01.060
It has a fast algorithm that is very much inspired by what we just saw.

00:14:03.443 --> 00:14:09.283
Let us compare with direct computation: 8 by 8 gives 64 inputs;

00:14:09.283 --> 00:14:16.371
squared, that is about 4,000 multiplications, and this is the dominant cost because these are fixed-point multiplications.

00:14:16.371 --> 00:14:19.744
The transform can be computed on rows and columns separately,

00:14:19.744 --> 00:14:23.241
so you have essentially 16 transforms of size 8.

00:14:23.241 --> 00:14:26.689
We have seen that each such fast transform takes 2 multiplications,

00:14:26.689 --> 00:14:30.485
so we would have a total of 32 multiplications.

00:14:30.485 --> 00:14:34.009
So we have a saving of about two orders of magnitude.

00:14:34.009 --> 00:14:35.528
And that is a big deal.

00:14:35.528 --> 00:14:41.385
So, instead of compressing your image in 100 seconds, it takes 1 second, for example.

00:14:42.999 --> 00:14:46.936
So far, we have only considered lengths N that are powers of 2.

00:14:46.936 --> 00:14:48.993
But of course, other lengths are also important,

00:14:48.993 --> 00:14:53.708
and the message here is that for every possible length N, there is a fast algorithm.

00:14:53.708 --> 00:14:57.151
The first case is N equal to a power of a prime.

00:14:57.151 --> 00:15:01.824
We looked at the case p equal to 2, for which we derived the Cooley-Tukey algorithm:

00:15:01.824 --> 00:15:04.922
very famous, very simple and very important.

00:15:04.922 --> 00:15:08.310
When N is a product of two coprime numbers,

00:15:08.310 --> 00:15:14.138
there is a way to reorder this into a two-dimensional DFT of size N1 by N2.

00:15:14.138 --> 00:15:18.428
This is known as the Good-Thomas algorithm; it was invented in the 1950s

00:15:18.428 --> 00:15:20.835
and reinvented again in the 1960s.
00:15:20.835 --> 00:15:25.244
When N is a prime, we can apply a mapping to the transform

00:15:25.244 --> 00:15:28.397
that results in a convolution of size N-1.

00:15:28.397 --> 00:15:29.845
This might not sound like good news,

00:15:29.845 --> 00:15:35.924
except that we will see that a convolution of size N-1 can also be computed with a DFT,

00:15:35.924 --> 00:15:40.484
and so if N is, for example, 17, then N-1 is 16,

00:15:40.484 --> 00:15:44.212
and this we can handle with the usual Cooley-Tukey fast algorithm.

00:15:44.212 --> 00:15:46.193
By combining all the tricks involved,

00:15:46.193 --> 00:15:50.373
minimizing the number of multiplications and reordering the terms,

00:15:50.373 --> 00:15:54.368
we obtain the so-called Winograd Fast Fourier Transform,

00:15:54.368 --> 00:15:59.337
which was invented by Winograd in the late 1970s,

00:15:59.337 --> 00:16:04.449
and which really caps the work on fast algorithm development from a theoretical point of view.

00:16:05.786 --> 00:16:10.966
Let us look at the case N=5 that we saw before: the matrix is familiar.

00:16:10.966 --> 00:16:16.684
We take out one piece of it by removing the column of ones and the row of ones,

00:16:16.684 --> 00:16:24.236
and we are left with a matrix that contains all powers of W from 1 to 4,

00:16:24.236 --> 00:16:26.754
where each power appears only once in each row and column;

00:16:26.754 --> 00:16:34.130
that is the important thing here. It can be shown that if you reorder the rows and columns properly,

00:16:34.130 --> 00:16:40.037
you end up with the matrix that we see on the right side, which is actually a circulant matrix.

00:16:40.037 --> 00:16:43.153
You can see that the second row is simply the first row,

00:16:43.153 --> 00:16:46.738
circularly shifted, and so on down to the bottom.

00:16:46.738 --> 00:16:52.400
And such a circulant matrix, as we will see, can be diagonalized by a DFT of size 4,

00:16:52.400 --> 00:16:56.337
so we are back to computing a DFT, but now of size one less.

00:16:58.121 --> 00:17:01.033
Now consider N a product of two coprime numbers.

00:17:01.033 --> 00:17:02.719
Here, we take N=6.

00:17:02.719 --> 00:17:07.635
The DFT of size 6 can be rewritten as a two-dimensional DFT of size 3 by 2.

00:17:07.635 --> 00:17:14.598
And such a DFT can be computed with 3 DFTs of size 2 and 2 DFTs of size 3.

00:17:14.598 --> 00:17:16.789
We write this here in factorized form:

00:17:16.789 --> 00:17:23.818
we have permutation matrices R, Q and P, and we have block diagonal matrices.

00:17:23.818 --> 00:17:28.914
The first one on the left is block diagonal with three DFTs of size 2.

00:17:28.914 --> 00:17:33.735
The block diagonal matrix on the right has two DFTs of size 3.

00:17:33.735 --> 00:17:39.139
Except for the permutations, there is no cost in transforming a DFT of size 6

00:17:39.139 --> 00:17:44.766
into 3 DFTs of size 2 and 2 DFTs of size 3, so this is very efficient.

00:17:46.658 --> 00:17:49.010
The conclusion is: don't worry, be happy.

00:17:49.010 --> 00:17:53.849
Any DFT length can be mapped onto a Fast Fourier Transform algorithm.

00:17:53.849 --> 00:17:58.175
This is an important message, because you do not have to take your signal

00:17:58.175 --> 00:18:02.568
and zero-pad it to reach a length that is a power of 2, for example.

00:18:02.568 --> 00:18:08.316
This is often done, but it is a mistake: if you really want to compute a DFT of size 17,

00:18:08.316 --> 00:18:11.957
then there is an algorithm that will compute it, and compute it fast.
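As a quick illustration of that advice, here is a small NumPy example of my own, using numpy.fft.fft, which accepts any length: the size-17 DFT is computed directly, while zero-padding to 32 yields a length-32 transform, which is not the same thing as the size-17 DFT.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(17)

# Reference: the size-17 DFT computed straight from the definition.
n = np.arange(17)
W17 = np.exp(-2j * np.pi * np.outer(n, n) / 17)
X_direct = W17 @ x

# numpy's FFT handles the prime length 17 directly, and it matches the definition.
X_fft = np.fft.fft(x)
print(np.allclose(X_fft, X_direct))   # True

# Zero-padding to a power of 2 computes a length-32 transform instead:
# it samples the spectrum on a different grid and is not the size-17 DFT.
X_padded = np.fft.fft(x, n=32)
print(X_padded.shape)                 # (32,)
```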
00:18:11.957 --> 00:18:15.656
I also want to say that there are extremely good packages out there,

00:18:15.656 --> 00:18:22.018
for example the Fastest Fourier Transform in the West (FFTW), or the SPIRAL program.

00:18:22.018 --> 00:18:26.072
These packages allow you not to worry,

00:18:26.072 --> 00:18:30.139
and simply to find the best possible Fourier transform algorithm for any length.

00:18:30.139 --> 00:18:33.705
And as we have seen, this makes a big difference:

00:18:33.705 --> 00:18:37.275
we are talking about orders of magnitude of speed-up.

00:18:38.936 --> 00:18:41.932
Let me conclude this module on the Fourier Transform.

00:18:41.932 --> 00:18:44.274
The Fourier transform is a magical tool.

00:18:44.274 --> 00:18:48.511
It is a beautiful mathematical tool, and it is also a very practical tool.

00:18:48.511 --> 00:18:52.047
We have seen the DTFT, which is good for mathematical analysis.

00:18:52.047 --> 00:18:55.234
We have seen the DFT, which is good for computation.

00:18:55.234 --> 00:18:57.302
And we have seen the Fast Fourier Transform,

00:18:57.302 --> 00:19:00.590
which allows us to compute Fourier transforms very rapidly.

00:19:00.590 --> 00:19:06.517
This fascination with Fourier can be seen here in a tribute on a license plate.