4.9b - FFT: history and algorithms
-
0:01 - 0:06In this second half of module 4.9, we are going to look specifically
-
0:06 - 0:12at one example of a Divide and Conquer algorithm, it's the Cooley-Turkey Fast Fourier Transform,
-
0:12 - 0:16we will see a matrix view of this algorithm, as well as flow-graphs,
-
0:16 - 0:21and then conclude by looking at more general cases of Fast Fourier Transforms.
-
0:22 - 0:26So far, we have only had conceptual discussions
-
0:26 - 0:29of what the Divide and Conquer algorithm would be.
-
0:29 - 0:35We now need to be concrete and actually develop an algorithm for the Discrete Fourier Transform,
-
0:35 - 0:37based on the idea of Divide and Conquer.
-
0:37 - 0:42We'll do this by analyzing an algorithm called the decimation-in-time algorithm.
-
0:42 - 0:46It'll take us a number of slides until we get to all the details of it.
-
0:46 - 0:47But this is very important,
-
0:47 - 0:54because you're going to understand the mechanics of one of the most famous algorithms of all times.
-
0:54 - 1:00Start with the definition of the DFT, so X[k] is the sum from 0 to N-1
-
1:00 - 1:04of (?) x[n], multiplied by the root of unity to the power nk.
-
1:04 - 1:09k goes from 0 to N -1, W is the usual Nth root of unity.
-
1:09 - 1:15Now, as very often, a good guess is half of the answer to a fast algorithm.
-
1:15 - 1:21The good guess is the following: break the input into even and odd index terms,
-
1:21 - 1:23that's why it's called decimation-in-time,
-
1:23 - 1:29because it's like taking the sequence and throwing out half of the entries,
-
1:29 - 1:31that's decimation by a factor of 2.
-
1:31 - 1:39And so, x[n] is mapped into x[2n], the even index samples, and x[2n] + 1, the odd index samples,
-
1:39 - 1:43and now n goes from 0 to N/2 - 1.
-
1:43 - 1:48Recall that N is assumed to be a power of 2, so N/2 is again an integer.
-
1:48 - 1:51Then, we need to do something similar to the output.
-
1:51 - 1:55But there it's not even and odd, it's first half and second half.
-
1:55 - 2:01So, X[k], k going from 0 to N minus 1, becomes the first half.
-
2:01 - 2:09X[k] plus the second half X[k+N/2], where now k runs from 0 to N/2 -1.
-
2:11 - 2:14Let us now look at the equations for the DFT,
-
2:14 - 2:17and use this separation of the inputs into even and odd inputs.
-
2:17 - 2:24So X[k] now is the sum of 2 sums of length N/2,
-
2:24 - 2:30over the even and odd ones, with respective powers of W.
-
2:30 - 2:34And we separate these two sums, we simplify the exponent,
-
2:34 - 2:37and then we need to use a little trick here.
-
2:37 - 2:39The trick is the following:
-
2:39 - 2:43If you look at Nth root of unity, you take its power of 2,
-
2:43 - 2:53then this is equal to e to the -j4π/N, which is equal to e to the -j2π divided by N/2.
-
2:53 - 2:57And this of course is the root of unity of order N/2.
-
2:57 - 3:02Using this, we can write W to the power 2nk
-
3:02 - 3:08as W of root N/2, to the power nk.
-
3:08 - 3:19We use this and we see now that we have 2 sums of lengths N/2 with respect to the N/2th root of unity
-
3:19 - 3:25and we write this as the sum of X' k and X'' k,
-
3:25 - 3:29where X'' k is actually multiplied by W to the k.
-
3:29 - 3:32Here W is still the nth (?) root of unity.
-
3:34 - 3:38We can now say this explicitly in the following way:
-
3:38 - 3:43we have 2 half-size DFTs, which we call X', X''.
-
3:43 - 3:47We have to multiply the second DFT by W k.
-
3:47 - 3:53That will require N/2 complex multiplications and we add the results together.
-
3:53 - 3:57With this, we have found the first half of our result, namely X[k],
-
3:57 - 4:02for k going from zero to N/2 -1.
-
4:02 - 4:07Now we need to find out how to compute the second half of the DFT, and then we'll be done.
-
4:08 - 4:12We have our first half of the DFT: let us consider the second half,
-
4:12 - 4:16So that's X[k+N/2].
-
4:16 - 4:22This is made up of two sums over the even and the odd inputs.
-
4:22 - 4:29And the sum of the odd inputs is pre-multiplied by W to the power k + N/2.
-
4:29 - 4:36We need again a little trick and this is to simplify W to the power N/2,
-
4:36 - 4:46this is equal to e to the - j Nπ/N, this is e to the -jπ, which is equal to -1.
-
4:47 - 4:49With this, we see that
-
4:49 - 4:56the second sum is simply premultiplied by W to the k and a change of sign,
-
4:56 - 5:06so we get as a net result that the second half of the DFT is actually equal to X' - Wk X''
-
5:06 - 5:09So it's the same two parts we had before.
-
5:09 - 5:12We simply -- subtraction inside of a sum.
-
5:15 - 5:18In words: we have now computed both parts.
-
5:18 - 5:21So we divide the input into even and odd samples,
-
5:21 - 5:25compute 2 DFTs of size N/2.
-
5:25 - 5:33We multiply this output of the second DFT by the roots of unity of order n (?) raised to the kth power.
-
5:33 - 5:36That uses capital N/2 multiplications.
-
5:36 - 5:43Then we recover the outputs by taking the sum and the difference of these 2 parts,
-
5:43 - 5:48and that will give us X[k] and X[k+N/2].
-
5:49 - 5:56It is easiest to see this in a diagram, and we do the diagram for a DFT of size 8.
-
5:56 - 5:58So we have x[0] to x[7].
-
5:58 - 6:04They are divided into the even parts, the upper half, the odd part, the lower half.
-
6:04 - 6:07And we compute 2 DFTs of size N/2, in this case,
-
6:07 - 6:10that would be DFT's of size 4.
-
6:10 - 6:17Then we simply recombine the respective outputs by sum and differences,
-
6:17 - 6:23and we get X[0], which is a sum of the first output of both DFTs.
-
6:23 - 6:31The difference which will give you the X[4] output, which is the difference of these first outputs, etc.
-
6:31 - 6:34And with this, we have first half, second half.
-
6:34 - 6:40In this very simple diagram, we see very explicitly this Divide and Conquer algorithm at work.
-
6:43 - 6:45So what is the complexity now?
-
6:45 - 6:50So to split the DFT into two pieces of size N/2, that was free.
-
6:50 - 6:54You simply had to select even and odd index inputs.
-
6:54 - 6:56Then we compute two DFT's of size N/2.
-
6:56 - 7:01Recall that the complexity's quadratic, so we have this N square over 2 complexity here.
-
7:01 - 7:07We merge the two results with multiplications: there are N/2 of them.
-
7:07 - 7:12And the total complexity, at this stage, is N/2 + N/2 (?),
-
7:12 - 7:18which, indeed, is smaller than the original complexity, which is quadratic in N.
-
7:18 - 7:21So, in general, we have about half the complexity of the initial problem.
-
7:21 - 7:26Now, this would not be a big deal: that's a factor of 2 of savings, nice to have.
-
7:26 - 7:29But we are aiming for much more than this.
-
7:31 - 7:33What about repeating this process?
-
7:33 - 7:38We know that, since N is a power of 2, we can divide in 2, 4, 8, etcetera.
-
7:39 - 7:43And we can go until we end up with trivial DFTs of size 2.
-
7:43 - 7:46These simply amount to a sum and a difference.
-
7:46 - 7:52This will require log N minus 1 (?) steps, because we end up with a DFT of size 2.
-
7:53 - 8:00Each step requires a merger and uses an order N/2 multiplications and order N additions,
-
8:00 - 8:08so the total will be N/2 (log-in-base-2 N-1) multiplications and N log-in-base-2 N additions.
-
8:09 - 8:14So the saving is of order log-in-base-2 N divided by N.
-
8:14 - 8:19The key result, the "take home" message here, is that a DFT of size N
-
8:19 - 8:23requires order N log-in-base-2 N operations.
-
8:23 - 8:27Let us look at a few examples to see how big the savings is.
-
8:27 - 8:32So when N is equal to 1024, that's 2 to the power of 10,
-
8:32 - 8:35so the saving is of the order of factor 100.
-
8:36 - 8:41If N is 2 to the power of 20, which is roughly a million, a little bit more,
-
8:41 - 8:46then the saving is of the order of 50,000: so that's a big deal, indeed.
-
8:48 - 8:54We have seen a recursive formulation using sums on even and odd indexed inputs.
-
8:54 - 8:57We have seen a blog diagram (?) view of the DFT.
-
8:57 - 9:01Let's look now at a matrix factorization view.
-
9:01 - 9:05This is important because many implementations, for example in Matlab or FreeMat
-
9:05 - 9:08will actually use numerical linear algebra,
-
9:08 - 9:12and then the matrix factorization is actually important.
-
9:12 - 9:19So in this case we look at W4, the size 4 DFT, so recall you have to separate even and odd samples,
-
9:19 - 9:21that's the 1st matrix on the right.
-
9:21 - 9:24Then we compute two DFT's of size 2.
-
9:24 - 9:28These are sum and differences, it's a block diagonal matrix.
-
9:28 - 9:32Then we have to multiply the odd outputs by W k:
-
9:33 - 9:39here, these are simply j's and minus j, and sum and difference, which is included in here,
-
9:39 - 9:44and we get the overall result, with these three small matrices.
-
9:44 - 9:48This uses eight additions, no multiplications.
-
9:50 - 9:52Let's now move to the next bigger size.
-
9:52 - 9:54This is N=8.
-
9:54 - 9:58Now this is going to be big because each matrix is of size 8 by 8,
-
9:58 - 10:01and we have a factorization to take care of,
-
10:01 - 10:06so the first matrix W8 in its initial form we have seen before.
-
10:08 - 10:12The first step is to separate even from odd indexed samples.
-
10:12 - 10:16Call this D 8 for decimation of size 8.
-
10:16 - 10:20You see the matrix here, you can sort of recognize it's structure,
-
10:20 - 10:26so the 1st half takes out the even inputs, the 2nd half takes out the odd inputs.
-
10:26 - 10:31No arithmetic operations so far, just re-indexing of inputs.
-
10:33 - 10:39The step 2 is to compute two DFTs of size N/2, on these even and odd index samples.
-
10:39 - 10:46So, each submatrix is a DFT of size 4 and the result is a block diagonal matrix.
-
10:46 - 10:52On the diagonal, you have the W4 matrices, off the diagonal, you have the 0 matrix.
-
10:52 - 10:56This requires 2 DFTs of size 4, according to what we have just seen,
-
10:56 - 10:58that's a total of 16 additions.
-
11:00 - 11:06Step 3 is to multiply the output of the 2nd DFT by W to the k.
-
11:06 - 11:12This will be a diagonal matrix with an identity matrix in the 1st half of the diagonal
-
11:12 - 11:16and the 2nd half is made up of the successive powers of W.
-
11:16 - 11:23This will require 2 multiplications because W square is actually -j, and that's free.
-
11:23 - 11:26So we can summarize this as a block matrix
-
11:26 - 11:33with I4, Λ4 on the diagonal, 04 which are the zero matrices, off the diagonal.
-
11:33 - 11:39And the Λ4 is the one that contains the complex multiplications by the roots of unity
-
11:41 - 11:44Step 4: we have to recombine the final output
-
11:44 - 11:49to obtain X[k] and X[k+N/2] by a sum and difference.
-
11:49 - 11:55So this matrix is also very structured. The first half computes the sum.
-
11:55 - 12:00So we can put this in block matrix form by having two identity matrices of size 4.
-
12:00 - 12:03And the second half computes the differences,
-
12:03 - 12:09so that's the identity matrix of size 4 minus itself: that requires 8 additions.
-
12:11 - 12:16So in total, we have the product of 4 matrices, D8, the selector matrix,
-
12:16 - 12:20then the one that is diagonal with the DFTs of size 4,
-
12:20 - 12:26then the one that is diagonal with the I4, Λ4, and finally the sum and differences.
-
12:26 - 12:30So the total will be 24 additions and 2 multiplications.
-
12:30 - 12:35Now this is much smaller than what we would expect from an N square complexity.
-
12:35 - 12:40N square would be in this case 64 order, 64 additions and multiplications.
-
12:40 - 12:43So you see that the big saving here is multiplications.
-
12:43 - 12:47That is the case here because the DFT is very small
-
12:47 - 12:51and so it quickly comes down to trivial DFT's.
-
12:53 - 12:57We can write out what we just did in an explicit flow-graph.
-
12:57 - 13:04Such a flow-graph simply indicates how inputs -- x[0] to x[7| -- go through the flow-graph,
-
13:04 - 13:10get multiplied by the roots of unity when needed, then take sum and differences etc.
-
13:10 - 13:15And as you go from left to right, you end up with the output X[0] to X[7].
-
13:15 - 13:22What you can recognize visually are the butterflies, which are simply the sum and differences
-
13:22 - 13:27that appear again and again in the recursive computation of the DFT
-
13:27 - 13:30using the Fast Fourier Transform algorithm.
-
13:32 - 13:34Now, is this a big deal?
-
13:34 - 13:39In image processing, so, for example, on your digital camera or on your mobile phone,
-
13:39 - 13:43the image is taken in small blocks of 8 by 8.
-
13:43 - 13:47One computes a transform that is called a Discrete Cosine Transform,
-
13:47 - 13:49we shall see this later in this class
-
13:49 - 13:52when we study the JPEG compression algorithm,
-
13:52 - 13:56and the DCT is a little bit similar to a DFT, it's like a real DFT.
-
13:56 - 14:01It has a fast algorithm that is very much inspired by what we just saw.
-
14:03 - 14:09Let's compare direct multiplication: so 8 by 8 is 64 inputs;
-
14:09 - 14:16squared, that's 4000 multiplications; it's a dominant cost because it's, it's fixed point multiplications.
-
14:16 - 14:20The transform can be computed in rows and columns separately,
-
14:20 - 14:23so you have 16 DFT's, essentially.
-
14:23 - 14:27We have seen that such an algorithm would take 2 multiplications,
-
14:27 - 14:30so we would have a total of 32 multiplications.
-
14:30 - 14:34So we have a saving of the order of 2 orders of magnitude.
-
14:34 - 14:36And that's a big deal.
-
14:36 - 14:41So, instead of compressing your image in 100 seconds, it will take 1 second, for example.
-
14:43 - 14:47So far, we have only considered lengths N which were powers of 2.
-
14:47 - 14:49But of course, other lengths are also important,
-
14:49 - 14:54and the message here is that for every possible length N, there is a fast algorithm.
-
14:54 - 14:57So the first case is N is equal to a power of a prime
-
14:57 - 15:02We looked (?) at the case p is equal to 2, and we will derive the Cooley-Tukey algorithm.
-
15:02 - 15:05Very famous, very simple and very important.
-
15:05 - 15:08When N is a product of 2 coprime numbers,
-
15:08 - 15:14then there is a way to reorder this into a two-dimensional DFT of size N1 by N2,
-
15:14 - 15:18this is known as the Good-Thomas algorithm, it was invented in the 50s
-
15:18 - 15:21and reinvented in the 60s again.
-
15:21 - 15:25When N is a prime, we can do a mapping of the Transform
-
15:25 - 15:28that results in a convolution of size N-1.
-
15:28 - 15:30This might not be good news,
-
15:30 - 15:36except that we'll see that the convolution of size N-1 can also be solved with a DFT,
-
15:36 - 15:40and so if N is, for example, 17, then N-1 is 16,
-
15:40 - 15:44and this we can solve with a usual Cooley-Tukey fast algorithm.
-
15:44 - 15:46By combining all the tricks involved,
-
15:46 - 15:50minimizing the number of multiplications, reordering the terms,
-
15:50 - 15:54we have the so-called Winograd Fast Fourier Transform,
-
15:54 - 15:59which was invented by Winograd in the late 1970's,
-
15:59 - 16:04and which really caps the work on fast algorithm development from a theoretical point of view.
-
16:06 - 16:11Let's us look at this case N=5 that we had seen before: the matrix is familiar.
-
16:11 - 16:17We take out one piece of it and we remove the columns of 1 and the row of 1,
-
16:17 - 16:24and we are left with a matrix that has, of course, all powers of W from 1 to 5,
-
16:24 - 16:27but each power appears only once
-
16:27 - 16:34-- that's the important thing here -- and it can be shown that if you reorder rows and columns properly,
-
16:34 - 16:40you end up with this matrix that we see on the right side which is actually a circulant matrix.
-
16:40 - 16:43So you can see that the 2nd row is simply the 1st row,
-
16:43 - 16:47circularly shifted, and so on, to the bottom.
-
16:47 - 16:52And such a circulant matrix, we will see, can be diagonalized by a DFT of size 4,
-
16:52 - 16:56so we are back to computing a DFT but now of a size 1 less.
-
16:58 - 17:01When N is a product of 2 coprime numbers.
-
17:01 - 17:03Here, we take N=6.
-
17:03 - 17:08It can be written as a 2-dimensional DFT of size 3 by 2.
-
17:08 - 17:15And such a DFT can be computed by having 3 DFTs of size 2, and 2 DFTs of size 3.
-
17:15 - 17:17We write this here in a factorized form,
-
17:17 - 17:24so we have permutation matrices R, Q and P and we have block diagonal matrices.
-
17:24 - 17:29The first one on the left is block diagonal with DF-- three DFTs of size 2.
-
17:29 - 17:34The block diagonal matrix on the right has two DFTs of size three.
-
17:34 - 17:39Except for permutations, there was no cost in transforming a DFT of size 6
-
17:39 - 17:45into 3 DFTs of size 2, and 2 DFTs of size 3, so this is very efficient.
-
17:47 - 17:49The conclusion is don't worry be happy:
-
17:49 - 17:54any length of DFT can be mapped into a Fast Fourier Transform algorithm.
-
17:54 - 17:58Therefore this is an important message because you don't have to take your signal,
-
17:58 - 18:03zero-pad so as to get a length that is a power of 2, for example.
-
18:03 - 18:08This is often done but it's a mistake, if you really want to compute a DFT of size 17,
-
18:08 - 18:12then there is an algorithm that will compute it and it will compute it fast.
-
18:12 - 18:16I also want to say that there are extremely good packages out there,
-
18:16 - 18:22something called, for example, the Fastest Fourier Transform in the West, or the SPIRAL program.
-
18:22 - 18:26And these packages allow you to not worry,
-
18:26 - 18:30and simply find the best possible Fourier Transform for any length.
-
18:30 - 18:34And as we had seen before, this makes a big difference.
-
18:34 - 18:37We told-- we are talking about orders of magnitude in speed up.
-
18:39 - 18:42Let me conclude this module on the Fourier Transform.
-
18:42 - 18:44The Fourier transform is a magical tool.
-
18:44 - 18:49It's a beautiful mathematical tool, it is also a very practical tool.
-
18:49 - 18:52We have seen the DTFD, which is good for math.
-
18:52 - 18:55We have seen the DFT, which is good for computation.
-
18:55 - 18:57And we have seen the Fast Fourier Transform,
-
18:57 - 19:01which allows to compute Fourier transforms very rapidly.
-
19:01 - 19:07This fascination with Fourier can be seen here in a tribute on a license plate.
- Title:
- 4.9b - FFT: history and algorithms
- Description:
-
See "Chapter 4 Fourier Analysis" in "Signal Processing for Communications" by Paolo Prandoni and Martin Vetterli, available from http://www.sp4comm.org/ , and the slides for the entire module 4 of the Digital Signal Processing Coursera course, in https://spark-public.s3.amazonaws.com/dsp/slides/module4-0.pdf .
And if you are registered for this Coursera course, see https://class.coursera.org/dsp-001/wiki/view?page=week3
- Video Language:
- English
Claude Almansi edited English subtitles for 4.9b - FFT: history and algorithms | ||
Claude Almansi edited English subtitles for 4.9b - FFT: history and algorithms | ||
Claude Almansi edited English subtitles for 4.9b - FFT: history and algorithms | ||
Claude Almansi edited English subtitles for 4.9b - FFT: history and algorithms | ||
Claude Almansi edited English subtitles for 4.9b - FFT: history and algorithms | ||
Claude Almansi edited English subtitles for 4.9b - FFT: history and algorithms | ||
Claude Almansi edited English subtitles for 4.9b - FFT: history and algorithms | ||
Claude Almansi edited English subtitles for 4.9b - FFT: history and algorithms |