WEBVTT

00:00:00.872 --> 00:00:06.166
In this second half of module 4.9, we are going to look specifically

00:00:06.166 --> 00:00:11.928
at one example of a Divide and Conquer algorithm: the Cooley-Tukey Fast Fourier Transform.

00:00:11.928 --> 00:00:15.967
We will see a matrix view of this algorithm, as well as flow-graphs,

00:00:15.967 --> 00:00:21.047
and then conclude by looking at more general cases of Fast Fourier Transforms.

00:00:22.200 --> 00:00:25.577
So far, we have only had conceptual discussions

00:00:25.577 --> 00:00:28.596
of what the Divide and Conquer algorithm would be.

00:00:28.596 --> 00:00:34.542
We now need to be concrete and actually develop an algorithm for the Discrete Fourier Transform,

00:00:34.542 --> 00:00:36.583
based on the idea of Divide and Conquer.

00:00:36.583 --> 00:00:41.645
We'll do this by analyzing an algorithm called the decimation-in-time algorithm.

00:00:41.645 --> 00:00:45.966
It will take us a number of slides to get through all the details.

00:00:45.966 --> 00:00:47.293
But this is very important,

00:00:47.293 --> 00:00:53.864
because you're going to understand the mechanics of one of the most famous algorithms of all time.

00:00:53.864 --> 00:01:00.471
We start with the definition of the DFT: X[k] is the sum over n from 0 to N-1

00:01:00.472 --> 00:01:04.193
of x[n] multiplied by the root of unity raised to the power nk.

00:01:04.193 --> 00:01:09.203
Here k goes from 0 to N-1, and W is the usual Nth root of unity.

00:01:09.203 --> 00:01:15.209
Now, as is often the case, a good guess is half of the answer to a fast algorithm.

00:01:15.209 --> 00:01:20.533
The good guess is the following: break the input into even- and odd-indexed terms.

00:01:20.533 --> 00:01:22.996
That is why it is called decimation-in-time:

00:01:22.996 --> 00:01:28.782
it is like taking the sequence and throwing out half of the entries,

00:01:28.782 --> 00:01:31.298
which is decimation by a factor of 2.

00:01:31.298 --> 00:01:38.930
And so, x[n] is mapped into x[2n], the even-indexed samples, and x[2n+1], the odd-indexed samples,

00:01:38.930 --> 00:01:42.986
and now n goes from 0 to N/2 - 1.

00:01:42.986 --> 00:01:48.120
Recall that N is assumed to be a power of 2, so N/2 is again an integer.

00:01:48.120 --> 00:01:51.083
Then, we need to do something similar to the output.

00:01:51.083 --> 00:01:55.084
But there it is not even and odd, it is first half and second half.

00:01:55.084 --> 00:02:00.581
So X[k], with k going from 0 to N-1, is split into the first half, X[k],

00:02:00.581 --> 00:02:08.768
and the second half, X[k+N/2], where now k runs from 0 to N/2 - 1.

00:02:10.813 --> 00:02:13.554
Let us now look at the equations for the DFT,

00:02:13.554 --> 00:02:17.438
and use this separation of the input into even- and odd-indexed samples.

00:02:17.454 --> 00:02:24.266
So X[k] is now the sum of two sums of length N/2,

00:02:24.266 --> 00:02:29.997
over the even- and odd-indexed samples, with the respective powers of W.

00:02:29.997 --> 00:02:33.804
We separate these two sums, we simplify the exponents,

00:02:33.804 --> 00:02:36.926
and then we need to use a little trick here.

00:02:36.926 --> 00:02:38.755
The trick is the following:

00:02:38.771 --> 00:02:42.999
if you take the Nth root of unity and square it,

00:02:42.999 --> 00:02:52.944
then this is equal to e to the -j4π/N, which is equal to e to the -j2π divided by N/2.

00:02:52.944 --> 00:02:57.189
And this, of course, is the root of unity of order N/2.
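For reference, the derivation spoken so far can be collected in one place; this is only a restatement of the lecture's own steps, with W_N denoting the Nth root of unity e^{-j2π/N}:

```latex
X[k] = \sum_{n=0}^{N-1} x[n]\,W_N^{nk}
     = \sum_{n=0}^{N/2-1} x[2n]\,W_N^{2nk}
       + W_N^{k}\sum_{n=0}^{N/2-1} x[2n+1]\,W_N^{2nk},
\qquad
W_N^{2} = e^{-j4\pi/N} = e^{-j2\pi/(N/2)} = W_{N/2}.
```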
00:02:57.189 --> 00:03:02.188
Using this, we can write W to the power 2nk

00:03:02.188 --> 00:03:08.088
as the root of unity of order N/2, raised to the power nk.

00:03:08.088 --> 00:03:18.604
We use this, and we see now that we have two sums of length N/2 with respect to the root of unity of order N/2,

00:03:18.604 --> 00:03:24.612
and we write this as the sum of X'[k] and X''[k],

00:03:24.612 --> 00:03:28.844
where X''[k] is multiplied by W to the power k.

00:03:28.844 --> 00:03:31.782
Here W is still the Nth root of unity.

00:03:34.197 --> 00:03:37.598
We can now say this explicitly in the following way:

00:03:37.598 --> 00:03:42.775
we have two half-size DFTs, which we call X' and X''.

00:03:43.282 --> 00:03:46.916
We have to multiply the output of the second DFT by W to the power k.

00:03:46.916 --> 00:03:52.624
That requires N/2 complex multiplications, and then we add the results together.

00:03:52.624 --> 00:03:57.290
With this, we have found the first half of our result, namely X[k]

00:03:57.290 --> 00:04:01.573
for k going from 0 to N/2 - 1.

00:04:01.573 --> 00:04:06.896
Now we need to find out how to compute the second half of the DFT, and then we'll be done.

00:04:08.357 --> 00:04:12.356
We have the first half of the DFT; let us consider the second half,

00:04:12.356 --> 00:04:15.885
so that is X[k+N/2].

00:04:15.885 --> 00:04:21.940
This is made up of two sums over the even and the odd inputs.

00:04:21.940 --> 00:04:28.726
And the sum over the odd inputs is pre-multiplied by W to the power k + N/2.

00:04:28.726 --> 00:04:35.667
We again need a little trick, and this is to simplify W to the power N/2:

00:04:35.667 --> 00:04:46.152
this is equal to e to the -jπN/N, that is, e to the -jπ, which is equal to -1.

00:04:46.628 --> 00:04:49.022
With this, we see that

00:04:49.022 --> 00:04:56.035
the second sum is simply pre-multiplied by W to the power k together with a change of sign,

00:04:56.035 --> 00:05:05.566
so the net result is that the second half of the DFT is equal to X'[k] minus W to the power k times X''[k].

00:05:05.566 --> 00:05:08.564
So it is the same two parts we had before;

00:05:08.564 --> 00:05:12.425
we simply have a subtraction instead of a sum.

00:05:14.593 --> 00:05:18.112
In words: we have now computed both parts.

00:05:18.112 --> 00:05:21.168
So we divide the input into even and odd samples,

00:05:21.168 --> 00:05:24.509
and compute two DFTs of size N/2.

00:05:24.718 --> 00:05:33.080
We multiply the output of the second DFT by the root of unity of order N raised to the kth power.

00:05:33.080 --> 00:05:35.843
That uses N/2 multiplications.

00:05:36.381 --> 00:05:43.084
Then we recover the outputs by taking the sum and the difference of these two parts,

00:05:43.084 --> 00:05:47.507
and that gives us X[k] and X[k+N/2].

00:05:49.413 --> 00:05:55.700
It is easiest to see this in a diagram, and we do the diagram for a DFT of size 8.

00:05:55.700 --> 00:05:58.378
So we have x[0] to x[7].

00:05:58.378 --> 00:06:03.673
They are divided into the even part, the upper half, and the odd part, the lower half.

00:06:03.673 --> 00:06:07.273
And we compute two DFTs of size N/2; in this case,

00:06:07.273 --> 00:06:09.686
these are DFTs of size 4.

00:06:09.686 --> 00:06:16.584
Then we simply recombine the respective outputs by sums and differences,

00:06:16.584 --> 00:06:23.335
and we get X[0], which is the sum of the first outputs of the two DFTs.

00:06:23.335 --> 00:06:30.825
The difference of those same first outputs gives the X[4] output, and so on.

00:06:30.825 --> 00:06:33.950
And with this, we have the first half and the second half.
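To make the recursion concrete, here is a minimal Python sketch of the decimation-in-time computation just described. The function name dit_fft and the plain-list style are my own choices; the sketch assumes the length is a power of 2 and favors readability over speed.

```python
import cmath

def dit_fft(x):
    """Radix-2 decimation-in-time FFT sketch; len(x) must be a power of 2."""
    N = len(x)
    if N == 1:                          # a DFT of size 1 is just the sample itself
        return list(x)
    X_even = dit_fft(x[0::2])           # X'[k]:  half-size DFT of even-indexed samples
    X_odd = dit_fft(x[1::2])            # X''[k]: half-size DFT of odd-indexed samples
    X = [0j] * N
    for k in range(N // 2):
        W_k = cmath.exp(-2j * cmath.pi * k / N)     # twiddle factor W^k
        X[k] = X_even[k] + W_k * X_odd[k]           # first half:  X[k]
        X[k + N // 2] = X_even[k] - W_k * X_odd[k]  # second half: X[k+N/2]
    return X
```

Comparing dit_fft(x) with a direct evaluation of the DFT definition on a short power-of-2 input should agree up to rounding error.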
00:06:33.950 --> 00:06:40.462
In this very simple diagram, we see very explicitly this Divide and Conquer algorithm at work.

00:06:42.614 --> 00:06:44.579
So what is the complexity now?

00:06:44.579 --> 00:06:50.011
Splitting the DFT into two pieces of size N/2 was free:

00:06:50.011 --> 00:06:53.787
you simply have to select the even- and odd-indexed inputs.

00:06:53.787 --> 00:06:56.342
Then we compute two DFTs of size N/2.

00:06:56.342 --> 00:07:01.371
Recall that the complexity is quadratic, so together these cost about N²/2 operations.

00:07:01.371 --> 00:07:06.799
We merge the two results with multiplications: there are N/2 of them.

00:07:06.799 --> 00:07:12.386
And the total complexity, at this stage, is N²/2 + N/2,

00:07:12.386 --> 00:07:17.546
which, indeed, is smaller than the original complexity, which is quadratic in N.

00:07:17.546 --> 00:07:21.030
So, in general, we have about half the complexity of the initial problem.

00:07:21.030 --> 00:07:26.155
Now, this alone would not be a big deal: a factor-of-2 saving is nice to have.

00:07:26.155 --> 00:07:28.556
But we are aiming for much more than this.

00:07:31.170 --> 00:07:33.006
What about repeating this process?

00:07:33.006 --> 00:07:38.271
We know that, since N is a power of 2, we can divide by 2, 4, 8, et cetera.

00:07:38.655 --> 00:07:42.905
And we can go on until we end up with trivial DFTs of size 2.

00:07:42.905 --> 00:07:45.919
These simply amount to a sum and a difference.

00:07:45.919 --> 00:07:51.676
This will require log₂ N - 1 steps, because we stop at DFTs of size 2.

00:07:52.506 --> 00:08:00.175
Each step requires a merge, using on the order of N/2 multiplications and N additions,

00:08:00.175 --> 00:08:08.477
so the total will be (N/2)(log₂ N - 1) multiplications and N log₂ N additions.

00:08:08.754 --> 00:08:13.917
So the saving is of order log₂ N divided by N.

00:08:13.917 --> 00:08:19.038
The key result, the "take home" message here, is that a DFT of size N

00:08:19.038 --> 00:08:22.871
requires on the order of N log₂ N operations.

00:08:22.871 --> 00:08:27.041
Let us look at a few examples to see how big the saving is.

00:08:27.041 --> 00:08:31.921
When N is equal to 1024, which is 2 to the power of 10,

00:08:31.921 --> 00:08:35.393
the saving is of the order of a factor of 100.

00:08:35.839 --> 00:08:41.380
If N is 2 to the power of 20, which is roughly a million, a little bit more,

00:08:41.380 --> 00:08:46.207
then the saving is of the order of 50,000: so that is a big deal, indeed.

00:08:48.083 --> 00:08:54.318
We have seen a recursive formulation using sums over even- and odd-indexed inputs.

00:08:54.318 --> 00:08:57.395
We have seen a block diagram view of the DFT.

00:08:57.395 --> 00:09:00.629
Let us now look at a matrix factorization view.

00:09:00.629 --> 00:09:05.152
This is important because many implementations, for example in Matlab or FreeMat,

00:09:05.152 --> 00:09:08.288
will actually use numerical linear algebra,

00:09:08.304 --> 00:09:11.527
and there the matrix factorization is what matters.

00:09:11.527 --> 00:09:19.388
In this case we look at W4, the DFT of size 4. Recall that we first have to separate even and odd samples;

00:09:19.388 --> 00:09:21.101
that is the first matrix on the right.

00:09:21.101 --> 00:09:24.137
Then we compute two DFTs of size 2.

00:09:24.137 --> 00:09:27.729
These are sums and differences, arranged in a block diagonal matrix.
00:09:27.729 --> 00:09:32.131
Then we have to multiply the odd outputs by W to the power k:

00:09:32.516 --> 00:09:39.304
here, these are simply factors of j and minus j, plus sums and differences, which are included here,

00:09:39.304 --> 00:09:44.420
and we get the overall result with these three small matrices.

00:09:44.432 --> 00:09:47.858
This uses eight additions and no multiplications.

00:09:49.872 --> 00:09:52.224
Let us now move to the next bigger size.

00:09:52.224 --> 00:09:53.913
This is N=8.

00:09:53.913 --> 00:09:58.465
Now this is going to be big, because each matrix is of size 8 by 8

00:09:58.465 --> 00:10:01.173
and we have a factorization to take care of.

00:10:01.173 --> 00:10:06.246
The first matrix, W8 in its initial form, we have seen before.

00:10:08.368 --> 00:10:12.074
The first step is to separate the even- from the odd-indexed samples.

00:10:12.074 --> 00:10:15.781
Call this D8, for decimation of size 8.

00:10:15.781 --> 00:10:19.535
You see the matrix here, and you can recognize its structure:

00:10:19.535 --> 00:10:25.655
the first half picks out the even inputs, the second half picks out the odd inputs.

00:10:25.655 --> 00:10:30.757
No arithmetic operations so far, just a re-indexing of the inputs.

00:10:33.004 --> 00:10:38.913
Step 2 is to compute two DFTs of size N/2 on these even- and odd-indexed samples.

00:10:38.913 --> 00:10:45.609
So each submatrix is a DFT of size 4, and the result is a block diagonal matrix.

00:10:45.609 --> 00:10:51.649
On the diagonal you have the W4 matrices; off the diagonal you have zero matrices.

00:10:51.649 --> 00:10:55.905
This requires two DFTs of size 4, and according to what we have just seen,

00:10:55.905 --> 00:10:58.198
that is a total of 16 additions.

00:11:00.075 --> 00:11:05.798
Step 3 is to multiply the output of the second DFT by W to the power k.

00:11:05.798 --> 00:11:11.545
This is a diagonal matrix with an identity matrix in the first half of the diagonal,

00:11:11.545 --> 00:11:16.366
while the second half is made up of the successive powers of W.

00:11:16.366 --> 00:11:22.980
This requires only 2 multiplications, because W squared is actually -j, and multiplying by -j is free.

00:11:22.980 --> 00:11:25.916
So we can summarize this as a block matrix

00:11:25.916 --> 00:11:32.545
with I4 and Λ4 on the diagonal, and 04, the zero matrices, off the diagonal.

00:11:32.545 --> 00:11:38.550
And Λ4 is the one that contains the complex multiplications by the roots of unity.

00:11:40.796 --> 00:11:44.191
Step 4: we have to recombine the final outputs

00:11:44.191 --> 00:11:49.324
to obtain X[k] and X[k+N/2] by a sum and a difference.

00:11:49.324 --> 00:11:55.227
This matrix is also very structured. The first half computes the sums,

00:11:55.228 --> 00:12:00.458
so we can put this in block matrix form by having two identity matrices of size 4.

00:12:00.458 --> 00:12:02.805
And the second half computes the differences,

00:12:02.805 --> 00:12:08.617
so that is the identity matrix of size 4 and its negative; this requires 8 additions.

00:12:10.926 --> 00:12:15.517
So in total we have the product of four matrices: D8, the selector matrix,

00:12:15.517 --> 00:12:19.555
then the block diagonal one with the DFTs of size 4,

00:12:19.555 --> 00:12:25.614
then the block diagonal one with I4 and Λ4, and finally the sums and differences.

00:12:25.614 --> 00:12:30.398
So the total is 24 additions and 2 multiplications.

00:12:30.398 --> 00:12:34.747
Now this is much smaller than what we would expect from an N squared complexity.
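To make this factorization tangible, here is a small NumPy sketch that follows the four steps just listed and checks that the product of the four factors reproduces the full 8-point DFT matrix. The construction and the names D8, step2, step3 and step4 are my own illustration, not the exact slide notation.

```python
import numpy as np

N = 8
W = np.exp(-2j * np.pi / N)                  # 8th root of unity

def dft_matrix(n):
    """Plain DFT matrix of size n."""
    k = np.arange(n)
    return np.exp(-2j * np.pi * np.outer(k, k) / n)

# Step 1: D8 re-indexes the input, even-indexed samples first, then odd-indexed ones.
D8 = np.zeros((N, N))
for row, col in enumerate(list(range(0, N, 2)) + list(range(1, N, 2))):
    D8[row, col] = 1

# Step 2: block diagonal matrix with two DFTs of size 4 on the diagonal.
W4 = dft_matrix(N // 2)
Z4 = np.zeros((4, 4))
step2 = np.block([[W4, Z4], [Z4, W4]])

# Step 3: diagonal matrix diag(I4, Lambda4); Lambda4 holds the twiddle factors W^k.
Lambda4 = np.diag(W ** np.arange(N // 2))
step3 = np.block([[np.eye(4), Z4], [Z4, Lambda4]])

# Step 4: butterfly matrix, sums in the first half and differences in the second half.
step4 = np.block([[np.eye(4), np.eye(4)], [np.eye(4), -np.eye(4)]])

# The product of the four factors equals the 8-point DFT matrix.
assert np.allclose(step4 @ step3 @ step2 @ D8, dft_matrix(N))
```

Only the entries W and W cubed of Lambda4 are genuine complex multiplications, since W squared is -j; that is where the count of 2 multiplications quoted above comes from.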
00:12:34.747 --> 00:12:39.931
An N squared algorithm would require, in this case, on the order of 64 additions and 64 multiplications.

00:12:39.931 --> 00:12:43.488
So you see that the big saving here is in the multiplications.

00:12:43.488 --> 00:12:47.038
That is the case here because the DFT is very small,

00:12:47.038 --> 00:12:50.597
so it quickly reduces to trivial DFTs.

00:12:52.581 --> 00:12:56.706
We can write out what we just did in an explicit flow-graph.

00:12:56.706 --> 00:13:03.873
Such a flow-graph simply indicates how the inputs, x[0] to x[7], move through the graph,

00:13:03.873 --> 00:13:10.097
get multiplied by roots of unity where needed, and are combined by sums and differences.

00:13:10.097 --> 00:13:15.444
As you go from left to right, you end up with the outputs X[0] to X[7].

00:13:15.444 --> 00:13:21.836
What you can recognize visually are the butterflies, which are simply the sums and differences

00:13:21.836 --> 00:13:27.331
that appear again and again in the recursive computation of the DFT

00:13:27.331 --> 00:13:30.371
using the Fast Fourier Transform algorithm.

00:13:31.832 --> 00:13:33.680
Now, is this a big deal?

00:13:33.680 --> 00:13:39.036
In image processing, for example on your digital camera or on your mobile phone,

00:13:39.036 --> 00:13:43.476
the image is processed in small blocks of 8 by 8 pixels.

00:13:43.476 --> 00:13:47.167
One computes a transform called the Discrete Cosine Transform;

00:13:47.167 --> 00:13:49.070
we shall see this later in this class,

00:13:49.070 --> 00:13:51.542
when we study the JPEG compression algorithm.

00:13:51.542 --> 00:13:55.916
The DCT is a little bit similar to a DFT; it is like a real-valued DFT.

00:13:55.916 --> 00:14:01.060
It has a fast algorithm that is very much inspired by what we just saw.

00:14:03.443 --> 00:14:09.283
Let us compare with direct computation: 8 by 8 gives 64 inputs;

00:14:09.283 --> 00:14:16.371
squared, that is about 4,000 multiplications, and this is the dominant cost because these are fixed-point multiplications.

00:14:16.371 --> 00:14:19.744
The transform can be computed on rows and columns separately,

00:14:19.744 --> 00:14:23.241
so you have essentially 16 transforms of size 8.

00:14:23.241 --> 00:14:26.689
We have seen that each such fast transform takes 2 multiplications,

00:14:26.689 --> 00:14:30.485
so we would have a total of 32 multiplications.

00:14:30.485 --> 00:14:34.009
So we have a saving of about two orders of magnitude.

00:14:34.009 --> 00:14:35.528
And that is a big deal.

00:14:35.528 --> 00:14:41.385
So, instead of compressing your image in 100 seconds, it takes 1 second, for example.

00:14:42.999 --> 00:14:46.936
So far, we have only considered lengths N that are powers of 2.

00:14:46.936 --> 00:14:48.993
But of course, other lengths are also important,

00:14:48.993 --> 00:14:53.708
and the message here is that for every possible length N, there is a fast algorithm.

00:14:53.708 --> 00:14:57.151
The first case is N equal to a power of a prime.

00:14:57.151 --> 00:15:01.824
We looked at the case p equal to 2, for which we derived the Cooley-Tukey algorithm:

00:15:01.824 --> 00:15:04.922
very famous, very simple and very important.

00:15:04.922 --> 00:15:08.310
When N is a product of two coprime numbers,

00:15:08.310 --> 00:15:14.138
there is a way to reorder this into a two-dimensional DFT of size N1 by N2.

00:15:14.138 --> 00:15:18.428
This is known as the Good-Thomas algorithm; it was invented in the 1950s

00:15:18.428 --> 00:15:20.835
and reinvented again in the 1960s.
00:15:20.835 --> 00:15:25.244
When N is a prime, we can apply a mapping to the transform

00:15:25.244 --> 00:15:28.397
that results in a convolution of size N-1.

00:15:28.397 --> 00:15:29.845
This might not sound like good news,

00:15:29.845 --> 00:15:35.924
except that we will see that a convolution of size N-1 can also be computed with a DFT,

00:15:35.924 --> 00:15:40.484
and so if N is, for example, 17, then N-1 is 16,

00:15:40.484 --> 00:15:44.212
and this we can handle with the usual Cooley-Tukey fast algorithm.

00:15:44.212 --> 00:15:46.193
By combining all the tricks involved,

00:15:46.193 --> 00:15:50.373
minimizing the number of multiplications and reordering the terms,

00:15:50.373 --> 00:15:54.368
we obtain the so-called Winograd Fast Fourier Transform,

00:15:54.368 --> 00:15:59.337
which was invented by Winograd in the late 1970s,

00:15:59.337 --> 00:16:04.449
and which really caps the work on fast algorithm development from a theoretical point of view.

00:16:05.786 --> 00:16:10.966
Let us look at the case N=5 that we saw before: the matrix is familiar.

00:16:10.966 --> 00:16:16.684
We take out one piece of it by removing the column of ones and the row of ones,

00:16:16.684 --> 00:16:24.236
and we are left with a matrix that contains all powers of W from 1 to 4,

00:16:24.236 --> 00:16:26.754
where each power appears only once in each row and column;

00:16:26.754 --> 00:16:34.130
that is the important thing here. It can be shown that if you reorder the rows and columns properly,

00:16:34.130 --> 00:16:40.037
you end up with the matrix that we see on the right side, which is actually a circulant matrix.

00:16:40.037 --> 00:16:43.153
You can see that the second row is simply the first row,

00:16:43.153 --> 00:16:46.738
circularly shifted, and so on down to the bottom.

00:16:46.738 --> 00:16:52.400
And such a circulant matrix, as we will see, can be diagonalized by a DFT of size 4,

00:16:52.400 --> 00:16:56.337
so we are back to computing a DFT, but now of size one less.

00:16:58.121 --> 00:17:01.033
Now consider N a product of two coprime numbers.

00:17:01.033 --> 00:17:02.719
Here, we take N=6.

00:17:02.719 --> 00:17:07.635
The DFT of size 6 can be rewritten as a two-dimensional DFT of size 3 by 2.

00:17:07.635 --> 00:17:14.598
And such a DFT can be computed with 3 DFTs of size 2 and 2 DFTs of size 3.

00:17:14.598 --> 00:17:16.789
We write this here in factorized form:

00:17:16.789 --> 00:17:23.818
we have permutation matrices R, Q and P, and we have block diagonal matrices.

00:17:23.818 --> 00:17:28.914
The first one on the left is block diagonal with three DFTs of size 2.

00:17:28.914 --> 00:17:33.735
The block diagonal matrix on the right has two DFTs of size 3.

00:17:33.735 --> 00:17:39.139
Except for the permutations, there is no cost in transforming a DFT of size 6

00:17:39.139 --> 00:17:44.766
into 3 DFTs of size 2 and 2 DFTs of size 3, so this is very efficient.

00:17:46.658 --> 00:17:49.010
The conclusion is: don't worry, be happy.

00:17:49.010 --> 00:17:53.849
Any DFT length can be mapped onto a Fast Fourier Transform algorithm.

00:17:53.849 --> 00:17:58.175
This is an important message, because you do not have to take your signal

00:17:58.175 --> 00:18:02.568
and zero-pad it to reach a length that is a power of 2, for example.

00:18:02.568 --> 00:18:08.316
This is often done, but it is a mistake: if you really want to compute a DFT of size 17,

00:18:08.316 --> 00:18:11.957
then there is an algorithm that will compute it, and compute it fast.
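As a quick illustration of that advice, here is a small NumPy example of my own, using numpy.fft.fft, which accepts any length: the size-17 DFT is computed directly, while zero-padding to 32 yields a length-32 transform, which is not the same thing as the size-17 DFT.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(17)

# Reference: the size-17 DFT computed straight from the definition.
n = np.arange(17)
W17 = np.exp(-2j * np.pi * np.outer(n, n) / 17)
X_direct = W17 @ x

# numpy's FFT handles the prime length 17 directly, and it matches the definition.
X_fft = np.fft.fft(x)
print(np.allclose(X_fft, X_direct))   # True

# Zero-padding to a power of 2 computes a length-32 transform instead:
# it samples the spectrum on a different grid and is not the size-17 DFT.
X_padded = np.fft.fft(x, n=32)
print(X_padded.shape)                 # (32,)
```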
00:18:11.957 --> 00:18:15.656
I also want to say that there are extremely good packages out there,

00:18:15.656 --> 00:18:22.018
for example the Fastest Fourier Transform in the West (FFTW), or the SPIRAL program.

00:18:22.018 --> 00:18:26.072
These packages allow you not to worry,

00:18:26.072 --> 00:18:30.139
and simply to find the best possible Fourier transform algorithm for any length.

00:18:30.139 --> 00:18:33.705
And as we have seen, this makes a big difference:

00:18:33.705 --> 00:18:37.275
we are talking about orders of magnitude of speed-up.

00:18:38.936 --> 00:18:41.932
Let me conclude this module on the Fourier Transform.

00:18:41.932 --> 00:18:44.274
The Fourier transform is a magical tool.

00:18:44.274 --> 00:18:48.511
It is a beautiful mathematical tool, and it is also a very practical tool.

00:18:48.511 --> 00:18:52.047
We have seen the DTFT, which is good for mathematical analysis.

00:18:52.047 --> 00:18:55.234
We have seen the DFT, which is good for computation.

00:18:55.234 --> 00:18:57.302
And we have seen the Fast Fourier Transform,

00:18:57.302 --> 00:19:00.590
which allows us to compute Fourier transforms very rapidly.

00:19:00.590 --> 00:19:06.517
This fascination with Fourier can be seen here in a tribute on a license plate.