
4.9b - FFT: history and algorithms

  • 0:01 - 0:06
    In this second half of module 4.9, we are going to look specifically
  • 0:06 - 0:12
    at one example of a Divide and Conquer algorithm, the Cooley-Tukey Fast Fourier Transform;
  • 0:12 - 0:16
    we will see a matrix view of this algorithm, as well as flow-graphs,
  • 0:16 - 0:21
    and then conclude by looking at more general cases of Fast Fourier Transforms.
  • 0:22 - 0:26
    So far, we have only had conceptual discussions
  • 0:26 - 0:29
    of what the Divide and Conquer algorithm would be.
  • 0:29 - 0:35
    We now need to be concrete and actually develop an algorithm for the Discrete Fourier Transform,
  • 0:35 - 0:37
    based on the idea of Divide and Conquer.
  • 0:37 - 0:42
    We'll do this by analyzing an algorithm called the decimation-in-time algorithm.
  • 0:42 - 0:46
    It'll take us a number of slides until we get to all the details of it.
  • 0:46 - 0:47
    But this is very important,
  • 0:47 - 0:54
    because you're going to understand the mechanics of one of the most famous algorithms of all time.
  • 0:54 - 1:00
    We start with the definition of the DFT: X[k] is the sum from n = 0 to N-1
  • 1:00 - 1:04
    of x[n] multiplied by the root of unity to the power nk.
  • 1:04 - 1:09
    k goes from 0 to N -1, W is the usual Nth root of unity.
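
Written out in the usual notation (just a restatement of the definition given above):

    X[k] = \sum_{n=0}^{N-1} x[n] \, W_N^{nk},   k = 0, ..., N-1,   with   W_N = e^{-j 2\pi/N}.
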
  • 1:09 - 1:15
    Now, as is very often the case, a good guess is half of the answer to a fast algorithm.
  • 1:15 - 1:21
    The good guess is the following: break the input into even and odd index terms,
  • 1:21 - 1:23
    that's why it's called decimation-in-time,
  • 1:23 - 1:29
    because it's like taking the sequence and throwing out half of the entries,
  • 1:29 - 1:31
    that's decimation by a factor of 2.
  • 1:31 - 1:39
    And so, x[n] is mapped into x[2n], the even-indexed samples, and x[2n+1], the odd-indexed samples,
  • 1:39 - 1:43
    and now n goes from 0 to N/2 - 1.
  • 1:43 - 1:48
    Recall that N is assumed to be a power of 2, so N/2 is again an integer.
  • 1:48 - 1:51
    Then, we need to do something similar for the output.
  • 1:51 - 1:55
    But there it's not even and odd, it's first half and second half.
  • 1:55 - 2:01
    So X[k], with k going from 0 to N minus 1, is split into the first half,
  • 2:01 - 2:09
    X[k], and the second half, X[k+N/2], where now k runs from 0 to N/2 - 1.
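
In symbols, the two splits just described are:

    inputs:   x[n]  -->  x[2n]  and  x[2n+1],      n = 0, ..., N/2 - 1,
    outputs:  X[k]  -->  X[k]   and  X[k + N/2],   k = 0, ..., N/2 - 1.
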
  • 2:11 - 2:14
    Let us now look at the equations for the DFT,
  • 2:14 - 2:17
    and use this separation of the inputs into even and odd inputs.
  • 2:17 - 2:24
    So X[k] now is the sum of 2 sums of length N/2,
  • 2:24 - 2:30
    over the even and odd ones, with respective powers of W.
  • 2:30 - 2:34
    And we separate these two sums, we simplify the exponent,
  • 2:34 - 2:37
    and then we need to use a little trick here.
  • 2:37 - 2:39
    The trick is the following:
  • 2:39 - 2:43
    If you take the Nth root of unity and raise it to the power 2,
  • 2:43 - 2:53
    then this is equal to e to the -j4π/N, which is equal to e to the -j2π divided by N/2.
  • 2:53 - 2:57
    And this of course is the root of unity of order N/2.
  • 2:57 - 3:02
    Using this, we can write W to the power 2nk
  • 3:02 - 3:08
    as the root of unity of order N/2, raised to the power nk.
  • 3:08 - 3:19
    We use this and we see now that we have 2 sums of length N/2 with respect to the N/2th root of unity
  • 3:19 - 3:25
    and we write this as the sum of X'[k] and X''[k],
  • 3:25 - 3:29
    where X''[k] is actually multiplied by W to the power k.
  • 3:29 - 3:32
    Here W is still the Nth root of unity.
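
Putting the last few steps into one displayed derivation (the same algebra, written out):

    X[k] = \sum_{n=0}^{N/2-1} x[2n] W_N^{2nk} + \sum_{n=0}^{N/2-1} x[2n+1] W_N^{(2n+1)k}
         = \sum_{n=0}^{N/2-1} x[2n] W_{N/2}^{nk} + W_N^{k} \sum_{n=0}^{N/2-1} x[2n+1] W_{N/2}^{nk}
         = X'[k] + W_N^{k} X''[k],

using the identity W_N^2 = e^{-j4\pi/N} = e^{-j2\pi/(N/2)} = W_{N/2}.
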
  • 3:34 - 3:38
    We can now say this explicitly in the following way:
  • 3:38 - 3:43
    we have 2 half-size DFTs, which we call X' and X''.
  • 3:43 - 3:47
    We have to multiply the second DFT by W to the power k.
  • 3:47 - 3:53
    That will require N/2 complex multiplications and we add the results together.
  • 3:53 - 3:57
    With this, we have found the first half of our result, namely X[k],
  • 3:57 - 4:02
    for k going from zero to N/2 -1.
  • 4:02 - 4:07
    Now we need to find out how to compute the second half of the DFT, and then we'll be done.
  • 4:08 - 4:12
    We have our first half of the DFT: let us consider the second half,
  • 4:12 - 4:16
    so that's X[k+N/2].
  • 4:16 - 4:22
    This is made up of two sums over the even and the odd inputs.
  • 4:22 - 4:29
    And the sum of the odd inputs is pre-multiplied by W to the power k + N/2.
  • 4:29 - 4:36
    We need again a little trick and this is to simplify W to the power N/2,
  • 4:36 - 4:46
    this is equal to e to the -j 2π(N/2)/N, which is e to the -jπ, which is equal to -1.
  • 4:47 - 4:49
    With this, we see that
  • 4:49 - 4:56
    the second sum is simply premultiplied by W to the k and a change of sign,
  • 4:56 - 5:06
    so we get as a net result that the second half of the DFT is actually equal to X'[k] minus W to the k times X''[k].
  • 5:06 - 5:09
    So it's the same two parts we had before.
  • 5:09 - 5:12
    We simply have a subtraction instead of a sum.
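
In the same notation, since W_N^{N/2} = e^{-j 2\pi (N/2)/N} = e^{-j\pi} = -1:

    X[k + N/2] = \sum_{n=0}^{N/2-1} x[2n] W_{N/2}^{nk} + W_N^{k+N/2} \sum_{n=0}^{N/2-1} x[2n+1] W_{N/2}^{nk}
               = X'[k] - W_N^{k} X''[k],    k = 0, ..., N/2 - 1.
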
  • 5:15 - 5:18
    In words: we have now computed both parts.
  • 5:18 - 5:21
    So we divide the input into even and odd samples,
  • 5:21 - 5:25
    compute 2 DFTs of size N/2.
  • 5:25 - 5:33
    We multiply the output of the second DFT by the root of unity of order N raised to the kth power.
  • 5:33 - 5:36
    That uses N/2 multiplications.
  • 5:36 - 5:43
    Then we recover the outputs by taking the sum and the difference of these 2 parts,
  • 5:43 - 5:48
    and that will give us X[k] and X[k+N/2].
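
As a sketch of the recursion just described, here it is in Python with NumPy (the function name dit_fft and the final check against np.fft.fft are illustrative choices, not part of the lecture):

    import numpy as np

    def dit_fft(x):
        """Radix-2 decimation-in-time FFT; len(x) must be a power of 2."""
        x = np.asarray(x, dtype=complex)
        N = len(x)
        if N == 1:                       # trivial DFT of size 1
            return x
        X_even = dit_fft(x[0::2])        # X'[k]:  DFT of the even-indexed samples
        X_odd = dit_fft(x[1::2])         # X''[k]: DFT of the odd-indexed samples
        W = np.exp(-2j * np.pi * np.arange(N // 2) / N)   # twiddle factors W_N^k
        return np.concatenate([X_even + W * X_odd,        # X[k],       k = 0 .. N/2-1
                               X_even - W * X_odd])       # X[k + N/2], k = 0 .. N/2-1

    x = np.random.randn(8) + 1j * np.random.randn(8)
    assert np.allclose(dit_fft(x), np.fft.fft(x))
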
  • 5:49 - 5:56
    It is easiest to see this in a diagram, and we do the diagram for a DFT of size 8.
  • 5:56 - 5:58
    So we have x[0] to x[7].
  • 5:58 - 6:04
    They are divided into the even part, the upper half, and the odd part, the lower half.
  • 6:04 - 6:07
    And we compute 2 DFTs of size N/2, in this case,
  • 6:07 - 6:10
    that would be DFTs of size 4.
  • 6:10 - 6:17
    Then we simply recombine the respective outputs by sum and differences,
  • 6:17 - 6:23
    and we get X[0], which is the sum of the first outputs of both DFTs.
  • 6:23 - 6:31
    The difference of these first outputs will give you the X[4] output, and so on.
  • 6:31 - 6:34
    And with this, we have first half, second half.
  • 6:34 - 6:40
    In this very simple diagram, we see very explicitly this Divide and Conquer algorithm at work.
  • 6:43 - 6:45
    So what is the complexity now?
  • 6:45 - 6:50
    So to split the DFT into two pieces of size N/2, that was free.
  • 6:50 - 6:54
    You simply had to select even and odd index inputs.
  • 6:54 - 6:56
    Then we compute two DFTs of size N/2.
  • 6:56 - 7:01
    Recall that the complexity is quadratic, so we have this N squared over 2 complexity here.
  • 7:01 - 7:07
    We merge the two results with multiplications: there are N/2 of them.
  • 7:07 - 7:12
    And the total complexity, at this stage, is N squared over 2 plus N/2,
  • 7:12 - 7:18
    which, indeed, is smaller than the original complexity, which is quadratic in N.
  • 7:18 - 7:21
    So, in general, we have about half the complexity of the initial problem.
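
As a one-line count of this first split:

    C(N) \approx 2 (N/2)^2 + N/2 = N^2/2 + N/2  <  N^2.
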
  • 7:21 - 7:26
    Now, this would not be a big deal: that's a factor of 2 of savings, nice to have.
  • 7:26 - 7:29
    But we are aiming for much more than this.
  • 7:31 - 7:33
    What about repeating this process?
  • 7:33 - 7:38
    We know that, since N is a power of 2, we can divide in 2, 4, 8, etcetera.
  • 7:39 - 7:43
    And we can go until we end up with trivial DFTs of size 2.
  • 7:43 - 7:46
    These simply amount to a sum and a difference.
  • 7:46 - 7:52
    This will require log2 N minus 1 steps, because we end up with DFTs of size 2.
  • 7:53 - 8:00
    Each merging step uses on the order of N/2 multiplications and on the order of N additions,
  • 8:00 - 8:08
    so the total will be (N/2)(log2 N - 1) multiplications and N log2 N additions.
  • 8:09 - 8:14
    So the cost is reduced to a fraction of the order of log2 N divided by N.
  • 8:14 - 8:19
    The key result, the "take home" message here, is that a DFT of size N
  • 8:19 - 8:23
    requires on the order of N log2 N operations.
  • 8:23 - 8:27
    Let us look at a few examples to see how big the savings is.
  • 8:27 - 8:32
    So when N is equal to 1024, that's 2 to the power of 10,
  • 8:32 - 8:35
    so the saving is of the order of a factor of 100.
  • 8:36 - 8:41
    If N is 2 to the power of 20, which is roughly a million, a little bit more,
  • 8:41 - 8:46
    then the saving is of the order of 50,000: so that's a big deal, indeed.
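
A quick sanity check of these two numbers (a rough operation-count sketch; the exact constants depend on how operations are counted):

    import math

    for N in (2**10, 2**20):
        direct = N**2                    # order of the direct DFT
        fast = N * math.log2(N)          # order of the FFT
        print(N, round(direct / fast))   # roughly 102 and 52429
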
  • 8:48 - 8:54
    We have seen a recursive formulation using sums on even and odd indexed inputs.
  • 8:54 - 8:57
    We have seen a block diagram view of the DFT.
  • 8:57 - 9:01
    Let's look now at a matrix factorization view.
  • 9:01 - 9:05
    This is important because many implementations, for example in Matlab or FreeMat
  • 9:05 - 9:08
    will actually use numerical linear algebra,
  • 9:08 - 9:12
    and then the matrix factorization is actually important.
  • 9:12 - 9:19
    So in this case we look at W4, the size-4 DFT. Recall that you have to separate even and odd samples;
  • 9:19 - 9:21
    that's the 1st matrix on the right.
  • 9:21 - 9:24
    Then we compute two DFT's of size 2.
  • 9:24 - 9:28
    These are sums and differences; it's a block diagonal matrix.
  • 9:28 - 9:32
    Then we have to multiply the odd outputs by W to the power k:
  • 9:33 - 9:39
    here, these are simply j's and minus j's, and the sums and differences, which are included in here,
  • 9:39 - 9:44
    and we get the overall result, with these three small matrices.
  • 9:44 - 9:48
    This uses eight additions, no multiplications.
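
A small numerical check of this three-matrix factorization (a sketch: the matrices below are reconstructed from the decimation-in-time structure described above, not copied from the slide):

    import numpy as np

    W4 = np.exp(-2j * np.pi / 4)                          # = -j
    F4 = np.array([[W4 ** (n * k) for n in range(4)] for k in range(4)])

    F2 = np.array([[1, 1], [1, -1]], dtype=complex)       # size-2 DFT: sum and difference
    D4 = np.zeros((4, 4))                                 # selector: reorder x to [x0, x2, x1, x3]
    D4[0, 0] = D4[1, 2] = D4[2, 1] = D4[3, 3] = 1
    Z2 = np.zeros((2, 2))
    B4 = np.block([[np.eye(2), np.diag([1, W4])],         # combine: X'[k] + W4^k X''[k]
                   [np.eye(2), -np.diag([1, W4])]])       #          X'[k] - W4^k X''[k]

    assert np.allclose(F4, B4 @ np.block([[F2, Z2], [Z2, F2]]) @ D4)
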
  • 9:50 - 9:52
    Let's now move to the next bigger size.
  • 9:52 - 9:54
    This is N=8.
  • 9:54 - 9:58
    Now this is going to be big because each matrix is of size 8 by 8,
  • 9:58 - 10:01
    and we have a factorization to take care of,
  • 10:01 - 10:06
    so the first matrix, W8 in its initial form, is one we have seen before.
  • 10:08 - 10:12
    The first step is to separate even from odd indexed samples.
  • 10:12 - 10:16
    Call this D8, for decimation of size 8.
  • 10:16 - 10:20
    You see the matrix here; you can sort of recognize its structure,
  • 10:20 - 10:26
    so the 1st half takes out the even inputs, the 2nd half takes out the odd inputs.
  • 10:26 - 10:31
    No arithmetic operations so far, just re-indexing of inputs.
  • 10:33 - 10:39
    Step 2 is to compute two DFTs of size N/2 on these even and odd indexed samples.
  • 10:39 - 10:46
    So, each submatrix is a DFT of size 4 and the result is a block diagonal matrix.
  • 10:46 - 10:52
    On the diagonal, you have the W4 matrices, off the diagonal, you have the 0 matrix.
  • 10:52 - 10:56
    This requires 2 DFTs of size 4, according to what we have just seen,
  • 10:56 - 10:58
    that's a total of 16 additions.
  • 11:00 - 11:06
    Step 3 is to multiply the output of the 2nd DFT by W to the k.
  • 11:06 - 11:12
    This will be a diagonal matrix with an identity matrix in the 1st half of the diagonal
  • 11:12 - 11:16
    and the 2nd half is made up of the successive powers of W.
  • 11:16 - 11:23
    This will require 2 multiplications, because W squared is actually -j, and multiplying by -j is free.
  • 11:23 - 11:26
    So we can summarize this as a block matrix
  • 11:26 - 11:33
    with I4 and Λ4 on the diagonal, and 04, the zero matrices, off the diagonal.
  • 11:33 - 11:39
    And Λ4 is the one that contains the complex multiplications by the roots of unity.
  • 11:41 - 11:44
    Step 4: we have to recombine the final output
  • 11:44 - 11:49
    to obtain X[k] and X[k+N/2] by a sum and difference.
  • 11:49 - 11:55
    So this matrix is also very structured. The first half computes the sum.
  • 11:55 - 12:00
    So we can put this in block matrix form by having two identity matrices of size 4.
  • 12:00 - 12:03
    And the second half computes the differences,
  • 12:03 - 12:09
    so that's the identity matrix of size 4 and its negative: that requires 8 additions.
  • 12:11 - 12:16
    So in total, we have the product of 4 matrices, D8, the selector matrix,
  • 12:16 - 12:20
    then the one that is diagonal with the DFTs of size 4,
  • 12:20 - 12:26
    then the one that is diagonal with the I4, Λ4, and finally the sum and differences.
  • 12:26 - 12:30
    So the total will be 24 additions and 2 multiplications.
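
The same kind of check for the size-8 factorization (again a reconstruction of D8, the block-diagonal DFTs, the twiddle matrix and the sum/difference matrix from the description above; the exact matrices are my assumptions, not read off the slide):

    import numpy as np

    W8 = np.exp(-2j * np.pi / 8)
    F8 = np.fft.fft(np.eye(8))                            # DFT matrices of size 8 and 4
    F4 = np.fft.fft(np.eye(4))

    D8 = np.zeros((8, 8))                                 # even-indexed inputs first, then odd
    for i in range(4):
        D8[i, 2 * i] = 1
        D8[i + 4, 2 * i + 1] = 1

    Z4, I4 = np.zeros((4, 4)), np.eye(4)
    DFTs = np.block([[F4, Z4], [Z4, F4]])                 # two DFTs of size 4
    Twid = np.block([[I4, Z4], [Z4, np.diag(W8 ** np.arange(4))]])   # I4 and Lambda4
    Sums = np.block([[I4, I4], [I4, -I4]])                # sums and differences

    assert np.allclose(F8, Sums @ Twid @ DFTs @ D8)
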
  • 12:30 - 12:35
    Now this is much smaller than what we would expect from an N square complexity.
  • 12:35 - 12:40
    N squared would in this case be 64, so on the order of 64 additions and 64 multiplications.
  • 12:40 - 12:43
    So you see that the big saving here is in multiplications.
  • 12:43 - 12:47
    That is the case here because the DFT is very small
  • 12:47 - 12:51
    and so it quickly comes down to trivial DFT's.
  • 12:53 - 12:57
    We can write out what we just did in an explicit flow-graph.
  • 12:57 - 13:04
    Such a flow-graph simply indicates how inputs -- x[0] to x[7] -- go through the flow-graph,
  • 13:04 - 13:10
    get multiplied by the roots of unity when needed, then take sum and differences etc.
  • 13:10 - 13:15
    And as you go from left to right, you end up with the output X[0] to X[7].
  • 13:15 - 13:22
    What you can recognize visually are the butterflies, which are simply the sum and differences
  • 13:22 - 13:27
    that appear again and again in the recursive computation of the DFT
  • 13:27 - 13:30
    using the Fast Fourier Transform algorithm.
  • 13:32 - 13:34
    Now, is this a big deal?
  • 13:34 - 13:39
    In image processing, so, for example, on your digital camera or on your mobile phone,
  • 13:39 - 13:43
    the image is taken in small blocks of 8 by 8.
  • 13:43 - 13:47
    One computes a transform that is called a Discrete Cosine Transform,
  • 13:47 - 13:49
    we shall see this later in this class
  • 13:49 - 13:52
    when we study the JPEG compression algorithm,
  • 13:52 - 13:56
    and the DCT is a little bit similar to a DFT, it's like a real DFT.
  • 13:56 - 14:01
    It has a fast algorithm that is very much inspired by what we just saw.
  • 14:03 - 14:09
    Let's compare direct multiplication: so 8 by 8 is 64 inputs;
  • 14:09 - 14:16
    squared, that's about 4000 multiplications; this is the dominant cost, because these are fixed-point multiplications.
  • 14:16 - 14:20
    The transform can be computed in rows and columns separately,
  • 14:20 - 14:23
    so you have 16 DFTs, essentially.
  • 14:23 - 14:27
    We have seen that such an algorithm would take 2 multiplications,
  • 14:27 - 14:30
    so we would have a total of 32 multiplications.
  • 14:30 - 14:34
    So we have a saving of about two orders of magnitude.
  • 14:34 - 14:36
    And that's a big deal.
  • 14:36 - 14:41
    So, instead of compressing your image in 100 seconds, it will take 1 second, for example.
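
The arithmetic behind this comparison, written out: 64^2 = 4096 ≈ 4000 multiplications for the direct computation, versus 16 × 2 = 32 multiplications with the fast algorithm, a ratio of 4096/32 = 128, i.e. roughly two orders of magnitude.
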
  • 14:43 - 14:47
    So far, we have only considered lengths N which were powers of 2.
  • 14:47 - 14:49
    But of course, other lengths are also important,
  • 14:49 - 14:54
    and the message here is that for every possible length N, there is a fast algorithm.
  • 14:54 - 14:57
    So the first case is when N is equal to a power of a prime p.
  • 14:57 - 15:02
    We looked at the case p equal to 2, which gave us the Cooley-Tukey algorithm.
  • 15:02 - 15:05
    Very famous, very simple and very important.
  • 15:05 - 15:08
    When N is a product of 2 coprime numbers,
  • 15:08 - 15:14
    then there is a way to reorder this into a two-dimensional DFT of size N1 by N2,
  • 15:14 - 15:18
    this is known as the Good-Thomas algorithm, it was invented in the 50s
  • 15:18 - 15:21
    and reinvented in the 60s.
  • 15:21 - 15:25
    When N is a prime, we can do a mapping of the Transform
  • 15:25 - 15:28
    that results in a convolution of size N-1.
  • 15:28 - 15:30
    This might not be good news,
  • 15:30 - 15:36
    except that we'll see that the convolution of size N-1 can also be solved with a DFT,
  • 15:36 - 15:40
    and so if N is, for example, 17, then N-1 is 16,
  • 15:40 - 15:44
    and this we can solve with a usual Cooley-Tukey fast algorithm.
  • 15:44 - 15:46
    By combining all the tricks involved,
  • 15:46 - 15:50
    minimizing the number of multiplications, reordering the terms,
  • 15:50 - 15:54
    we have the so-called Winograd Fast Fourier Transform,
  • 15:54 - 15:59
    which was invented by Winograd in the late 1970's,
  • 15:59 - 16:04
    and which really caps the work on fast algorithm development from a theoretical point of view.
  • 16:06 - 16:11
    Let us look at the case N=5 that we had seen before: the matrix is familiar.
  • 16:11 - 16:17
    We take out one piece of it: we remove the column of 1's and the row of 1's,
  • 16:17 - 16:24
    and we are left with a matrix that has, of course, all powers of W from 1 to 4,
  • 16:24 - 16:27
    but each power appears only once in each row and in each column
  • 16:27 - 16:34
    -- that's the important thing here -- and it can be shown that if you reorder rows and columns properly,
  • 16:34 - 16:40
    you end up with this matrix that we see on the right side which is actually a circulant matrix.
  • 16:40 - 16:43
    So you can see that the 2nd row is simply the 1st row,
  • 16:43 - 16:47
    circularly shifted, and so on, to the bottom.
  • 16:47 - 16:52
    And such a circulant matrix, we will see, can be diagonalized by a DFT of size 4,
  • 16:52 - 16:56
    so we are back to computing a DFT but now of a size 1 less.
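
A small numerical sketch of this reordering (assumptions for illustration: 2 is taken as a generator of the nonzero residues mod 5, rows are ordered by its powers and columns by its inverse powers, one standard way of exposing the circulant structure):

    import numpy as np

    N = 5
    W = np.exp(-2j * np.pi / N)
    g = 2                                                 # generator of {1, 2, 3, 4} mod 5
    ginv = pow(g, N - 2, N)                               # 3 = 2^-1 mod 5
    rows = [pow(g, p, N) for p in range(N - 1)]           # 1, 2, 4, 3
    cols = [pow(ginv, m, N) for m in range(N - 1)]        # 1, 3, 4, 2

    # the 4x4 core of the DFT matrix (rows/columns 1..4), reordered
    C = np.array([[W ** ((r * c) % N) for c in cols] for r in rows])

    # each row is the previous one circularly shifted: C is a circulant matrix
    for p in range(N - 1):
        assert np.allclose(C[p], np.roll(C[0], p))
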
  • 16:58 - 17:01
    Now consider the case when N is a product of 2 coprime numbers.
  • 17:01 - 17:03
    Here, we take N=6.
  • 17:03 - 17:08
    It can be written as a 2-dimensional DFT of size 3 by 2.
  • 17:08 - 17:15
    And such a DFT can be computed by having 3 DFTs of size 2, and 2 DFTs of size 3.
  • 17:15 - 17:17
    We write this here in a factorized form,
  • 17:17 - 17:24
    so we have permutation matrices R, Q and P and we have block diagonal matrices.
  • 17:24 - 17:29
    The first one on the left is block diagonal with three DFTs of size 2.
  • 17:29 - 17:34
    The block diagonal matrix on the right has two DFTs of size three.
  • 17:34 - 17:39
    Except for permutations, there was no cost in transforming a DFT of size 6
  • 17:39 - 17:45
    into 3 DFTs of size 2, and 2 DFTs of size 3, so this is very efficient.
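
A numerical sketch of this re-indexing (the particular index maps below, n = (2*n1 + 3*n2) mod 6 and k = (4*k1 + 3*k2) mod 6, are one standard choice assumed for illustration; they are not read off the slide):

    import numpy as np

    N1, N2, N = 3, 2, 6
    x = np.random.randn(N) + 1j * np.random.randn(N)
    X = np.fft.fft(x)

    F3 = np.fft.fft(np.eye(N1))                           # DFT matrices of size 3 and 2
    F2 = np.fft.fft(np.eye(N2))

    # load the input into a 3-by-2 array through the input index map ...
    A = np.empty((N1, N2), dtype=complex)
    for n1 in range(N1):
        for n2 in range(N2):
            A[n1, n2] = x[(2 * n1 + 3 * n2) % N]

    # ... take size-3 DFTs along the columns and size-2 DFTs along the rows ...
    B = F3 @ A @ F2.T

    # ... and read the result out through the output index map: no twiddle factors needed.
    for k1 in range(N1):
        for k2 in range(N2):
            assert np.allclose(B[k1, k2], X[(4 * k1 + 3 * k2) % N])
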
  • 17:47 - 17:49
    The conclusion is: don't worry, be happy;
  • 17:49 - 17:54
    any length of DFT can be mapped into a Fast Fourier Transform algorithm.
  • 17:54 - 17:58
    Therefore, this is an important message, because you don't have to take your signal
  • 17:58 - 18:03
    and zero-pad it so as to get a length that is a power of 2, for example.
  • 18:03 - 18:08
    This is often done, but it's a mistake: if you really want to compute a DFT of size 17,
  • 18:08 - 18:12
    then there is an algorithm that will compute it and it will compute it fast.
  • 18:12 - 18:16
    I also want to say that there are extremely good packages out there,
  • 18:16 - 18:22
    for example, something called the Fastest Fourier Transform in the West (FFTW), or the SPIRAL program.
  • 18:22 - 18:26
    And these packages allow you to not worry,
  • 18:26 - 18:30
    and simply find the best possible Fourier Transform for any length.
  • 18:30 - 18:34
    And as we had seen before, this makes a big difference.
  • 18:34 - 18:37
    We are talking about orders of magnitude in speed-up.
  • 18:39 - 18:42
    Let me conclude this module on the Fourier Transform.
  • 18:42 - 18:44
    The Fourier transform is a magical tool.
  • 18:44 - 18:49
    It's a beautiful mathematical tool, it is also a very practical tool.
  • 18:49 - 18:52
    We have seen the DTFT, which is good for math.
  • 18:52 - 18:55
    We have seen the DFT, which is good for computation.
  • 18:55 - 18:57
    And we have seen the Fast Fourier Transform,
  • 18:57 - 19:01
    which allows us to compute Fourier transforms very rapidly.
  • 19:01 - 19:07
    This fascination with Fourier can be seen here in a tribute on a license plate.
Description:

See "Chapter 4 Fourier Analysis" in "Signal Processing for Communications" by Paolo Prandoni and Martin Vetterli, available from http://www.sp4comm.org/ , and the slides for the entire module 4 of the Digital Signal Processing Coursera course, in https://spark-public.s3.amazonaws.com/dsp/slides/module4-0.pdf .

And if you are registered for this Coursera course, see https://class.coursera.org/dsp-001/wiki/view?page=week3
