
4.9b - FFT: history and algorithms

  • 0:01 - 0:06
    In this second half of module 4.9, we are going to look specifically
  • 0:06 - 0:12
    at one example of a Divide and Conquer algorithm, the Cooley-Tukey Fast Fourier Transform;
  • 0:12 - 0:16
    we will see a matrix view of this algorithm, as well as flow-graphs,
  • 0:16 - 0:21
    and then conclude by looking at more general cases of Fast Fourier Transforms.
  • 0:22 - 0:26
    So far, we have only had conceptual discussions
  • 0:26 - 0:29
    of what the Divide and Conquer algorithm would be.
  • 0:29 - 0:35
    We now need to be concrete and actually develop an algorithm for the Discrete Fourier Transform,
  • 0:35 - 0:37
    based on the idea of Divide and Conquer.
  • 0:37 - 0:42
    We'll do this by analyzing an algorithm called the decimation-in-time algorithm.
  • 0:42 - 0:46
    It'll take us a number of slides until we get to all the details of it.
  • 0:46 - 0:47
    But this is very important,
  • 0:47 - 0:54
    because you're going to understand the mechanics of one of the most famous algorithms of all time.
  • 0:54 - 1:00
    We start with the definition of the DFT: X[k] is the sum from n = 0 to N-1
  • 1:00 - 1:04
    of x[n] multiplied by the root of unity to the power nk.
  • 1:04 - 1:09
    k goes from 0 to N -1, W is the usual Nth root of unity.
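
Written out in the usual notation (just a restatement of the definition given above):

    X[k] = \sum_{n=0}^{N-1} x[n] \, W_N^{nk},   k = 0, ..., N-1,   with   W_N = e^{-j 2\pi/N}.
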
  • 1:09 - 1:15
    Now, as is very often the case, a good guess is half of the answer to a fast algorithm.
  • 1:15 - 1:21
    The good guess is the following: break the input into even and odd index terms,
  • 1:21 - 1:23
    that's why it's called decimation-in-time,
  • 1:23 - 1:29
    because it's like taking the sequence and throwing out half of the entries,
  • 1:29 - 1:31
    that's decimation by a factor of 2.
  • 1:31 - 1:39
    And so, x[n] is mapped into x[2n], the even-indexed samples, and x[2n+1], the odd-indexed samples,
  • 1:39 - 1:43
    and now n goes from 0 to N/2 - 1.
  • 1:43 - 1:48
    Recall that N is assumed to be a power of 2, so N/2 is again an integer.
  • 1:48 - 1:51
    Then, we need to do something similar for the output.
  • 1:51 - 1:55
    But there it's not even and odd, it's first half and second half.
  • 1:55 - 2:01
    So X[k], with k going from 0 to N minus 1, is split into the first half,
  • 2:01 - 2:09
    X[k], and the second half, X[k+N/2], where now k runs from 0 to N/2 - 1.
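
In symbols, the two splits just described are:

    inputs:   x[n]  -->  x[2n]  and  x[2n+1],      n = 0, ..., N/2 - 1,
    outputs:  X[k]  -->  X[k]   and  X[k + N/2],   k = 0, ..., N/2 - 1.
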
  • 2:11 - 2:14
    Let us now look at the equations for the DFT,
  • 2:14 - 2:17
    and use this separation of the inputs into even and odd inputs.
  • 2:17 - 2:24
    So X[k] now is the sum of 2 sums of length N/2,
  • 2:24 - 2:30
    over the even and odd ones, with respective powers of W.
  • 2:30 - 2:34
    And we separate these two sums, we simplify the exponent,
  • 2:34 - 2:37
    and then we need to use a little trick here.
  • 2:37 - 2:39
    The trick is the following:
  • 2:39 - 2:43
    If you take the Nth root of unity and raise it to the power 2,
  • 2:43 - 2:53
    then this is equal to e to the -j4π/N, which is equal to e to the -j2π divided by N/2.
  • 2:53 - 2:57
    And this of course is the root of unity of order N/2.
  • 2:57 - 3:02
    Using this, we can write W to the power 2nk
  • 3:02 - 3:08
    as the root of unity of order N/2, raised to the power nk.
  • 3:08 - 3:19
    We use this and we see now that we have 2 sums of length N/2 with respect to the N/2th root of unity
  • 3:19 - 3:25
    and we write this as the sum of X'[k] and X''[k],
  • 3:25 - 3:29
    where X''[k] is actually multiplied by W to the power k.
  • 3:29 - 3:32
    Here W is still the Nth root of unity.
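
Putting the last few steps into one displayed derivation (the same algebra, written out):

    X[k] = \sum_{n=0}^{N/2-1} x[2n] W_N^{2nk} + \sum_{n=0}^{N/2-1} x[2n+1] W_N^{(2n+1)k}
         = \sum_{n=0}^{N/2-1} x[2n] W_{N/2}^{nk} + W_N^{k} \sum_{n=0}^{N/2-1} x[2n+1] W_{N/2}^{nk}
         = X'[k] + W_N^{k} X''[k],

using the identity W_N^2 = e^{-j4\pi/N} = e^{-j2\pi/(N/2)} = W_{N/2}.
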
  • 3:34 - 3:38
    We can now say this explicitly in the following way:
  • 3:38 - 3:43
    we have 2 half-size DFTs, which we call X' and X''.
  • 3:43 - 3:47
    We have to multiply the second DFT by W to the power k.
  • 3:47 - 3:53
    That will require N/2 complex multiplications and we add the results together.
  • 3:53 - 3:57
    With this, we have found the first half of our result, namely X[k],
  • 3:57 - 4:02
    for k going from zero to N/2 -1.
  • 4:02 - 4:07
    Now we need to find out how to compute the second half of the DFT, and then we'll be done.
  • 4:08 - 4:12
    We have our first half of the DFT: let us consider the second half,
  • 4:12 - 4:16
    so that's X[k+N/2].
  • 4:16 - 4:22
    This is made up of two sums over the even and the odd inputs.
  • 4:22 - 4:29
    And the sum of the odd inputs is pre-multiplied by W to the power k + N/2.
  • 4:29 - 4:36
    We need again a little trick and this is to simplify W to the power N/2,
  • 4:36 - 4:46
    this is equal to e to the -j 2π(N/2)/N, which is e to the -jπ, which is equal to -1.
  • 4:47 - 4:49
    With this, we see that
  • 4:49 - 4:56
    the second sum is simply premultiplied by W to the k and a change of sign,
  • 4:56 - 5:06
    so we get as a net result that the second half of the DFT is actually equal to X'[k] minus W to the k times X''[k].
  • 5:06 - 5:09
    So it's the same two parts we had before.
  • 5:09 - 5:12
    We simply have a subtraction instead of a sum.
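
In the same notation, since W_N^{N/2} = e^{-j 2\pi (N/2)/N} = e^{-j\pi} = -1:

    X[k + N/2] = \sum_{n=0}^{N/2-1} x[2n] W_{N/2}^{nk} + W_N^{k+N/2} \sum_{n=0}^{N/2-1} x[2n+1] W_{N/2}^{nk}
               = X'[k] - W_N^{k} X''[k],    k = 0, ..., N/2 - 1.
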
  • 5:15 - 5:18
    In words: we have now computed both parts.
  • 5:18 - 5:21
    So we divide the input into even and odd samples,
  • 5:21 - 5:25
    compute 2 DFTs of size N/2.
  • 5:25 - 5:33
    We multiply the output of the second DFT by the root of unity of order N raised to the kth power.
  • 5:33 - 5:36
    That uses N/2 multiplications.
  • 5:36 - 5:43
    Then we recover the outputs by taking the sum and the difference of these 2 parts,
  • 5:43 - 5:48
    and that will give us X[k] and X[k+N/2].
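
As a sketch of the recursion just described, here it is in Python with NumPy (the function name dit_fft and the final check against np.fft.fft are illustrative choices, not part of the lecture):

    import numpy as np

    def dit_fft(x):
        """Radix-2 decimation-in-time FFT; len(x) must be a power of 2."""
        x = np.asarray(x, dtype=complex)
        N = len(x)
        if N == 1:                       # trivial DFT of size 1
            return x
        X_even = dit_fft(x[0::2])        # X'[k]:  DFT of the even-indexed samples
        X_odd = dit_fft(x[1::2])         # X''[k]: DFT of the odd-indexed samples
        W = np.exp(-2j * np.pi * np.arange(N // 2) / N)   # twiddle factors W_N^k
        return np.concatenate([X_even + W * X_odd,        # X[k],       k = 0 .. N/2-1
                               X_even - W * X_odd])       # X[k + N/2], k = 0 .. N/2-1

    x = np.random.randn(8) + 1j * np.random.randn(8)
    assert np.allclose(dit_fft(x), np.fft.fft(x))
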
  • 5:49 - 5:56
    It is easiest to see this in a diagram, and we do the diagram for a DFT of size 8.
  • 5:56 - 5:58
    So we have x[0] to x[7].
  • 5:58 - 6:04
    They are divided into the even part, the upper half, and the odd part, the lower half.
  • 6:04 - 6:07
    And we compute 2 DFTs of size N/2, in this case,
  • 6:07 - 6:10
    that would be DFTs of size 4.
  • 6:10 - 6:17
    Then we simply recombine the respective outputs by sum and differences,
  • 6:17 - 6:23
    and we get X[0], which is the sum of the first outputs of both DFTs.
  • 6:23 - 6:31
    The difference of these first outputs will give you the X[4] output, and so on.
  • 6:31 - 6:34
    And with this, we have first half, second half.
  • 6:34 - 6:40
    In this very simple diagram, we see very explicitly this Divide and Conquer algorithm at work.
  • 6:43 - 6:45
    So what is the complexity now?
  • 6:45 - 6:50
    So to split the DFT into two pieces of size N/2, that was free.
  • 6:50 - 6:54
    You simply had to select even and odd index inputs.
  • 6:54 - 6:56
    Then we compute two DFTs of size N/2.
  • 6:56 - 7:01
    Recall that the complexity is quadratic, so we have this N squared over 2 complexity here.
  • 7:01 - 7:07
    We merge the two results with multiplications: there are N/2 of them.
  • 7:07 - 7:12
    And the total complexity, at this stage, is N squared over 2 plus N/2,
  • 7:12 - 7:18
    which, indeed, is smaller than the original complexity, which is quadratic in N.
  • 7:18 - 7:21
    So, in general, we have about half the complexity of the initial problem.
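
As a one-line count of this first split:

    C(N) \approx 2 (N/2)^2 + N/2 = N^2/2 + N/2  <  N^2.
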
  • 7:21 - 7:26
    Now, this would not be a big deal: that's a factor of 2 of savings, nice to have.
  • 7:26 - 7:29
    But we are aiming for much more than this.
  • 7:31 - 7:33
    What about repeating this process?
  • 7:33 - 7:38
    We know that, since N is a power of 2, we can divide in 2, 4, 8, etcetera.
  • 7:39 - 7:43
    And we can go until we end up with trivial DFTs of size 2.
  • 7:43 - 7:46
    These simply amount to a sum and a difference.
  • 7:46 - 7:52
    This will require log2 N minus 1 steps, because we end up with DFTs of size 2.
  • 7:53 - 8:00
    Each merging step uses on the order of N/2 multiplications and on the order of N additions,
  • 8:00 - 8:08
    so the total will be (N/2)(log2 N - 1) multiplications and N log2 N additions.
  • 8:09 - 8:14
    So the cost is reduced to a fraction of the order of log2 N divided by N.
  • 8:14 - 8:19
    The key result, the "take home" message here, is that a DFT of size N
  • 8:19 - 8:23
    requires on the order of N log2 N operations.
  • 8:23 - 8:27
    Let us look at a few examples to see how big the savings is.
  • 8:27 - 8:32
    So when N is equal to 1024, that's 2 to the power of 10,
  • 8:32 - 8:35
    so the saving is of the order of a factor of 100.
  • 8:36 - 8:41
    If N is 2 to the power of 20, which is roughly a million, a little bit more,
  • 8:41 - 8:46
    then the saving is of the order of 50,000: so that's a big deal, indeed.
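
A quick sanity check of these two numbers (a rough operation-count sketch; the exact constants depend on how operations are counted):

    import math

    for N in (2**10, 2**20):
        direct = N**2                    # order of the direct DFT
        fast = N * math.log2(N)          # order of the FFT
        print(N, round(direct / fast))   # roughly 102 and 52429
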
  • 8:48 - 8:54
    We have seen a recursive formulation using sums on even and odd indexed inputs.
  • 8:54 - 8:57
    We have seen a block diagram view of the DFT.
  • 8:57 - 9:01
    Let's look now at a matrix factorization view.
  • 9:01 - 9:05
    This is important because many implementations, for example in Matlab or FreeMat
  • 9:05 - 9:08
    will actually use numerical linear algebra,
  • 9:08 - 9:12
    and then the matrix factorization is actually important.
  • 9:12 - 9:19
    So in this case we look at W4, the size-4 DFT. Recall that you have to separate even and odd samples;
  • 9:19 - 9:21
    that's the 1st matrix on the right.
  • 9:21 - 9:24
    Then we compute two DFT's of size 2.
  • 9:24 - 9:28
    These are sums and differences; it's a block diagonal matrix.
  • 9:28 - 9:32
    Then we have to multiply the odd outputs by W to the power k:
  • 9:33 - 9:39
    here, these are simply j's and minus j's, and the sums and differences, which are included in here,
  • 9:39 - 9:44
    and we get the overall result, with these three small matrices.
  • 9:44 - 9:48
    This uses eight additions, no multiplications.
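
A small numerical check of this three-matrix factorization (a sketch: the matrices below are reconstructed from the decimation-in-time structure described above, not copied from the slide):

    import numpy as np

    W4 = np.exp(-2j * np.pi / 4)                          # = -j
    F4 = np.array([[W4 ** (n * k) for n in range(4)] for k in range(4)])

    F2 = np.array([[1, 1], [1, -1]], dtype=complex)       # size-2 DFT: sum and difference
    D4 = np.zeros((4, 4))                                 # selector: reorder x to [x0, x2, x1, x3]
    D4[0, 0] = D4[1, 2] = D4[2, 1] = D4[3, 3] = 1
    Z2 = np.zeros((2, 2))
    B4 = np.block([[np.eye(2), np.diag([1, W4])],         # combine: X'[k] + W4^k X''[k]
                   [np.eye(2), -np.diag([1, W4])]])       #          X'[k] - W4^k X''[k]

    assert np.allclose(F4, B4 @ np.block([[F2, Z2], [Z2, F2]]) @ D4)
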
  • 9:50 - 9:52
    Let's now move to the next bigger size.
  • 9:52 - 9:54
    This is N=8.
  • 9:54 - 9:58
    Now this is going to be big because each matrix is of size 8 by 8,
  • 9:58 - 10:01
    and we have a factorization to take care of,
  • 10:01 - 10:06
    so the first matrix, W8 in its initial form, is one we have seen before.
  • 10:08 - 10:12
    The first step is to separate even from odd indexed samples.
  • 10:12 - 10:16
    Call this D8, for decimation of size 8.
  • 10:16 - 10:20
    You see the matrix here; you can sort of recognize its structure,
  • 10:20 - 10:26
    so the 1st half takes out the even inputs, the 2nd half takes out the odd inputs.
  • 10:26 - 10:31
    No arithmetic operations so far, just re-indexing of inputs.
  • 10:33 - 10:39
    Step 2 is to compute two DFTs of size N/2 on these even and odd indexed samples.
  • 10:39 - 10:46
    So, each submatrix is a DFT of size 4 and the result is a block diagonal matrix.
  • 10:46 - 10:52
    On the diagonal, you have the W4 matrices, off the diagonal, you have the 0 matrix.
  • 10:52 - 10:56
    This requires 2 DFTs of size 4, according to what we have just seen,
  • 10:56 - 10:58
    that's a total of 16 additions.
  • 11:00 - 11:06
    Step 3 is to multiply the output of the 2nd DFT by W to the k.
  • 11:06 - 11:12
    This will be a diagonal matrix with an identity matrix in the 1st half of the diagonal
  • 11:12 - 11:16
    and the 2nd half is made up of the successive powers of W.
  • 11:16 - 11:23
    This will require 2 multiplications, because W squared is actually -j, and multiplying by -j is free.
  • 11:23 - 11:26
    So we can summarize this as a block matrix
  • 11:26 - 11:33
    with I4 and Λ4 on the diagonal, and 04, the zero matrices, off the diagonal.
  • 11:33 - 11:39
    And Λ4 is the one that contains the complex multiplications by the roots of unity.
  • 11:41 - 11:44
    Step 4: we have to recombine the final output
  • 11:44 - 11:49
    to obtain X[k] and X[k+N/2] by a sum and difference.
  • 11:49 - 11:55
    So this matrix is also very structured. The first half computes the sum.
  • 11:55 - 12:00
    So we can put this in block matrix form by having two identity matrices of size 4.
  • 12:00 - 12:03
    And the second half computes the differences,
  • 12:03 - 12:09
    so that's the identity matrix of size 4 and its negative: that requires 8 additions.
  • 12:11 - 12:16
    So in total, we have the product of 4 matrices, D8, the selector matrix,
  • 12:16 - 12:20
    then the one that is diagonal with the DFTs of size 4,
  • 12:20 - 12:26
    then the one that is diagonal with the I4, Λ4, and finally the sum and differences.
  • 12:26 - 12:30
    So the total will be 24 additions and 2 multiplications.
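
The same kind of check for the size-8 factorization (again a reconstruction of D8, the block-diagonal DFTs, the twiddle matrix and the sum/difference matrix from the description above; the exact matrices are my assumptions, not read off the slide):

    import numpy as np

    W8 = np.exp(-2j * np.pi / 8)
    F8 = np.fft.fft(np.eye(8))                            # DFT matrices of size 8 and 4
    F4 = np.fft.fft(np.eye(4))

    D8 = np.zeros((8, 8))                                 # even-indexed inputs first, then odd
    for i in range(4):
        D8[i, 2 * i] = 1
        D8[i + 4, 2 * i + 1] = 1

    Z4, I4 = np.zeros((4, 4)), np.eye(4)
    DFTs = np.block([[F4, Z4], [Z4, F4]])                 # two DFTs of size 4
    Twid = np.block([[I4, Z4], [Z4, np.diag(W8 ** np.arange(4))]])   # I4 and Lambda4
    Sums = np.block([[I4, I4], [I4, -I4]])                # sums and differences

    assert np.allclose(F8, Sums @ Twid @ DFTs @ D8)
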
  • 12:30 - 12:35
    Now this is much smaller than what we would expect from an N square complexity.
  • 12:35 - 12:40
    N squared would in this case be 64, so on the order of 64 additions and 64 multiplications.
  • 12:40 - 12:43
    So you see that the big saving here is in multiplications.
  • 12:43 - 12:47
    That is the case here because the DFT is very small
  • 12:47 - 12:51
    and so it quickly comes down to trivial DFT's.
  • 12:53 - 12:57
    We can write out what we just did in an explicit flow-graph.
  • 12:57 - 13:04
    Such a flow-graph simply indicates how inputs -- x[0] to x[7] -- go through the flow-graph,
  • 13:04 - 13:10
    get multiplied by the roots of unity when needed, then take sum and differences etc.
  • 13:10 - 13:15
    And as you go from left to right, you end up with the output X[0] to X[7].
  • 13:15 - 13:22
    What you can recognize visually are the butterflies, which are simply the sum and differences
  • 13:22 - 13:27
    that appear again and again in the recursive computation of the DFT
  • 13:27 - 13:30
    using the Fast Fourier Transform algorithm.
  • 13:32 - 13:34
    Now, is this a big deal?
  • 13:34 - 13:39
    In image processing, so, for example, on your digital camera or on your mobile phone,
  • 13:39 - 13:43
    the image is taken in small blocks of 8 by 8.
  • 13:43 - 13:47
    One computes a transform that is called a Discrete Cosine Transform,
  • 13:47 - 13:49
    we shall see this later in this class
  • 13:49 - 13:52
    when we study the JPEG compression algorithm,
  • 13:52 - 13:56
    and the DCT is a little bit similar to a DFT, it's like a real DFT.
  • 13:56 - 14:01
    It has a fast algorithm that is very much inspired by what we just saw.
  • 14:03 - 14:09
    Let's compare direct multiplication: so 8 by 8 is 64 inputs;
  • 14:09 - 14:16
    squared, that's about 4000 multiplications; this is the dominant cost, because these are fixed-point multiplications.
  • 14:16 - 14:20
    The transform can be computed in rows and columns separately,
  • 14:20 - 14:23
    so you have 16 DFTs, essentially.
  • 14:23 - 14:27
    We have seen that such an algorithm would take 2 multiplications,
  • 14:27 - 14:30
    so we would have a total of 32 multiplications.
  • 14:30 - 14:34
    So we have a saving of about two orders of magnitude.
  • 14:34 - 14:36
    And that's a big deal.
  • 14:36 - 14:41
    So, instead of compressing your image in 100 seconds, it will take 1 second, for example.
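
The arithmetic behind this comparison, written out: 64^2 = 4096 ≈ 4000 multiplications for the direct computation, versus 16 × 2 = 32 multiplications with the fast algorithm, a ratio of 4096/32 = 128, i.e. roughly two orders of magnitude.
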
  • 14:43 - 14:47
    So far, we have only considered lengths N which were powers of 2.
  • 14:47 - 14:49
    But of course, other lengths are also important,
  • 14:49 - 14:54
    and the message here is that for every possible length N, there is a fast algorithm.
  • 14:54 - 14:57
    So the first case is when N is equal to a power of a prime p.
  • 14:57 - 15:02
    We looked at the case p equal to 2, which gave us the Cooley-Tukey algorithm.
  • 15:02 - 15:05
    Very famous, very simple and very important.
  • 15:05 - 15:08
    When N is a product of 2 coprime numbers,
  • 15:08 - 15:14
    then there is a way to reorder this into a two-dimensional DFT of size N1 by N2,
  • 15:14 - 15:18
    this is known as the Good-Thomas algorithm, it was invented in the 50s
  • 15:18 - 15:21
    and reinvented in the 60s.
  • 15:21 - 15:25
    When N is a prime, we can do a mapping of the Transform
  • 15:25 - 15:28
    that results in a convolution of size N-1.
  • 15:28 - 15:30
    This might not be good news,
  • 15:30 - 15:36
    except that we'll see that the convolution of size N-1 can also be solved with a DFT,
  • 15:36 - 15:40
    and so if N is, for example, 17, then N-1 is 16,
  • 15:40 - 15:44
    and this we can solve with a usual Cooley-Tukey fast algorithm.
  • 15:44 - 15:46
    By combining all the tricks involved,
  • 15:46 - 15:50
    minimizing the number of multiplications, reordering the terms,
  • 15:50 - 15:54
    we have the so-called Winograd Fast Fourier Transform,
  • 15:54 - 15:59
    which was invented by Winograd in the late 1970's,
  • 15:59 - 16:04
    and which really caps the work on fast algorithm development from a theoretical point of view.
  • 16:06 - 16:11
    Let us look at the case N=5 that we had seen before: the matrix is familiar.
  • 16:11 - 16:17
    We take out one piece of it: we remove the column of 1's and the row of 1's,
  • 16:17 - 16:24
    and we are left with a matrix that has, of course, all powers of W from 1 to 4,
  • 16:24 - 16:27
    but each power appears only once in each row and in each column
  • 16:27 - 16:34
    -- that's the important thing here -- and it can be shown that if you reorder rows and columns properly,
  • 16:34 - 16:40
    you end up with this matrix that we see on the right side which is actually a circulant matrix.
  • 16:40 - 16:43
    So you can see that the 2nd row is simply the 1st row,
  • 16:43 - 16:47
    circularly shifted, and so on, to the bottom.
  • 16:47 - 16:52
    And such a circulant matrix, we will see, can be diagonalized by a DFT of size 4,
  • 16:52 - 16:56
    so we are back to computing a DFT but now of a size 1 less.
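
A small numerical sketch of this reordering (assumptions for illustration: 2 is taken as a generator of the nonzero residues mod 5, rows are ordered by its powers and columns by its inverse powers, one standard way of exposing the circulant structure):

    import numpy as np

    N = 5
    W = np.exp(-2j * np.pi / N)
    g = 2                                                 # generator of {1, 2, 3, 4} mod 5
    ginv = pow(g, N - 2, N)                               # 3 = 2^-1 mod 5
    rows = [pow(g, p, N) for p in range(N - 1)]           # 1, 2, 4, 3
    cols = [pow(ginv, m, N) for m in range(N - 1)]        # 1, 3, 4, 2

    # the 4x4 core of the DFT matrix (rows/columns 1..4), reordered
    C = np.array([[W ** ((r * c) % N) for c in cols] for r in rows])

    # each row is the previous one circularly shifted: C is a circulant matrix
    for p in range(N - 1):
        assert np.allclose(C[p], np.roll(C[0], p))
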
  • 16:58 - 17:01
    Now consider the case when N is a product of 2 coprime numbers.
  • 17:01 - 17:03
    Here, we take N=6.
  • 17:03 - 17:08
    It can be written as a 2-dimensional DFT of size 3 by 2.
  • 17:08 - 17:15
    And such a DFT can be computed by having 3 DFTs of size 2, and 2 DFTs of size 3.
  • 17:15 - 17:17
    We write this here in a factorized form,
  • 17:17 - 17:24
    so we have permutation matrices R, Q and P and we have block diagonal matrices.
  • 17:24 - 17:29
    The first one on the left is block diagonal with three DFTs of size 2.
  • 17:29 - 17:34
    The block diagonal matrix on the right has two DFTs of size three.
  • 17:34 - 17:39
    Except for permutations, there was no cost in transforming a DFT of size 6
  • 17:39 - 17:45
    into 3 DFTs of size 2, and 2 DFTs of size 3, so this is very efficient.
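
A numerical sketch of this re-indexing (the particular index maps below, n = (2*n1 + 3*n2) mod 6 and k = (4*k1 + 3*k2) mod 6, are one standard choice assumed for illustration; they are not read off the slide):

    import numpy as np

    N1, N2, N = 3, 2, 6
    x = np.random.randn(N) + 1j * np.random.randn(N)
    X = np.fft.fft(x)

    F3 = np.fft.fft(np.eye(N1))                           # DFT matrices of size 3 and 2
    F2 = np.fft.fft(np.eye(N2))

    # load the input into a 3-by-2 array through the input index map ...
    A = np.empty((N1, N2), dtype=complex)
    for n1 in range(N1):
        for n2 in range(N2):
            A[n1, n2] = x[(2 * n1 + 3 * n2) % N]

    # ... take size-3 DFTs along the columns and size-2 DFTs along the rows ...
    B = F3 @ A @ F2.T

    # ... and read the result out through the output index map: no twiddle factors needed.
    for k1 in range(N1):
        for k2 in range(N2):
            assert np.allclose(B[k1, k2], X[(4 * k1 + 3 * k2) % N])
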
  • 17:47 - 17:49
    The conclusion is: don't worry, be happy;
  • 17:49 - 17:54
    any length of DFT can be mapped into a Fast Fourier Transform algorithm.
  • 17:54 - 17:58
    Therefore, this is an important message, because you don't have to take your signal
  • 17:58 - 18:03
    and zero-pad it so as to get a length that is a power of 2, for example.
  • 18:03 - 18:08
    This is often done, but it's a mistake: if you really want to compute a DFT of size 17,
  • 18:08 - 18:12
    then there is an algorithm that will compute it and it will compute it fast.
  • 18:12 - 18:16
    I also want to say that there are extremely good packages out there,
  • 18:16 - 18:22
    for example, something called the Fastest Fourier Transform in the West (FFTW), or the SPIRAL program.
  • 18:22 - 18:26
    And these packages allow you to not worry,
  • 18:26 - 18:30
    and simply find the best possible Fourier Transform for any length.
  • 18:30 - 18:34
    And as we had seen before, this makes a big difference.
  • 18:34 - 18:37
    We are talking about orders of magnitude in speed-up.
  • 18:39 - 18:42
    Let me conclude this module on the Fourier Transform.
  • 18:42 - 18:44
    The Fourier transform is a magical tool.
  • 18:44 - 18:49
    It's a beautiful mathematical tool, it is also a very practical tool.
  • 18:49 - 18:52
    We have seen the DTFT, which is good for math.
  • 18:52 - 18:55
    We have seen the DFT, which is good for computation.
  • 18:55 - 18:57
    And we have seen the Fast Fourier Transform,
  • 18:57 - 19:01
    which allows us to compute Fourier transforms very rapidly.
  • 19:01 - 19:07
    This fascination with Fourier can be seen here in a tribute on a license plate.
Description:

See "Chapter 4 Fourier Analysis" in "Signal Processing for Communications" by Paolo Prandoni and Martin Vetterli, available from http://www.sp4comm.org/ , and the slides for the entire module 4 of the Digital Signal Processing Coursera course, in https://spark-public.s3.amazonaws.com/dsp/slides/module4-0.pdf .

And if you are registered for this Coursera course, see https://class.coursera.org/dsp-001/wiki/view?page=week3
