WEBVTT 00:00:00.160 --> 00:00:04.530 Hello, and welcome to Chapter Eight: Python Lists. 00:00:04.530 --> 00:00:08.400 So now we're sort of going to start taking care of business. 00:00:08.400 --> 00:00:10.530 We are going to make lists and 00:00:10.530 --> 00:00:13.280 dictionaries and tuples and really start manipulating this data, 00:00:13.280 --> 00:00:16.290 and doing real data analysis, starting the, 00:00:16.290 --> 00:00:18.260 laying the proper work for real data analysis. 00:00:18.260 --> 00:00:21.950 As always, these lectures, audio, video, slides, 00:00:21.950 --> 00:00:25.740 and even book are copyright Creative Commons Attribution. 00:00:25.740 --> 00:00:31.030 So, lists, dictionaries, and tuples, the next real three big topics we're going to 00:00:31.030 --> 00:00:36.270 talk about, are collections. And we've been doing lists already, right? 00:00:37.340 --> 00:00:41.060 We've been doing lists when we were doing for loops. 00:00:41.060 --> 00:00:44.000 A list in Python is something that has square brackets. 00:00:44.000 --> 00:00:45.420 This is a constant list. 00:00:46.550 --> 00:00:48.410 Now, when I first talked to you 00:00:48.410 --> 00:00:50.530 about variables, I sort of oversimplified things. 00:00:50.530 --> 00:00:50.900 I said 00:00:50.900 --> 00:00:54.160 if you put like x equals two, and then put 00:00:54.160 --> 00:00:57.540 x equals four, the four overwrites the two. 00:00:57.540 --> 00:01:01.890 A collection is where you can put a bunch of things in the same variable. 00:01:01.890 --> 00:01:04.129 Now, I have to have a way to find those things. 00:01:05.570 --> 00:01:08.820 But it allows us to put multiple things in 00:01:08.820 --> 00:01:11.810 more, more things, more than one thing in the variable. 00:01:11.810 --> 00:01:15.330 So, here we have friends, that has three strings, Joseph, Glenn, and Sally. 00:01:15.330 --> 00:01:15.970 And we have carryon 00:01:15.970 --> 00:01:20.000 that has socks, shirt, and perfume. 
So that's the basic idea. 00:01:20.000 --> 00:01:21.680 So what's not a collection? 00:01:21.680 --> 00:01:23.440 Well, simple variables. 00:01:23.440 --> 00:01:26.610 Simple variables are not collections, just like this example. 00:01:26.610 --> 00:01:30.190 I say x equals 2, x equals 4, and print x, 00:01:30.190 --> 00:01:33.430 and the 4's in there and the 2 is somehow gone. 00:01:33.430 --> 00:01:35.570 It was there for a moment, and then it's gone. 00:01:36.740 --> 00:01:38.470 And so that's a normal variable. 00:01:38.470 --> 00:01:41.490 They're not collections. You can't put more than one thing in it. 00:01:41.490 --> 00:01:44.220 But when you put more than one thing in it, then you 00:01:44.220 --> 00:01:46.530 have to have a way to find the things that are in there. 00:01:46.530 --> 00:01:47.320 We'll, we'll get to that. 00:01:49.260 --> 00:01:51.880 So, we've been using list constants for the last couple 00:01:51.880 --> 00:01:55.120 of chapters just because we have to use list constants. 00:01:55.120 --> 00:01:59.040 You know, so we used, in the for loop chapter, we did lists of numbers. 00:02:00.520 --> 00:02:05.000 We have done lists of strings, that's strings, red, yellow, and blue. 00:02:06.460 --> 00:02:11.230 And you don't have to necessarily, you don't necessarily 00:02:11.230 --> 00:02:13.540 have to have things all of the same type. 00:02:13.540 --> 00:02:17.680 This is a three-item list, that has a string red, 00:02:17.680 --> 00:02:22.800 the number integer 24, and 98.6, which is a floating point number. 00:02:22.800 --> 00:02:25.810 And here's an interesting thing, just as a side note. 00:02:25.810 --> 00:02:28.040 This shows that floating point numbers are 00:02:28.040 --> 00:02:32.040 not always perfectly represented inside of the computer. 00:02:32.040 --> 00:02:34.590 It's sort of an artifact of how they work. 00:02:34.590 --> 00:02:36.880 And this is an example of 98.6 is really 98 point 00:02:36.880 --> 00:02:38.980 na, na, na, na, na. 
00:02:38.980 --> 00:02:41.260 So, but, don't, when you see something like that, don't freak out. 00:02:41.260 --> 00:02:43.710 Floating point numbers are the ones that show this behavior. 00:02:44.760 --> 00:02:48.340 So, interestingly, although we won't put a lot of energy into 00:02:48.340 --> 00:02:52.930 this, you can also have an element of a list be a list itself. 00:02:52.930 --> 00:02:55.630 So this is an outer list that's got three elements. 00:02:55.630 --> 00:02:57.710 1, 7, and then 00:02:57.710 --> 00:02:59.860 a list that's 5 and 6. 00:02:59.860 --> 00:03:04.470 So, if you look at the length of this, there are three things in it. 00:03:04.470 --> 00:03:05.850 Not four, three. 00:03:05.850 --> 00:03:08.520 Because the outer list has 1, 2, 3 things in it. 00:03:08.520 --> 00:03:12.480 And an empty list is bracket, bracket. 00:03:12.480 --> 00:03:13.340 Okay? 00:03:13.340 --> 00:03:17.180 Like I said, we have been going through lists all along. 00:03:17.180 --> 00:03:19.660 We have iteration variables for i in. 00:03:19.660 --> 00:03:22.205 This is a list. We've been using it all along. 00:03:22.205 --> 00:03:27.270 Similarly, we've been using lists in definite loops, which are a 00:03:27.270 --> 00:03:30.340 great way to go through lists. For friend in friends, there we have it, 00:03:30.340 --> 00:03:34.402 goes through three times, out come three lines, with the 00:03:34.402 --> 00:03:38.520 variable friend advancing through the three successive items in the list. 00:03:38.520 --> 00:03:40.380 And away we go. 00:03:40.380 --> 00:03:44.116 So, again, lists are not completely foreign to us. 00:03:44.116 --> 00:03:45.541 Now, 00:03:45.541 --> 00:03:52.520 just like in a string, we can use the index operator, 00:03:52.520 --> 00:03:56.990 the square bracket operator, and we can look up items in the list. 00:03:56.990 --> 00:03:59.300 Sub one, friends, sub one. 
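The list constants described above can be sketched in a few lines. This is a minimal illustration written for Python 3; the variable names `mixed`, `nested`, and `empty` are assumptions chosen for the example and are not taken from the slides.

```python
# A list can mix types, and an element can itself be a list.
mixed = ['red', 24, 98.6]
nested = [1, 7, [5, 6]]      # the inner list counts as ONE element
empty = []                   # bracket, bracket

print(len(mixed))            # 3
print(len(nested))           # 3, not 4
print(len(empty))            # 0

# A definite loop visits the items in order.
friends = ['Joseph', 'Glenn', 'Sally']
for friend in friends:
    print('Happy New Year:', friend)
```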
00:04:00.330 --> 00:04:03.780 Not surprisingly, using the European elevator rule, 00:04:06.090 --> 00:04:09.130 the first item in a list is sub zero, the second 00:04:09.130 --> 00:04:11.570 item is sub one and the third one is sub two. 00:04:11.570 --> 00:04:15.150 So here when I print friends sub one I get Glenn. 00:04:15.150 --> 00:04:18.420 Which is the second element. Just like strings. 00:04:18.420 --> 00:04:20.630 So once you kind of know it for strings, lists 00:04:20.630 --> 00:04:22.590 and the rest of these things make a lot more sense. 00:04:22.590 --> 00:04:26.060 Just, remember that we're in Europe, and things start with zero. 00:04:27.760 --> 00:04:31.813 Some things in these data items that we work with are not mutable. 00:04:31.813 --> 00:04:34.423 So for example, strings, when we ask for a lower case 00:04:34.423 --> 00:04:37.247 version of a string, we're given a copy of that string. 00:04:37.247 --> 00:04:41.547 And that's because strings are not mutable, and we can see this 00:04:41.547 --> 00:04:46.550 by doing something like saying fruit sub 0 equals lowercase b. 00:04:46.550 --> 00:04:49.620 Now you'd think that that would just change this 00:04:49.620 --> 00:04:53.652 to be a lower case b, but it doesn't, okay? 00:04:53.652 --> 00:04:57.340 It says string object does not support item assignment 00:04:57.340 --> 00:05:00.420 which means that you're not allowed to reassign. 00:05:00.420 --> 00:05:03.200 You can make a new string and put different things in 00:05:03.200 --> 00:05:06.820 that new string, but once the strings are made, they're not changeable. 00:05:06.820 --> 00:05:12.220 And that's why when we call fruit.lower, we get a copy of it in lower case. 00:05:12.220 --> 00:05:14.860 And so x is a copy of the original string, but 00:05:14.860 --> 00:05:18.150 the original string, once we assign it into fruit, is unchanged. 00:05:18.150 --> 00:05:19.080 It can't be changed. 
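Here is a small sketch of the indexing and immutability points just made. The starting value `'Banana'` for `fruit` is an assumption (the lecture only says the first letter would become a lowercase b), and `second` and `message` are illustrative names.

```python
friends = ['Joseph', 'Glenn', 'Sally']
second = friends[1]          # positions start at 0, so [1] is the second item
print(second)                # Glenn

fruit = 'Banana'             # assumed starting value
try:
    fruit[0] = 'b'           # strings do not support item assignment
except TypeError as err:
    message = str(err)
    print('TypeError:', message)

x = fruit.lower()            # lower() returns a new string
print(x)                     # banana
print(fruit)                 # Banana (the original is unchanged)
```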
00:05:20.340 --> 00:05:22.380 Lists, on the other hand, can be changed, and we 00:05:22.380 --> 00:05:23.470 can change them in the middle. 00:05:23.470 --> 00:05:26.230 This is one of the things we like about them. 00:05:26.230 --> 00:05:29.320 So here we have a list: 2, 14, 26, 41, and 63. 00:05:29.320 --> 00:05:31.130 Then we say lotto sub two. 00:05:31.130 --> 00:05:33.670 Of course, that's going to be the third item. 00:05:33.670 --> 00:05:35.690 Lotto sub two is equal to 28. 00:05:35.690 --> 00:05:38.380 Then we print it and we see the new number there. 00:05:38.380 --> 00:05:41.190 So all this is saying is that we can change them, right? 00:05:41.190 --> 00:05:44.640 Strings no, and lists yes. 00:05:44.640 --> 00:05:47.540 You can change lists, but you can't change strings. 00:05:49.230 --> 00:05:52.480 So the len function, we've used it for several 00:05:52.480 --> 00:05:55.540 things, we can say you know, use, len is 00:05:55.540 --> 00:05:58.270 used for, for strings and it's used for lists as well. 00:05:58.270 --> 00:06:01.000 So the same function knows when its 00:06:01.040 --> 00:06:03.070 parameter is a string. And when its parameter is a string, 00:06:03.070 --> 00:06:05.030 it gives us the number of characters in the string. 00:06:05.030 --> 00:06:07.390 And when it is a list, it gives us 00:06:07.390 --> 00:06:10.640 the number of elements in the list. 00:06:10.640 --> 00:06:14.310 And just because one of them is a string, it's still one element from the point 00:06:14.310 --> 00:06:15.950 of view of this list. 00:06:15.950 --> 00:06:20.925 So it has one, two, three, four - four items in the list, okay? 00:06:24.870 --> 00:06:27.580 So, the range function is a special function. 00:06:27.580 --> 00:06:30.140 It's probably about time to talk about the range function. 00:06:31.350 --> 00:06:34.350 The range function is a function that generates a list, that 00:06:34.350 --> 00:06:37.210 produces a list and gives it back to us. 
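The list mutation and len examples above can be sketched as follows. The `lotto` values come from the lecture; the string `'Hello Bob'` and the four-item list `x` are assumed values that only illustrate the point about counting.

```python
lotto = [2, 14, 26, 41, 63]
lotto[2] = 28                # replace the third item in place
print(lotto)                 # [2, 14, 28, 41, 63]

greet = 'Hello Bob'          # assumed example string
print(len(greet))            # 9 characters

x = [1, 2, 'joe', 99]        # assumed example list
print(len(x))                # 4 elements; the string counts as one
```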
00:06:37.210 --> 00:06:38.870 And so you give the range function a 00:06:38.870 --> 00:06:42.170 parameter, how many items you want, and the range 00:06:42.170 --> 00:06:46.150 function creates and gives us back a list that 00:06:46.150 --> 00:06:49.960 is four numbers starting at zero, which is zero 00:06:49.960 --> 00:06:53.970 up to, but not including three. Sound familiar? 00:06:53.970 --> 00:06:54.390 Yeah. 00:06:54.390 --> 00:06:58.460 Zero up to but not, I mean zero up to, but not including four. 00:06:58.460 --> 00:07:04.630 And, and so the same thing is true here. So, we can combine the len and the range 00:07:04.630 --> 00:07:10.071 to say, you know, to say okay, well len friends, that's three 00:07:10.071 --> 00:07:15.400 items, and range len friends is 0, 1, 2. And it also 00:07:15.400 --> 00:07:22.620 corresponds exactly to these items. So we can actually use this 00:07:22.620 --> 00:07:30.940 to construct loops to go through a list. We already have a basic for loop, right? 00:07:30.940 --> 00:07:34.290 We basically have a for loop that is our, 00:07:34.290 --> 00:07:38.670 that, that said that for each friend in friends. 00:07:38.670 --> 00:07:41.220 And out comes, Happy New Year, Glenn and Joseph. 00:07:41.220 --> 00:07:45.070 If we also want to know where, what position we're at as 00:07:45.070 --> 00:07:50.040 the loop progresses, we can rewrite the exact same loop a different way. 00:07:50.040 --> 00:07:52.950 And make i be our iteration variable. 00:07:52.950 --> 00:07:59.250 And say i in range(len(friends)), that turns this into zero, one, two. 00:07:59.250 --> 00:08:01.530 And then i goes zero, one, two. 00:08:01.530 --> 00:08:03.280 And then, we can in the loop, look up the 00:08:03.280 --> 00:08:06.540 particular friend that is the particular one we are interested in, 00:08:06.540 --> 00:08:10.670 using the index operator, friend sub i. 00:08:10.670 --> 00:08:12.280 And then print Happy New Year. 
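A sketch of the two loop styles just described, written for Python 3. One difference worth knowing: in Python 3, `range()` returns a lazy range object rather than a list, so `list()` is used here only to display its contents. The loops themselves work the same way.

```python
friends = ['Joseph', 'Glenn', 'Sally']

print(list(range(4)))                 # [0, 1, 2, 3]
print(list(range(len(friends))))      # [0, 1, 2]

# Loop over the items directly.
for friend in friends:
    print('Happy New Year:', friend)

# Loop over the positions and look each item up with the index operator.
for i in range(len(friends)):
    friend = friends[i]
    print('Happy New Year:', friend)
```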
00:08:12.280 --> 00:08:13.660 So these two loops, 00:08:15.830 --> 00:08:20.335 these two loops are equivalent. These, oop, not that one. 00:08:20.335 --> 00:08:25.460 [SOUND] This loop and this loop. This loop is 00:08:25.460 --> 00:08:30.720 preferred, unless you happen to need this value i, which tells you where you're at. 00:08:30.720 --> 00:08:32.490 In case maybe you're going to change something, you're 00:08:32.490 --> 00:08:34.760 going to look through something and then change it. 00:08:34.760 --> 00:08:39.070 So, but, but, for what I've written here, they're exactly equivalent. 00:08:39.070 --> 00:08:41.070 Prefer the simpler one, unless you need 00:08:41.070 --> 00:08:44.370 the more complex one. They both produce the same kind of output. 00:08:46.170 --> 00:08:50.090 We can concatenate lists, much like we concatenate strings, with plus. 00:08:53.300 --> 00:08:59.560 And you can think of the Python operator's looking to its right and to its left and 00:08:59.560 --> 00:09:02.270 saying oh, those are both lists, I know what 00:09:02.270 --> 00:09:04.560 to do with lists, I'm going to put those together. 00:09:04.560 --> 00:09:08.200 And so that produces a two, three-long lists become a six-long 00:09:08.200 --> 00:09:12.100 list with the first one followed by the second one concatenated. 00:09:12.100 --> 00:09:15.710 It didn't hurt the original, a. c is a new list, basically. 00:09:19.040 --> 00:09:22.530 We can also slice lists. Feels a lot like strings, right? 00:09:22.530 --> 00:09:24.030 Everything's kind of like strings. 00:09:24.030 --> 00:09:28.330 For loops like strings, concatenation like strings, and now slicing like strings. 00:09:28.330 --> 00:09:30.020 And it is exactly the same. 00:09:32.300 --> 00:09:37.810 So one up to, but not including. Just remember, up to, but not including. 
00:09:37.810 --> 00:09:41.830 the second parameter, is up to but not including, so that starts at the sub one, 00:09:41.830 --> 00:09:47.950 which is the second one up to but not including 3, the third one, so. 00:09:47.950 --> 00:09:50.910 This is 1, 2, and 3 so that's 41 comma 12. 00:09:50.910 --> 00:09:55.320 Starting at the first one, up to but not including the third one. 00:09:58.650 --> 00:10:01.570 We can similarly eliminate the first one, 00:10:01.570 --> 00:10:04.410 so that's up to but not including the fourth one. 00:10:04.410 --> 00:10:08.590 Starting at zero, one, two, three, but not including four. 00:10:08.590 --> 00:10:13.651 So that's this one. If we go three to the end, and again, 00:10:13.651 --> 00:10:21.020 remember that they're starting at 0, so 3 to the end is 0, 1, 2, 3 to the end. 00:10:21.020 --> 00:10:23.540 The number 3 doesn't matter. So that's 3, 74, 15. 00:10:23.540 --> 00:10:24.290 And the 00:10:25.710 --> 00:10:29.300 whole thing, that's the whole thing, so these two things are the same. 00:10:29.300 --> 00:10:33.100 So slicing works like strings, starting and up 00:10:33.100 --> 00:10:34.760 to but not including is the second parameter. 00:10:36.400 --> 00:10:38.570 There are some methods, and you can 00:10:38.570 --> 00:10:43.020 read about these online in the Python documentation. 00:10:43.020 --> 00:10:44.820 We can use the dir built-in function. 00:10:44.820 --> 00:10:48.140 It doesn't have a lot of use in sort of how 00:10:48.140 --> 00:10:50.590 we run, when we're running programs but it's kind of useful. 00:10:50.590 --> 00:10:51.890 I like it when I'm typing 00:10:51.890 --> 00:10:54.440 interactively. Like, what can this thing do? 00:10:54.440 --> 00:10:58.120 So I make a list, list is a unique type, and 00:10:58.120 --> 00:11:00.340 I say, with dir I say what can we do with it? 00:11:00.340 --> 00:11:04.170 Well, we can append, we can count, extend, index, insert, pop, remove, reverse 00:11:04.170 --> 00:11:08.300 and sort. 
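Concatenation, slicing, and dir can be tried together in a short sketch. The values of `a`, `b`, and `t` are assumptions picked to match what is read aloud (two three-item lists, and a six-item list whose tail is 3, 74, 15); `methods` is an illustrative name.

```python
# Concatenation builds a new list and leaves the originals alone.
a = [1, 2, 3]
b = [4, 5, 6]
c = a + b
print(c)            # [1, 2, 3, 4, 5, 6]
print(a)            # [1, 2, 3]

# Slicing: the second number is "up to but not including".
t = [9, 41, 12, 3, 74, 15]
print(t[1:3])       # [41, 12]
print(t[:4])        # [9, 41, 12, 3]
print(t[3:])        # [3, 74, 15]
print(t[:])         # the whole list

# dir() lists what a list can do; names starting with '_' are filtered out.
methods = [name for name in dir(list) if not name.startswith('_')]
print(methods)
```

Newer Python versions report a few more method names than the ones read out in the lecture, so the exact dir output depends on the interpreter.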
And then you can sort of read up on all these things. 00:11:08.300 --> 00:11:13.889 I'll show you just a couple. We can build a list with the append. 00:11:14.900 --> 00:11:16.100 So this syntax here, 00:11:16.100 --> 00:11:19.270 stuff equals list, that's called a constructor 00:11:19.270 --> 00:11:21.060 which says give me an empty list. 00:11:22.440 --> 00:11:26.280 You could also say bracket, bracket for an empty list. 00:11:26.280 --> 00:11:30.060 Whatever, you gotta make an empty list and then you call the append. 00:11:30.060 --> 00:11:33.210 Remember that lists are mutable, so it's okay to change it. 00:11:33.210 --> 00:11:35.530 So we're saying, okay, we started with an empty list. 00:11:35.530 --> 00:11:38.210 Now append to the end of that, the word book. 00:11:38.210 --> 00:11:39.910 And then append to that, 99. 00:11:39.910 --> 00:11:44.040 Wait a sec. 00:11:44.040 --> 00:11:44.860 That's a mistake. 00:11:49.110 --> 00:11:52.350 That's a mistake. So I have to fix this mistake. 00:11:52.350 --> 00:11:55.440 So watch me fix the mistake. Poof. 00:11:57.830 --> 00:12:00.680 Now my thing is magically fixed. Isn't that amazing. 00:12:00.680 --> 00:12:03.960 I have magic powers when it comes to slide fixing. 00:12:03.960 --> 00:12:07.370 I just snap my fingers and the slides are fixed. 00:12:07.370 --> 00:12:07.900 So here we go. 00:12:07.900 --> 00:12:10.220 We append the 99, and we print it out. 00:12:10.220 --> 00:12:13.920 And it's got book and 99, emphasizing the fact that they don't 00:12:13.920 --> 00:12:16.780 have to be the exact same kind of thing in a list. 00:12:16.780 --> 00:12:20.450 Then later we append cookie and then it's book, 99, cookie. 00:12:20.450 --> 00:12:22.910 Okay? 
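The append sequence walked through above looks roughly like this; the values are the ones named in the lecture.

```python
stuff = list()               # the constructor; stuff = [] does the same thing
stuff.append('book')
stuff.append(99)
print(stuff)                 # ['book', 99]

stuff.append('cookie')
print(stuff)                 # ['book', 99, 'cookie']
```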
So this append, we won't do it in line 00:12:22.910 --> 00:12:25.730 like this so often, we'll tend to do it in a loop as we're building up a 00:12:25.730 --> 00:12:27.370 list, but that's the way you start with 00:12:27.370 --> 00:12:30.630 an empty list and then [SOUND] programmatically grow it. 00:12:33.350 --> 00:12:38.410 We can ask, much like we do in a string, we can ask if an item is in a list. 00:12:38.410 --> 00:12:41.280 So here is a list called some, with these numbers in it. 00:12:41.280 --> 00:12:42.910 It's got five numbers in it. 00:12:42.910 --> 00:12:45.980 Is nine in some? True, yes it is. 00:12:45.980 --> 00:12:48.780 Is 15 in some? False. 00:12:48.780 --> 00:12:55.300 Is 20 not in, that's a leg, a legal syntax, that is legal syntax. 00:12:55.300 --> 00:12:58.280 Is 20 not in some, yes it's not there, okay? 00:12:58.280 --> 00:13:02.910 They don't modify the list, don't modify the list, they're just asking questions. 00:13:02.910 --> 00:13:06.260 These are logical operations often used in if statements or 00:13:06.260 --> 00:13:10.330 while, some kind of a logic that you might be building. 00:13:12.050 --> 00:13:14.990 Okay, so lists have order. 00:13:14.990 --> 00:13:17.130 So when we were appending them, the first thing went 00:13:17.130 --> 00:13:20.730 in first, the second thing went in second, et cetera, et cetera. 00:13:20.730 --> 00:13:23.380 And we can also tell the list to sort itself. 00:13:23.380 --> 00:13:25.650 So one of the things that we can do with a list, 00:13:25.650 --> 00:13:28.780 now we're starting to see some power here, is say, sort yourself. 00:13:28.780 --> 00:13:30.186 This is a list of strings. 00:13:30.186 --> 00:13:33.105 It can sort numbers, it can sort lots of things. 00:13:33.105 --> 00:13:38.550 friends.sort, that says hey there, dear friends, sort yourself. 00:13:38.550 --> 00:13:40.080 This makes a change. 
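The membership questions asked above can be sketched like this. The five numbers in `some` are an assumption; the lecture only says that the list has five numbers, that 9 is among them, and that 15 and 20 are not.

```python
some = [1, 9, 21, 10, 16]    # assumed values

print(9 in some)             # True
print(15 in some)            # False
print(20 not in some)        # True

# These operators only ask questions; the list is unchanged afterwards.
if 9 in some:
    print('found 9')
```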
00:13:42.540 --> 00:13:44.670 It alters the list, and puts it, in 00:13:44.670 --> 00:13:48.010 this case, in alphabetical order, Glenn, Joseph, and Sally. 00:13:48.010 --> 00:13:51.780 It is mutated, it was, it's, it's been modified, and so 00:13:51.780 --> 00:13:54.660 friends sub one is now Joseph because that's the second one. 00:13:54.660 --> 00:13:55.850 Okay? 00:13:55.850 --> 00:14:00.000 So the sort method says sort yourself now, 00:14:00.000 --> 00:14:03.680 sort yourself, and it sorts and then it stays sorted. 00:14:06.720 --> 00:14:10.590 So [COUGH] 00:14:10.590 --> 00:14:13.260 you're going to be kind of ticked about this particular slide. 00:14:13.260 --> 00:14:16.790 Because there's a whole bunch of built-in functions that help with lists. 00:14:16.790 --> 00:14:22.260 And, there's max, there's min, there's len, various things. 00:14:22.260 --> 00:14:24.520 And so we could, all those loops that I told you how to 00:14:24.520 --> 00:14:29.646 do, I was just showing you that stuff because I thought it was important. 00:14:29.646 --> 00:14:31.854 This is the simplest way to go through and 00:14:31.854 --> 00:14:35.230 find the largest, smallest, and sum, et cetera. 00:14:35.230 --> 00:14:36.860 So here's a list of numbers. 00:14:38.150 --> 00:14:39.560 We can say how many are there. 00:14:39.560 --> 00:14:43.060 That's the count. We can say what's the largest, it's 74. 00:14:43.060 --> 00:14:45.960 What's the smallest, that'd be 3. 00:14:45.960 --> 00:14:49.080 What is the sum of the running total of them all? 154. 00:14:49.080 --> 00:14:52.310 If you remember from a few lectures ago, these are the same numbers. 00:14:52.310 --> 00:14:56.880 And what is the average, which is, sum of them over the length of them, 00:14:56.880 --> 00:14:58.120 Okay? 00:14:58.120 --> 00:15:00.960 So this makes a lot more sense and if you had a list of numbers 00:15:00.960 --> 00:15:04.506 like this, you would simply say what's the max, you wouldn't write a max loop. 
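A sketch of sorting in place and of the built-in functions mentioned above. The values in `nums` are an assumption chosen to agree with the results read aloud (largest 74, smallest 3, sum 154).

```python
friends = ['Joseph', 'Glenn', 'Sally']
friends.sort()               # sorts the list itself and returns None
print(friends)               # ['Glenn', 'Joseph', 'Sally']
print(friends[1])            # Joseph

nums = [3, 41, 12, 9, 74, 15]    # assumed values
print(len(nums))             # 6
print(max(nums))             # 74
print(min(nums))             # 3
print(sum(nums))             # 154

average = sum(nums) / len(nums)  # true division in Python 3 gives a float
print(average)
```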
00:15:04.506 --> 00:15:06.945 I just did that to kind of demonstrate how loops work. 00:15:06.945 --> 00:15:09.590 [COUGH] Demonstrate how loops work. 00:15:09.590 --> 00:15:12.360 So here is a way that you can sort 00:15:12.360 --> 00:15:16.580 of change those kind of programs that we wrote. 00:15:16.580 --> 00:15:19.780 So there's two ways to write a summing program. 00:15:19.780 --> 00:15:22.100 Let's just say instead of the data being 00:15:22.100 --> 00:15:26.370 in a list, we're going to write a while loop that's going to read a 00:15:26.370 --> 00:15:31.250 set of numbers until we say done, and then compute the average of those numbers. 00:15:31.250 --> 00:15:32.728 Okay, so let's say this is our problem. 00:15:32.728 --> 00:15:38.220 Read a list of numbers, wait till the word done comes in, and then average them. 00:15:38.220 --> 00:15:40.450 So here's a little program that does that. 00:15:40.450 --> 00:15:43.250 We create total equals zero, count equals zero. 00:15:43.250 --> 00:15:46.120 Make a infinite loop with while True. 00:15:46.120 --> 00:15:47.520 And then we ask 00:15:47.520 --> 00:15:48.810 to enter a number. 00:15:48.810 --> 00:15:51.750 We get a string back from this, remember raw_input always 00:15:51.750 --> 00:15:56.790 gives us strings back, and then if it's done, we're going to break. 00:15:56.790 --> 00:15:59.770 This is the version of the if that does not require an indent. 00:15:59.770 --> 00:16:01.570 We just put the break up there. 00:16:01.570 --> 00:16:04.080 And so that gets us out of the loop when the time is right. 00:16:04.080 --> 00:16:06.020 So when the time is right over here. 00:16:06.020 --> 00:16:09.810 And then, we convert the value to float. 00:16:09.810 --> 00:16:12.830 We use a float to convert the input to a floating point number. 00:16:12.830 --> 00:16:15.130 And then we do our accumulation pattern, 00:16:15.130 --> 00:16:18.110 total equals total plus value, count equals count plus one. 
00:16:18.110 --> 00:16:19.070 So this is going to run. 00:16:19.070 --> 00:16:21.230 These numbers are going to go up and up and up and up. 00:16:21.230 --> 00:16:22.880 And then we're going to break out of it, 00:16:22.880 --> 00:16:25.980 calculate the average, and then print the average. 00:16:25.980 --> 00:16:29.850 Because that's a floating point number, so now the average is a floating point number. 00:16:29.850 --> 00:16:31.070 So that's one way to do it. 00:16:31.070 --> 00:16:31.390 Right? 00:16:31.390 --> 00:16:34.570 That would be one way to write a program 00:16:34.570 --> 00:16:37.990 that does an average, is keep a running average 00:16:37.990 --> 00:16:38.999 as you're reading the numbers. 00:16:40.060 --> 00:16:44.080 But there's another way to do it, that would exact, work exactly 00:16:44.080 --> 00:16:47.508 the same way, and this is when you can start using lists. 00:16:47.508 --> 00:16:51.560 So you come in, you say I'm going to make a list 00:16:51.560 --> 00:16:56.810 of numbers, just a mnemonic name, numlist, is an empty list. 00:16:56.810 --> 00:17:02.070 Then I create another infinite loop that's going to read for enter a number. 00:17:02.070 --> 00:17:03.460 And if it's done, break. 00:17:03.460 --> 00:17:08.650 That gets us out of it. Convert the value to an int. 00:17:08.650 --> 00:17:12.400 Convert the value to a float, the input value to a float. 00:17:12.400 --> 00:17:14.440 And then append it to the list. 00:17:14.440 --> 00:17:16.579 So now the list is going to grow, each time 00:17:16.579 --> 00:17:18.819 we read a number the list is going to grow. 00:17:18.819 --> 00:17:21.420 However many times we add the number is 00:17:21.420 --> 00:17:23.410 how many things are going to be in the list. 00:17:23.410 --> 00:17:25.730 So in this case, when we're at this point and we 00:17:25.730 --> 00:17:28.540 type done, there will be three numbers in the list, because we 00:17:28.540 --> 00:17:32.560 will have run append three times. 
We'll have appended 3, 9, and 5. 00:17:32.560 --> 00:17:37.160 We'll have them sitting in a list. And we will have exited the loop. 00:17:37.160 --> 00:17:39.360 So now you say, oh add up all the numbers in 00:17:39.360 --> 00:17:42.720 that list, and then divide it by the length of the list. 00:17:42.720 --> 00:17:43.960 And print the average. 00:17:43.960 --> 00:17:47.290 So these two programs are basically equivalent. 00:17:47.290 --> 00:17:48.620 The only time that they might not be 00:17:48.620 --> 00:17:54.120 equivalent was if there was ten million numbers. 00:17:54.120 --> 00:17:59.260 This would use up 40 megabytes of your memory, which 00:17:59.260 --> 00:18:01.230 is actually not a lot of memory on some computers. 00:18:01.230 --> 00:18:05.180 But if memory mattered, this does store all those numbers. 00:18:05.180 --> 00:18:07.680 This one actually just runs the calculation. 00:18:07.680 --> 00:18:11.660 So if there's a really large number of numbers, this would make a difference, 00:18:11.660 --> 00:18:15.660 because the list is growing and keeping them all, summing them all at the end. 00:18:15.660 --> 00:18:17.350 This is actually storing very little data. 00:18:18.430 --> 00:18:20.600 But for reasonably sized numbers, 00:18:20.600 --> 00:18:24.120 like thousands or even hundreds of thousands of numbers, these 00:18:24.120 --> 00:18:28.960 two approaches are kind of equivalent. And then sometimes you actually 00:18:28.960 --> 00:18:32.070 want to accumulate something a little more complex than this, you want to 00:18:32.070 --> 00:18:35.320 sort them or look for the maximum and look for something else. 00:18:35.320 --> 00:18:37.430 Who knows what, but the notion of make a 00:18:37.430 --> 00:18:39.830 list and then append something to the list 00:18:39.830 --> 00:18:42.380 each time through the iteration, and then do something with 00:18:42.380 --> 00:18:45.410 the list at the end is a rather powerful pattern. 
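The two averaging programs compared above can be sketched side by side. To keep the example self-contained, each one is written as a hypothetical function that loops over a list of already-typed strings instead of the `while True` loop with keyboard input used in the lecture; in Python 3 the keyboard call would be `input()`, since `raw_input()` is Python 2. The function names and the `typed` list are assumptions made for the illustration.

```python
def average_running(entries):
    """Accumulator pattern: keep only a running total and a count."""
    total = 0
    count = 0
    for inp in entries:
        if inp == 'done':
            break
        value = float(inp)
        total = total + value
        count = count + 1
    return total / count


def average_with_list(entries):
    """List pattern: store every value, then compute at the end."""
    numlist = list()
    for inp in entries:
        if inp == 'done':
            break
        value = float(inp)
        numlist.append(value)
    return sum(numlist) / len(numlist)


typed = ['3', '9', '5', 'done']      # stands in for what a user would type
print(average_running(typed))
print(average_with_list(typed))
```

Both versions would divide by zero if 'done' arrived before any number; a real program would want to check for that case.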
00:18:45.410 --> 00:18:48.720 So this is also a powerful pattern, this is accumulator 00:18:48.720 --> 00:18:51.900 pattern where we just have the variables accumulating in the loop. 00:18:51.900 --> 00:18:55.040 This one is one where we accumulate the data in 00:18:55.040 --> 00:18:58.170 the loop and then do the computations all at the end. 00:18:58.170 --> 00:19:02.050 The, certain situations will make use of these different techniques. 00:19:03.130 --> 00:19:09.020 Okay. So, connecting strings and lists. 00:19:09.020 --> 00:19:11.830 So there's a method, a capability 00:19:11.830 --> 00:19:16.190 of strings that is really powerful when it comes to tearing data apart. 00:19:18.880 --> 00:19:23.110 It's called the split. So here is a string 00:19:23.110 --> 00:19:26.858 with three words and it has blanks in between here. 00:19:26.858 --> 00:19:33.720 And abc.split says parse this string, 00:19:33.720 --> 00:19:38.690 look for the blanks, break the string into pieces, and give me back a 00:19:38.690 --> 00:19:43.920 list with one item for each of the words in the list as 00:19:43.920 --> 00:19:47.200 defined by the spaces. Okay? 00:19:47.200 --> 00:19:53.150 So, it takes, breaks it into three pieces and gives us that back in a list. 00:19:53.150 --> 00:19:55.870 This is very powerful. Okay? 00:19:55.870 --> 00:19:58.340 So we're going to split it and we get back a list. 00:19:58.340 --> 00:20:04.180 There are three words, and the first word, stuff sub zero, is With. 00:20:04.180 --> 00:20:06.200 So there's a lot of parsing going on here. 00:20:06.200 --> 00:20:09.180 We could do this with for loops and a lot of other things. 00:20:09.180 --> 00:20:11.240 There would be a lot of work in this split. 00:20:11.240 --> 00:20:14.180 Given that this is a really common task, it's really 00:20:14.180 --> 00:20:17.970 great that this has been put into Python for us. 00:20:17.970 --> 00:20:19.350 Okay? 
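Here is the split call as a minimal sketch. The string `'With three words'` is an assumption consistent with the description (three words, the first of which is With).

```python
abc = 'With three words'     # assumed example string
stuff = abc.split()          # break the string apart at the blanks

print(stuff)                 # ['With', 'three', 'words']
print(len(stuff))            # 3
print(stuff[0])              # With
```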
00:20:19.350 --> 00:20:22.850 So split breaks a string into parts and produces a list of strings. 00:20:22.850 --> 00:20:25.630 We think of these as words, we can access a 00:20:25.630 --> 00:20:28.040 particular word or we can loop through all the words. 00:20:28.040 --> 00:20:31.050 So here we have stuff again and now we have a, a for loop 00:20:32.050 --> 00:20:35.070 for each of the, that's going to go through each of the three words. 00:20:35.070 --> 00:20:36.370 And then it's going to run three times. 00:20:36.370 --> 00:20:37.410 Now chances are good we're going to do 00:20:37.410 --> 00:20:39.600 something different other than just print them out. 00:20:39.600 --> 00:20:44.450 But you see how that you quickly can take a split followed by a for, and then write 00:20:44.450 --> 00:20:45.720 a loop that's going to go through each of the 00:20:45.720 --> 00:20:48.360 words, without working too hard to find the spaces. 00:20:48.360 --> 00:20:52.574 You let Python do all the hard work of finding the spaces. 00:20:52.574 --> 00:20:53.375 Okay? 00:20:53.375 --> 00:20:56.350 So let's take a look at a couple of samples. 00:20:58.130 --> 00:21:00.480 Just a couple of things to teach you a little more about split. 00:21:01.510 --> 00:21:05.570 Split looks at many spaces as equal to one space. 00:21:07.500 --> 00:21:10.810 So, if you split a lot blank, blank, blank of spaces, it's 00:21:10.810 --> 00:21:14.480 still just throws away all the spaces and gives us four words. 00:21:15.750 --> 00:21:20.480 One, two, three, four and throws away all the spaces, 00:21:20.480 --> 00:21:21.900 because it assumes that's what we want done. 00:21:21.900 --> 00:21:22.535 So that's nice. 00:21:22.535 --> 00:21:26.916 You can also have split, you can also have split, 00:21:26.916 --> 00:21:30.310 split on some other character. 
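Looping over the result of split, and the way split treats runs of spaces, can be sketched as follows. The example strings are assumptions that follow the wording of the lecture.

```python
stuff = 'With three words'.split()
for w in stuff:              # runs once per word
    print(w)

# Several spaces in a row are treated like a single separator.
line = 'A lot               of spaces'
etc = line.split()
print(etc)                   # ['A', 'lot', 'of', 'spaces']
print(len(etc))              # 4
```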
Sometimes you'll be getting data 00:21:30.310 --> 00:21:33.090 and they'll have used a semicolon, or a comma, or 00:21:33.090 --> 00:21:36.000 a colon, or a tab character, who knows what they've 00:21:36.000 --> 00:21:39.400 used, and your job is to dig that data out. 00:21:39.400 --> 00:21:42.900 So you can split, based on the different character. 00:21:42.900 --> 00:21:47.070 Here, if we're splitting normally with, with this is a normal split. 00:21:47.070 --> 00:21:49.800 It's not going to see the semicolons, it's looking for a space. 00:21:49.800 --> 00:21:52.880 And so all we get back is one 00:21:52.880 --> 00:21:55.220 item in the list, with the semicolons. 00:21:55.220 --> 00:21:58.520 But, if we switch, and we pass semicolon 00:21:58.520 --> 00:22:01.080 as a parameter, in as a parameter to split, 00:22:01.080 --> 00:22:03.090 then it will know to split it based on 00:22:03.090 --> 00:22:06.450 semicolons, and gives us first, second, and third back. 00:22:07.520 --> 00:22:07.820 Okay? 00:22:07.820 --> 00:22:09.940 And then it gives us three words. 00:22:09.940 --> 00:22:13.640 So you can split either on spaces, or you 00:22:13.640 --> 00:22:17.490 can split on a character other than a space. 00:22:17.490 --> 00:22:18.040 Okay? 00:22:18.040 --> 00:22:20.400 [COUGH] 00:22:20.400 --> 00:22:25.230 So, let's take a look at how we might turn this into some of our common assignments 00:22:25.230 --> 00:22:32.420 that we have in this chapter, where we're going to read some of the mailbox data. Okay? 00:22:33.420 --> 00:22:36.720 So, here we go with a little program. 00:22:36.720 --> 00:22:41.170 First three lines, we write these a lot. Open the file. 00:22:41.170 --> 00:22:43.090 Write a for loop to loop through each 00:22:43.090 --> 00:22:44.870 line in the file. 00:22:44.870 --> 00:22:48.100 Then we're going to strip off the white space at the end of the line. 00:22:48.100 --> 00:22:50.990 One, two, three. Do those all the time. 
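Splitting on a delimiter other than a space can be sketched like this. The string `'first;second;third'` is assumed from the words read aloud, and `plain` and `parts` are illustrative names.

```python
line = 'first;second;third'

plain = line.split()         # no spaces to split on, so one item comes back
print(plain)                 # ['first;second;third']
print(len(plain))            # 1

parts = line.split(';')      # pass the delimiter as a parameter
print(parts)                 # ['first', 'second', 'third']
print(len(parts))            # 3
```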
00:22:50.990 --> 00:22:54.990 And we're looking for lines, if you look at the whole file, 00:22:54.990 --> 00:22:58.170 we're looking for lines that start with From, followed by a space. 00:22:58.170 --> 00:23:00.420 So if the line does not start with From 00:23:00.420 --> 00:23:03.700 followed by a space, that's a space right there, continue. 00:23:03.700 --> 00:23:08.460 So that's a way to skip all the lines that don't look like this. 00:23:08.460 --> 00:23:12.490 There are thousands of lines in this file and just a few that look like this. Okay? 00:23:12.490 --> 00:23:17.110 So we're going to look and we're going to try 00:23:17.110 --> 00:23:22.790 to find what day of the week this thing happened on. 00:23:22.790 --> 00:23:27.700 So we're throwing away all the other lines with this little bit of code. 00:23:27.700 --> 00:23:32.820 Then what we do is we take the line, which is all of this text, and then we split it. 00:23:34.110 --> 00:23:38.270 And we know that the day of the week is words sub two. 00:23:38.270 --> 00:23:43.080 So this is words sub zero, this is words sub one, and this is words sub two. 00:23:43.080 --> 00:23:46.480 So this is words sub zero, sub one, and sub two. 00:23:46.480 --> 00:23:48.550 And so, all we have to do is print out the sub two. 00:23:48.550 --> 00:23:53.740 We throw away all the lines except the From lines. 00:23:53.740 --> 00:23:56.650 We split them and take the 00:23:56.650 --> 00:23:59.330 third word, or words sub two, and we 00:23:59.330 --> 00:24:02.260 can quickly create something that's extracting 00:24:02.260 --> 00:24:04.060 the day of the week out of these. 00:24:06.030 --> 00:24:07.400 Okay? 00:24:07.400 --> 00:24:11.890 So, I mean, it's quick, because split does the tricky work. 00:24:11.890 --> 00:24:15.140 If you go back to the strings chapter, you saw that 00:24:15.140 --> 00:24:16.910 we did a lot of work to get this to happen.
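The program walked through above might look roughly like this. It is a sketch under stated assumptions: the mailbox file is not available here, so an in-memory `io.StringIO` stands in for the opened file, and the sample lines, addresses, and the `days` list are hypothetical additions for illustration.

```python
import io

# Stand-in for the opened mailbox file; contents are hypothetical sample data.
fhand = io.StringIO(
    "From alice@example.com Sat Jan  5 09:14:16 2008\n"
    "Subject: hello\n"
    "From: alice@example.com\n"
    "From bob@example.org Fri Jan  4 18:10:48 2008\n"
)

days = []                              # collected only so the result can be inspected
for line in fhand:
    line = line.rstrip()               # strip whitespace at the end of the line
    if not line.startswith("From "):   # note the space after From
        continue                       # skip every line that is not a "From " line
    words = line.split()
    days.append(words[2])              # words[0] is From, words[1] the address
    print(words[2])                    # the day of the week
```

The trailing space in `"From "` matters: without it, header lines beginning with `From:` would also pass the test, and their third word is not a day of the week.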
00:24:17.950 --> 00:24:21.040 So here's even another tricky pattern. 00:24:21.040 --> 00:24:26.510 So let's say we want to do what we did at the end of Chapter Six, 00:24:26.510 --> 00:24:28.120 the string chapter. 00:24:28.120 --> 00:24:30.870 Let's say we wanted to get back this little bit of data. 00:24:32.130 --> 00:24:33.330 Okay? 00:24:33.330 --> 00:24:37.310 So, we can look at this and say, okay, let's split this. 00:24:37.310 --> 00:24:42.420 And this will be zero, one, and two, and three, and four, and five, and six. 00:24:42.420 --> 00:24:44.530 We're splitting it based on spaces. 00:24:44.530 --> 00:24:50.106 Then the email address is words sub one, right? 00:24:51.106 --> 00:24:54.666 So that email address is this little bit of stuff 00:24:54.666 --> 00:24:58.780 because it's in between spaces, right? So that's what we pull out. 00:24:58.780 --> 00:25:02.355 The email address is words sub one. 00:25:02.355 --> 00:25:04.512 We've got that. 00:25:04.512 --> 00:25:07.730 So that's sitting in this email address variable. 00:25:07.730 --> 00:25:10.000 Then, we don't really want the whole thing, 00:25:10.000 --> 00:25:11.960 we just want the part after the 00:25:11.960 --> 00:25:14.470 at sign. 00:25:14.470 --> 00:25:16.290 We can do a lookup of the at sign. 00:25:17.490 --> 00:25:22.145 But you can also then do a second, come back, come back. 00:25:22.145 --> 00:25:25.300 [SOUND] There we come. 00:25:25.300 --> 00:25:29.110 You can also do a second split. Okay? 00:25:29.110 --> 00:25:31.260 So we're taking this variable here, email, 00:25:31.260 --> 00:25:33.980 which is merely this little part right here. 00:25:33.980 --> 00:25:36.840 And we are splitting it again, except this 00:25:36.840 --> 00:25:38.400 time we're splitting it based on an at sign. 00:25:38.400 --> 00:25:42.640 Which means it's going to bust it right here, and find 00:25:42.640 --> 00:25:44.140 us two pieces.
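As a sketch of the two splits being described, assuming a hypothetical line in the same seven-word shape (the address and host are made up for illustration; the names `words`, `email`, and `pieces` follow the spoken description):

```python
# Hypothetical "From " line with seven space-separated words (indexes 0 to 6).
line = "From alice@example.com Sat Jan  5 09:14:16 2008"

words = line.split()         # first split: on whitespace
email = words[1]             # the address sits between the first two spaces
pieces = email.split("@")    # second split: on the at sign
host = pieces[1]             # the part after the at sign

print(words)
print(email)                 # alice@example.com
print(pieces)                # ['alice', 'example.com']
print(host)                  # example.com
```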
00:25:44.140 --> 00:25:49.730 So pieces now is a list where the sub zero item is the 00:25:49.730 --> 00:25:56.280 person's name and sub one item is the host where their mail address is held. 00:25:56.280 --> 00:26:00.540 Okay? And so then all we need 00:26:00.540 --> 00:26:06.380 is pieces sub one, and pieces sub one is this guy right here. 00:26:07.900 --> 00:26:10.750 So that's pieces sub one, and so we pulled it out. 00:26:10.750 --> 00:26:13.470 So if you go back to how we did it before, we were 00:26:13.470 --> 00:26:17.100 doing searching, we were searching some more, and then we were taking slices. 00:26:17.100 --> 00:26:19.380 This is a little more elegant, okay? 00:26:19.380 --> 00:26:21.110 Because really, we split it and then we split it, 00:26:21.110 --> 00:26:23.080 and we knew what piece we were looking at. 00:26:23.080 --> 00:26:27.250 So this is what I call the Double Split Pattern, where you split a string 00:26:27.250 --> 00:26:30.630 into a list, then you take a thing out, and then you split it again. 00:26:31.710 --> 00:26:33.020 Depending on what data you're looking for. 00:26:33.020 --> 00:26:35.376 This is just a technique, it's not the only technique. 00:26:35.376 --> 00:26:40.480 Okay, so that's lists. 00:26:40.480 --> 00:26:42.040 We talked about the concept of a 00:26:42.040 --> 00:26:44.540 collection where lists have multiple things in them. 00:26:44.540 --> 00:26:47.350 Definite loops, again, we've seen these things. 00:26:47.350 --> 00:26:49.600 It looks a lot like strings 00:26:49.600 --> 00:26:53.100 except the elements are more powerful and lists are mutable. 00:26:53.100 --> 00:26:59.070 We still use the bracket operator and we redid the max, min, and sum. 00:26:59.070 --> 00:27:02.382 Except we did it in, like, one line rather than a whole loop.
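A short recap sketch of the points just listed: the bracket operator, mutability, and the one-line max, min, and sum. The numbers and variable names are hypothetical.

```python
nums = [3, 41, 12, 9, 74, 15]   # hypothetical values

first = nums[0]       # bracket operator, same as with strings
nums[2] = 28          # lists are mutable: assigning to an element works

text = "banana"
try:
    text[0] = "B"     # strings are immutable, so this raises TypeError
    string_changed = True
except TypeError:
    string_changed = False

# Built-in functions replace the hand-written loops from earlier chapters.
biggest = max(nums)
smallest = min(nums)
total = sum(nums)
print(biggest, smallest, total)
```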
00:27:02.382 --> 00:27:06.110 And something we're going to play with a lot is using split to parse strings, 00:27:06.110 --> 00:27:08.630 the single split, and then the double split 00:27:08.630 --> 00:27:11.130 is the natural extension of the single split. 00:27:11.130 --> 00:27:14.780 So, see you in the next lecture, looking forward to talking about dictionaries.