0:00:00.160,0:00:25.740 Hello, and welcome to Chapter Eight: Python Lists. Now we're going to start taking care of business. We're going to make lists, dictionaries, and tuples, really start manipulating this data, and lay the groundwork for real data analysis. As always, these lectures (audio, video, slides, and even the book) are copyright Creative Commons Attribution.

0:00:25.740,0:00:45.420 Lists, dictionaries, and tuples, the next three big topics we're going to talk about, are collections. And we've been using lists already, right? We used them when we were doing for loops. A list in Python is a sequence of items inside square brackets. This is a constant list.

0:00:46.550,0:01:21.680 Now, when I first talked to you about variables, I oversimplified things a bit. I said that if you put x equals two and then x equals four, the four overwrites the two. A collection is where you can put a bunch of things in the same variable. I then need a way to find those things, but a collection lets us put more than one thing in a variable. So here we have friends, which holds three strings, Joseph, Glenn, and Sally, and we have carryon, which holds socks, shirt, and perfume. That's the basic idea. So what's not a collection?
0:01:21.680,0:01:47.320 Well, simple variables. Simple variables are not collections, just like in this example: I say x equals 2, x equals 4, and print x, and the 4 is in there and the 2 is gone. It was there for a moment, and then it's gone. That's a normal variable. You can't put more than one thing in it. When you do put more than one thing in a variable, you need a way to find the things that are in there. We'll get to that.

0:01:49.260,0:02:22.800 We've been using list constants for the last couple of chapters because we had to. In the for loop chapter, we used lists of numbers. We've used lists of strings: red, yellow, and blue. And the items don't all have to be the same type. This is a three-item list that has the string red, the integer 24, and 98.6, which is a floating point number.

0:02:22.800,0:02:41.260 Here's an interesting side note. This shows that floating point numbers are not always perfectly represented inside the computer; it's an artifact of how they work. In this example, 98.6 may be displayed as something like 98.59999 followed by a long tail of digits. When you see something like that, don't freak out.
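A minimal sketch of the ideas so far, written in Python 3 (the lecture's own examples use Python 2, for instance raw_input). The values mirror the examples as described. The 0.1 + 0.2 check is an added illustration of floating point approximation, not something from the lecture. Recent Python versions print 98.6 as-is, while some older versions could show a longer string of digits.

```python
# A simple variable holds one value; the second assignment replaces the first.
x = 2
x = 4
print(x)                        # 4

# A list holds several values in one variable.
friends = ['Joseph', 'Glenn', 'Sally']
carryon = ['socks', 'shirt', 'perfume']
print(len(friends), len(carryon))   # 3 3

# Items in a list need not share a type: a str, an int, and a float.
mixed = ['red', 24, 98.6]
print(mixed)

# Floats are binary approximations, so exact comparisons can surprise you.
print(0.1 + 0.2 == 0.3)         # False
```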
0:02:41.260,0:03:13.340 Of the values here, only floating point numbers show this behavior. Interestingly, although we won't put a lot of energy into this, an element of a list can be a list itself. This outer list has three elements: 1, 7, and then a list containing 5 and 6. So if you look at the length of this, there are three things in it. Not four, three, because the outer list has one, two, three things in it. And an empty list is just an open bracket followed by a close bracket. Okay?

0:03:13.340,0:03:44.116 Like I said, we've been using lists all along. In a loop written as for i in followed by a list, the thing after the in is a list. Definite loops are a great way to go through lists: for friend in friends goes through three times, and out come three lines, with the variable friend advancing through the three successive items in the list. And away we go. So lists are not completely foreign to us.

0:03:44.116,0:04:11.570 Now, just like with a string, we can use the index operator, the square bracket operator, to look up items in the list: friends sub one. Not surprisingly, using the European elevator rule, the first item in a list is sub zero, the second item is sub one, and the third is sub two.
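The nested list, empty list, definite loop, and zero-based indexing described above can be sketched in Python 3 like this. The nested list is written with 5 and 6 as the last element, following the spoken description; the slide's exact layout is an assumption.

```python
friends = ['Joseph', 'Glenn', 'Sally']

# An element of a list can itself be a list; the outer list still has 3 items.
nested = [1, 7, [5, 6]]
print(len(nested))      # 3
print(nested[2])        # [5, 6]

# An empty list is just a pair of brackets.
empty = []
print(len(empty))       # 0

# A definite loop visits each item in order.
for friend in friends:
    print(friend)

# Indexing starts at zero (the "European elevator rule").
print(friends[0])       # Joseph
print(friends[1])       # Glenn
```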
0:04:11.570,0:04:26.060 So when I print friends sub one, I get Glenn, which is the second element. Just like strings. Once you know it for strings, lists and the rest of these things make a lot more sense. Just remember that we're in Europe, and things start with zero.

0:04:27.760,0:05:19.080 Some of the data types we work with are not mutable. For example, when we ask for a lowercase version of a string, we're given a copy of that string. That's because strings are not mutable, and we can see this by trying something like fruit sub 0 equals lowercase b. You'd think that would just change the first letter to a lowercase b, but it doesn't, okay? Python tells us the string object does not support item assignment, which means you're not allowed to reassign its characters. You can make a new string and put different things in that new string, but once a string is made, it can't be changed. That's why, when we call fruit.lower(), we get a lowercase copy. So x is a copy of the original string, but the original string stored in fruit is unchanged. It can't be changed.

0:05:20.340,0:05:31.130 Lists, on the other hand, can be changed, and we can change them in the middle. This is one of the things we like about them. So here we have a list called lotto: 2, 14, 26, 41, and 63. Then we assign a new value to lotto sub two.
0:05:31.130,0:05:47.540 Of course, that's the third item. We set lotto sub two equal to 28, then print the list and see the new number there. All this is saying is that we can change lists, right? Strings, no; lists, yes. You can change lists, but you can't change strings.

0:05:49.230,0:06:20.925 We've used the len function for several things. It works on strings, and it works on lists as well. The same function knows what kind of parameter it has: when its parameter is a string, it gives us the number of characters in the string, and when it's a list, it gives us the number of elements in the list. Even though one of the elements here is a string, it still counts as one element from the point of view of the list. So there are one, two, three, four: four items in the list, okay?

0:06:24.870,0:06:54.390 Now, the range function is a special function, and it's about time we talked about it. The range function generates a list of numbers and gives it back to us. You give range a parameter, how many items you want (here, four), and it creates and gives us back a list of four numbers, starting at zero and going up to, but not including, four. Sound familiar? Yeah.
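A Python 3 sketch of string immutability, list mutation, len, and range as described above. The string 'Banana' and the list [1, 2, 'joe', 99] are assumed example values; the transcript only names fruit and describes a four-item list containing one string. Note that in Python 3, range() returns a range object rather than a list, so list() is used to display its values.

```python
# Strings are immutable: item assignment raises a TypeError.
fruit = 'Banana'
try:
    fruit[0] = 'b'
except TypeError as err:
    print('TypeError:', err)

# String methods return a new string; the original is unchanged.
x = fruit.lower()
print(x, fruit)                 # banana Banana

# Lists are mutable: we can change an item in place.
lotto = [2, 14, 26, 41, 63]
lotto[2] = 28
print(lotto)                    # [2, 14, 28, 41, 63]

# len() counts characters in a string and elements in a list.
print(len('banana'))            # 6
print(len([1, 2, 'joe', 99]))   # 4

# range(4) counts from 0 up to, but not including, 4.
print(list(range(4)))           # [0, 1, 2, 3]
```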
0:06:54.390,0:07:30.940 Zero up to, but not including, four. And the same thing is true here: we can combine len and range. len(friends) is three items, and range(len(friends)) is 0, 1, 2, which corresponds exactly to the positions of the items in the list. So we can use this to construct loops that go through a list.

0:07:30.940,0:08:12.280 We already have a basic for loop, right? It says: for each friend in friends, print Happy New Year, and out come three Happy New Year lines, one per friend. If we also want to know what position we're at as the loop progresses, we can rewrite the exact same loop a different way: make i our iteration variable and say for i in range(len(friends)), which turns into zero, one, two. So i goes zero, one, two, and inside the loop we look up the particular friend we're interested in using the index operator, friends sub i, and then print Happy New Year.

0:08:12.280,0:08:34.760 So these two loops are equivalent. The simpler one, for friend in friends, is preferred, unless you happen to need the value of i, which tells you where you're at. Maybe you're going to look through the list and then change something in it.
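The two equivalent loops can be sketched side by side in Python 3. The third loop, which uppercases each name, is a hypothetical illustration of the "you need i to change something" case; it is not from the lecture.

```python
friends = ['Joseph', 'Glenn', 'Sally']

# The simple form: the iteration variable takes each item in turn.
for friend in friends:
    print('Happy New Year:', friend)

# The index form: i takes the values 0, 1, 2 and we look up friends[i].
for i in range(len(friends)):
    friend = friends[i]
    print('Happy New Year:', friend)

# The index form earns its keep when you need to change items in place.
for i in range(len(friends)):
    friends[i] = friends[i].upper()
print(friends)   # ['JOSEPH', 'GLENN', 'SALLY']
```

Assigning to the loop variable in the simple form (friend = friend.upper()) would not change the list, which is why the index form is used for in-place updates.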
0:08:34.760,0:08:44.370 But for what I've written here, they're exactly equivalent. Prefer the simpler one unless you need the more complex one. They both produce the same output.

0:08:46.170,0:09:15.710 We can concatenate lists, much like we concatenate strings, with plus. You can think of the plus operator looking to its right and to its left and saying, oh, those are both lists, I know what to do with lists, I'm going to put them together. So two three-item lists become a six-item list, the first one followed by the second. It didn't hurt the original, a; c is a new list.

0:09:19.040,0:10:08.590 We can also slice lists. Feels a lot like strings, right? Everything's kind of like strings: for loops like strings, concatenation like strings, and now slicing like strings. And it works exactly the same way. Just remember: the second parameter is up to, but not including. So t sub one colon three starts at sub one, the second item, and goes up to, but not including, sub three. That gives us sub one and sub two, which are 41 and 12. We can also leave out the first parameter, so colon four starts at the beginning and goes up to, but not including, sub four: sub zero, one, two, and three.
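Concatenation and the slicing rules from this part of the lecture, sketched in Python 3. The lists a and b and the list t = [9, 41, 12, 3, 74, 15] are assumed values, consistent with the items the lecture reads out (41 and 12, then 3, 74, 15).

```python
# + builds a new list; the originals are unchanged.
a = [1, 2, 3]
b = [4, 5, 6]
c = a + b
print(c)       # [1, 2, 3, 4, 5, 6]
print(a)       # [1, 2, 3]

# Slices go from the first index up to, but not including, the second.
t = [9, 41, 12, 3, 74, 15]
print(t[1:3])  # [41, 12]
print(t[:4])   # [9, 41, 12, 3]
print(t[3:])   # [3, 74, 15]
print(t[:])    # [9, 41, 12, 3, 74, 15]
```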
0:10:08.590,0:10:34.760 If we go from three to the end, again remember that we start at 0, so sub three to the end is 3, 74, 15. (The item at sub three happening to be the number 3 is just a coincidence.) And a colon by itself gives us the whole thing, so the whole list and this slice are the same. So slicing works like strings: the first parameter is where we start, and the second is up to, but not including.

0:10:36.400,0:11:08.300 There are some methods, and you can read about them online in the Python documentation. We can use the built-in function dir. It doesn't get a lot of use when we're running programs, but it's kind of useful. I like it when I'm typing interactively and want to know what this thing can do. So I make a list and ask dir what we can do with it. Well, we can append, count, extend, index, insert, pop, remove, reverse, and sort. You can read up on all these things.

0:11:08.300,0:11:35.530 I'll show you just a couple. We can build a list with append. This syntax here, stuff = list(), is called a constructor, and it says give me an empty list. You could also write an empty pair of brackets. Either way, you make an empty list and then call append. Remember that lists are mutable, so it's okay to change them. So we start with an empty list.
0:11:35.530,0:12:07.370 Now we append the word book to the end of it, and then we append 99. Wait a sec, that's a mistake on the slide. I have to fix it, so watch me fix the mistake. Poof. Now it's magically fixed. Isn't that amazing? I have magic powers when it comes to slide fixing: I just snap my fingers and the slides are fixed.

0:12:07.370,0:12:30.630 So here we go. We append the 99 and print the list, and it has book and 99, emphasizing that the items in a list don't have to be the same kind of thing. Then later we append cookie, and it's book, 99, cookie. Okay? We won't often call append in a straight line like this; we'll tend to do it in a loop as we build up a list. But that's how you start with an empty list and then programmatically grow it.

0:12:33.350,0:13:02.910 Much like with a string, we can ask whether an item is in a list. Here is a list called some, with five numbers in it. Is 9 in some? True, yes it is. Is 15 in some? False. Is 20 not in some? Yes, that's legal syntax, and it's True, because 20 isn't there, okay? These operators don't modify the list; they just ask questions.
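Building a list with append and testing membership with in and not in, sketched in Python 3. The squares loop is a hypothetical example of the "append inside a loop" shape the lecture mentions, and the contents of some are assumed values chosen to match the answers given (9 is present; 15 and 20 are not).

```python
# Build a list by starting empty and appending.
stuff = list()          # same as stuff = []
stuff.append('book')
stuff.append(99)
print(stuff)            # ['book', 99]
stuff.append('cookie')
print(stuff)            # ['book', 99, 'cookie']

# The more common shape: grow a list inside a loop.
squares = []
for n in range(5):
    squares.append(n * n)
print(squares)          # [0, 1, 4, 9, 16]

# in / not in ask questions without changing the list.
some = [1, 9, 21, 10, 16]
print(9 in some)        # True
print(15 in some)       # False
print(20 not in some)   # True
```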
0:13:02.910,0:13:10.330 The in and not in operators are logical operations, often used in if statements, while loops, or some other logic you might be building.

0:13:12.050,0:14:03.680 Okay, so lists have order. When we were appending, the first thing went in first, the second thing went in second, and so on. We can also tell a list to sort itself, and now we're starting to see some power here. This is a list of strings, but sort can handle numbers and lots of other things too. friends.sort() says: hey there, dear friends, sort yourself. This makes a change. It alters the list and puts it, in this case, in alphabetical order: Glenn, Joseph, and Sally. The list has been mutated, so friends sub one is now Joseph, because that's the second one. Okay? So the sort method says sort yourself now, and the list sorts and then stays sorted.

0:14:06.720,0:14:29.646 You're going to be kind of ticked off about this particular slide, because there's a whole bunch of built-in functions that help with lists: max, min, len, sum, and others. All those loops that I showed you how to write, I showed you because I thought it was important to understand how loops work.
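Sorting in place and the built-in list functions, sketched in Python 3. The order of nums is an assumption; the lecture gives only its max (74), min (3), and sum (154). Note that sort() changes the list itself and returns None, so result = friends.sort() would leave result set to None.

```python
friends = ['Joseph', 'Glenn', 'Sally']
friends.sort()                           # sorts in place, returns None
print(friends)                           # ['Glenn', 'Joseph', 'Sally']
print(friends[1])                        # Joseph

nums = [3, 41, 12, 9, 74, 15]
print(len(nums))                         # 6
print(max(nums))                         # 74
print(min(nums))                         # 3
print(sum(nums))                         # 154
print(round(sum(nums) / len(nums), 2))   # 25.67
```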
0:14:29.646,0:15:09.590 Using these built-in functions is the simplest way to find the largest, the smallest, the sum, and so on. So here's a list of numbers. We can ask how many there are; that's the count. We can ask what's the largest: 74. What's the smallest: 3. What's the sum of them all: 154. If you remember from a few lectures ago, these are the same numbers. And the average is the sum divided by the length, okay? So if you had a list of numbers like this, you would simply ask for the max; you wouldn't write a max loop. I wrote those loops just to demonstrate how loops work.

0:15:09.590,0:15:48.810 Here's a way you can change the kinds of programs we wrote earlier. There are two ways to write an averaging program. Let's say that instead of the data being in a list, we write a while loop that reads numbers until we type done, and then computes the average of those numbers. So that's our problem: read numbers, wait till the word done comes in, and then average them. Here's a little program that does that. We set total equal to zero and count equal to zero, make an infinite loop with while True, and then ask the user to enter a number.
0:15:48.810,0:16:38.999 We get a string back, because remember, raw_input always gives us strings, and if it's done, we break. This is the version of the if that doesn't require an indent: we just put the break on the same line. That gets us out of the loop when the time is right. Then we use float to convert the input to a floating point number, and we do our accumulation pattern: total = total + value, count = count + 1. As this runs, these numbers go up and up and up. Then we break out of the loop, calculate the average, and print it. Because total is a floating point number, the average is a floating point number too. So that's one way to write a program that computes an average: keep a running total and count as you read the numbers.

0:16:40.060,0:17:03.460 But there's another way to do it that works exactly the same, and this is where you can start using lists. You say, I'm going to make a list of numbers, numlist (just a mnemonic name), which starts as an empty list. Then I create another infinite loop that asks the user to enter a number, and if the input is done, break.
0:17:03.460,0:17:47.290 That gets us out of the loop. Otherwise, we convert the input value to a float and append it to the list. So each time we read a number, the list grows; however many numbers we enter is how many things will be in the list. In this case, when we type done, there will be three numbers in the list, because we will have run append three times: we'll have appended 3, 9, and 5. They're sitting in a list, and we've exited the loop. So now we add up all the numbers in the list, divide by the length of the list, and print the average. These two programs are basically equivalent.

0:17:47.290,0:18:17.350 The only time they might not be equivalent is if there were, say, ten million numbers. The list version stores all of them: at least 40 megabytes of memory, and in practice several times that, because Python stores each number as a separate object. That's not a lot of memory on some computers, but if memory mattered, it would matter here. The running-total version just does the calculation as it goes and stores very little data. So for a really large number of inputs this would make a difference, because the list keeps growing and holds every number until we sum them all at the end.
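The two averaging programs side by side, as a Python 3 sketch. To keep it runnable without a keyboard, each function reads from a list of strings instead of calling input() in a while True loop. That substitution is an assumption of this sketch, and the function names are hypothetical. Like the lecture's programs, both assume at least one number is entered; with none, they raise ZeroDivisionError.

```python
def average_running(entries):
    """Accumulator pattern: keep a running total and count as values arrive."""
    total = 0.0
    count = 0
    for entry in entries:
        if entry == 'done':
            break
        value = float(entry)
        total = total + value
        count = count + 1
    return total / count                 # assumes at least one number


def average_list(entries):
    """List pattern: store every value, then compute at the end."""
    numlist = []
    for entry in entries:
        if entry == 'done':
            break
        numlist.append(float(entry))
    return sum(numlist) / len(numlist)   # same assumption as above


# Simulated keyboard input. An interactive version would instead loop with
# while True: and read each entry with input('Enter a number: ').
entries = ['3', '9', '5', 'done']
print(average_running(entries))
print(average_list(entries))
print(average_running(entries) == average_list(entries))   # True
```

The running version keeps only two numbers in memory; the list version keeps every value, which is what makes later sorting or other whole-list processing possible.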
0:18:18.430,0:19:02.050 But for reasonable amounts of data, like thousands or even hundreds of thousands of numbers, the two approaches are about equivalent. And sometimes you want to do something a little more complex with the data: sort it, or look for the maximum, or look for something else. Who knows what. But the notion of making a list, appending something to it each time through the loop, and then doing something with the list at the end is a rather powerful pattern. The other version is also a powerful pattern: the accumulator pattern, where variables accumulate values inside the loop. In the list version, we accumulate the data in the loop and do the computations at the end. Different situations call for different techniques.

0:19:03.130,0:19:55.870 Okay. So, connecting strings and lists. There's a method of strings that is really powerful when it comes to tearing data apart. It's called split. Here is a string, abc, with three words and blanks between them. abc.split() says: parse this string, look for the blanks, break the string into pieces, and give me back a list with one item for each word in the string, as defined by the spaces. Okay? So it breaks the string into three pieces and gives them back to us in a list. This is very powerful. Okay?
0:19:55.870,0:20:19.350 So we split it and get back a list, stuff. There are three words, and the first word, stuff sub zero, is With. There's a lot of parsing going on here. We could do it ourselves with for loops and other things, but it would be a lot of work. Given that this is a really common task, it's great that split has been built into Python for us. Okay?

0:20:19.350,0:20:52.574 So split breaks a string into parts and produces a list of strings. We think of these as words; we can access a particular word, or we can loop through all the words. Here we have stuff again, and now a for loop that goes through each of the three words, so it runs three times. Chances are good we'll do something other than just print them out. But you can see how quickly you can take a split followed by a for and write a loop that goes through each of the words without working hard to find the spaces. You let Python do all the hard work of finding the spaces.

0:20:52.574,0:21:05.570 Okay? So let's take a look at a couple of samples, to teach you a little more about split. Split treats many spaces as if they were one space.
0:21:07.500,0:21:22.535 So if you split a line with a lot of spaces in it, split still throws away all the spaces and gives us four words: one, two, three, four. It assumes that's what we want. So that's nice.

0:21:22.535,0:22:17.490 You can also have split break the string on some other character. Sometimes you'll get data where they've used a semicolon, a comma, a colon, or a tab character, who knows what they've used, and your job is to dig that data out. So you can split based on a different character. Here, with a normal split, Python doesn't see the semicolons; it's looking for spaces. So all we get back is a list with one item: the whole string, semicolons and all. But if we pass a semicolon as a parameter to split, it knows to split on semicolons, and gives us first, second, and third back. Okay? So now we have three words. You can split on spaces, or you can split on a character other than a space. Okay?

0:22:20.400,0:22:32.420 So let's look at how we might use this in some of the common assignments in this chapter, where we read some of the mailbox data. Okay?
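The split examples above, sketched in Python 3. The exact strings are assumptions reconstructed from the description: three words starting with With, four words separated by runs of spaces, and first, second, third joined by semicolons. The last line is an added note: the collapsing of repeated spaces happens only when split() is called with no argument; passing ' ' explicitly keeps the empty strings between adjacent spaces.

```python
abc = 'With three words'
stuff = abc.split()
print(stuff)            # ['With', 'three', 'words']
print(len(stuff))       # 3
print(stuff[0])         # With

for w in stuff:
    print(w)

# With no argument, runs of whitespace count as a single separator.
spaced = 'A lot               of spaces'
print(spaced.split())         # ['A', 'lot', 'of', 'spaces']

# A default split ignores semicolons; passing ';' splits on them instead.
data = 'first;second;third'
print(data.split())           # ['first;second;third']
print(data.split(';'))        # ['first', 'second', 'third']

# An explicit ' ' separator does NOT collapse repeated spaces.
print('a  b'.split(' '))      # ['a', '', 'b']
```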
0:22:33.420,0:23:12.490 So here we go with a little program. The first three lines are ones we write a lot: open the file, write a for loop to go through each line in the file, and strip the whitespace off the end of each line. One, two, three; we do those all the time. If you look at the whole file, we're looking for lines that start with From followed by a space. If the line does not start with From followed by a space (that's a space right there), we continue. That's a way to skip all the lines that don't look like this. There are thousands of lines in this file and just a few that look like this. Okay?

0:23:12.490,0:23:53.740 We're going to try to find what day of the week each message was sent on. This little bit of code throws away all the other lines. Then we take the line, which is all of this text, and split it, and we know that the day of the week is words sub two: this is words sub zero, this is words sub one, and this is words sub two. So all we have to do is print words sub two.
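The day-of-week program, sketched in Python 3 as a function over any iterable of lines so it can run without the mailbox file. The function name, the sample lines, and their example.com/example.org addresses are hypothetical. They only mimic the "From address day month date time year" layout the lecture describes. The 'From:' header line shows why the check is for 'From ' with a trailing space.

```python
def days_of_week(lines):
    """Return the day-of-week word from each line that starts with 'From '."""
    days = []
    for line in lines:
        line = line.rstrip()
        if not line.startswith('From '):
            continue                # skip every other line
        words = line.split()
        days.append(words[2])       # words[0] is 'From', words[1] the address
    return days


# Hypothetical sample lines in the layout described above.
sample = [
    'From alice@example.com Sat Jan  5 09:14:16 2008\n',
    'Return-Path: <postmaster@example.com>\n',
    'From: alice@example.com\n',
    'From bob@example.org Fri Jan  4 18:10:48 2008\n',
]
print(days_of_week(sample))         # ['Sat', 'Fri']

# With a real mailbox file (the filename here is a placeholder), a file
# handle works the same way, since iterating over it yields lines:
# with open('mbox.txt') as fhand:
#     print(days_of_week(fhand))
```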
0:23:53.740,0:23:56.650 We split them and take 0:23:56.650,0:23:59.330 the third word, or words sub two, and we 0:23:59.330,0:24:02.260 can quickly create something[br]that's extracting 0:24:02.260,0:24:04.060 the day of the week out of these. 0:24:06.030,0:24:07.400 Okay? 0:24:07.400,0:24:11.890 So it's quick, because[br]split does the tricky work. 0:24:11.890,0:24:15.140 If you go back to the strings chapter, you[br]saw that 0:24:15.140,0:24:16.910 we did a lot of work to get this to[br]happen. 0:24:17.950,0:24:21.040 So here's another tricky pattern. 0:24:21.040,0:24:26.510 So let's say we want to do what we did at[br]the end of Chapter Six, 0:24:26.510,0:24:28.120 the string chapter. 0:24:28.120,0:24:30.870 Let's say we wanted to get back this little[br]bit of data. 0:24:32.130,0:24:33.330 Okay? 0:24:33.330,0:24:37.310 So, we can look at this and say, okay, let's[br]split this. 0:24:37.310,0:24:42.420 And this will be zero, one, and two, and[br]three, and four, and five, and six. 0:24:42.420,0:24:44.530 We're splitting it based on spaces. 0:24:44.530,0:24:50.106 Then the email address is words sub one,[br]right? 0:24:51.106,0:24:54.666 So that email address is this little bit[br]of stuff 0:24:54.666,0:24:58.780 because it's in between spaces, right?[br]So that's what we pull out. 0:24:58.780,0:25:02.355 The email address is words sub one. 0:25:02.355,0:25:04.512 We've got that. 0:25:04.512,0:25:07.730 So that's sitting in this email[br]variable. 0:25:07.730,0:25:10.000 Then we don't[br]really want the whole thing; 0:25:10.000,0:25:11.960 we just want the part after the 0:25:11.960,0:25:14.470 at sign, and we could do a lookup for the, oops. 0:25:14.470,0:25:16.290 We could do a lookup of the at sign. 0:25:17.490,0:25:22.145 But you can also then do a second, come[br]back, come back. 0:25:22.145,0:25:25.300 [SOUND] There we come.
0:25:25.300,0:25:29.110 You can also do a second split.[br]Okay? 0:25:29.110,0:25:31.260 So we're taking this variable here, email, 0:25:31.260,0:25:33.980 which is merely this little part right[br]here. 0:25:33.980,0:25:36.840 And we are splitting it again, except this 0:25:36.840,0:25:38.400 time we're splitting it based on the at[br]sign. 0:25:38.400,0:25:42.640 Which means it's going to break it right[br]here and give 0:25:42.640,0:25:44.140 us two pieces. 0:25:44.140,0:25:49.730 So pieces now is a list where the sub zero[br]item is the 0:25:49.730,0:25:56.280 person's name and the sub one item is the host[br]where their mail is held. 0:25:56.280,0:26:00.540 Okay?[br]And so then all we need is pieces 0:26:00.540,0:26:06.380 sub one, and pieces sub one is this[br]guy right here. 0:26:07.900,0:26:10.750 So that's pieces sub one, and so we[br]pulled it out. 0:26:10.750,0:26:13.470 So if you go back to how we did it before,[br]we were 0:26:13.470,0:26:17.100 searching, then searching some[br]more, and then taking slices. 0:26:17.100,0:26:19.380 This is a little more elegant, okay? 0:26:19.380,0:26:21.110 Because really, we split it and then we[br]split it again, 0:26:21.110,0:26:23.080 and we knew which piece we were looking at. 0:26:23.080,0:26:27.250 So this is what I call the Double Split[br]Pattern, where you split a string 0:26:27.250,0:26:30.630 into a list, then you take a thing out,[br]and then you split it again, 0:26:31.710,0:26:33.020 depending on what data you're looking for. 0:26:33.020,0:26:35.376 This is just one technique; it's not the[br]only technique. 0:26:35.376,0:26:40.480 Okay, so that's lists. 0:26:40.480,0:26:42.040 We talked about the concept of a 0:26:42.040,0:26:44.540 collection, where a list has multiple[br]things in it. 0:26:44.540,0:26:47.350 Definite loops, again, we've seen these[br]things.
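The Double Split Pattern walked through above can be sketched like this, assuming Python 3 and a single illustrative From line. The last few lines show, for comparison, a find-and-slice version along the lines of the strings-chapter approach; both produce the same host.

```python
line = 'From stephen.marquard@uct.ac.za Sat Jan  5 09:14:16 2008'

# First split: on whitespace, so the email address is words[1]
words = line.split()
email = words[1]
print(email)              # stephen.marquard@uct.ac.za

# Second split: on the at sign, so the host is pieces[1]
pieces = email.split('@')
print(pieces)             # ['stephen.marquard', 'uct.ac.za']
print(pieces[1])          # uct.ac.za

# For comparison: searching, searching some more, then slicing
atpos = line.find('@')             # index of the at sign
sppos = line.find(' ', atpos)      # first space after the at sign
host = line[atpos + 1:sppos]       # text between them
print(host)               # uct.ac.za
```

The split version never computes an index by hand; it relies on knowing which position in each list holds the piece you want.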
0:26:47.350,0:26:49.600 Lists look a lot like strings, 0:26:49.600,0:26:53.100 except the elements can be of any type and,[br]unlike strings, lists are mutable. 0:26:53.100,0:26:59.070 We still use the bracket operator, and we[br]redid max, min, and sum, 0:26:59.070,0:27:02.382 except we did it in, like, one line rather[br]than a whole loop. 0:27:02.382,0:27:06.110 And something we're going to play with a[br]lot is using split to parse strings: 0:27:06.110,0:27:08.630 the single split, and then the double[br]split, which 0:27:08.630,0:27:11.130 is the natural extension of the single[br]split. 0:27:11.130,0:27:14.780 So, see you in the next lecture. I'm looking[br]forward to talking about dictionaries.