0:00:00.140,0:00:05.070 Hello again, and welcome to Chapter Nine[br]of Python, Dictionaries. 0:00:05.070,0:00:09.210 As always, this lecture is copyright[br]Creative Commons Attribution. 0:00:09.210,0:00:14.070 That means the audio, the video, the[br]slides, and even my scribbles. 0:00:14.070,0:00:17.860 You can use them any way you like, as long[br]as you attribute them. 0:00:17.860,0:00:20.150 Okay, so this is the second chapter 0:00:20.150,0:00:22.060 where we're talking about collections, and[br]the collections 0:00:22.060,0:00:25.730 are like a piece of luggage in that you[br]can put multiple things in them. 0:00:27.910,0:00:30.340 Variables that we've talked about sort of[br]starting in 0:00:30.340,0:00:35.070 Chapter Two and Chapter Three were simple[br]variables, scalar. 0:00:35.070,0:00:37.260 They're just kind of one thing, and as[br]soon as you, 0:00:37.260,0:00:40.610 like, put another thing in there, it[br]overwrites the first thing. 0:00:40.610,0:00:46.035 And so if you look at the code, you know,[br]x = 2 and x = 4, 0:00:46.035,0:00:50.870 the question is, you know, where did[br]the 2 go? 0:00:50.870,0:00:53.180 Right? The 2 was there, x was there, 0:00:53.180,0:00:56.710 there was a 2 in there, and then we cross[br]it out and put 4 in there. 0:00:56.710,0:01:01.140 This is sort of the basic operation, the[br]assignment statement, it's a replacement. 0:01:01.140,0:01:03.770 But a dictionary allows us to put more[br]than one thing. 0:01:03.770,0:01:06.220 Not using this syntax, but it allows us to 0:01:06.220,0:01:09.390 have a variable that's really an aggregate[br]of many values. 0:01:09.390,0:01:12.430 And the difference between a list and a[br]dictionary 0:01:12.430,0:01:15.510 is how the values are structured within[br]that single variable. 0:01:15.510,0:01:17.820 The list is a linear collection, 0:01:17.820,0:01:21.010 indexed by integers 0, 1, 2, 3. 
0:01:21.010,0:01:24.390 If there's five of them, it's 0 through 4,[br]very much like a 0:01:24.390,0:01:28.080 Pringles can here, where they're just[br]stacked nicely on top of each other. 0:01:28.080,0:01:32.690 Everything's kind of organized. We talked[br]about it in the last, in the last lecture. 0:01:32.690,0:01:35.640 This lecture we're talking about dictionaries. 0:01:35.640,0:01:37.640 A dictionary's very powerful. 0:01:37.640,0:01:42.170 It's, and its power comes from a different[br]way of organizing itself internally. 0:01:42.170,0:01:43.850 It's a bag of values, 0:01:43.850,0:01:47.670 like a just sort of, just stuff's[br]in it, it's not in any order. 0:01:47.670,0:01:49.190 Big stuff, little stuff. 0:01:49.190,0:01:50.650 Things have labels. 0:01:50.650,0:01:52.440 You can also think of it like a purse with 0:01:52.440,0:01:55.480 just things in it that's like, it's not[br]like stacked. 0:01:55.480,0:01:57.590 It's just, stuff moves around as you're going 0:01:57.590,0:02:00.580 and that's, that's a very good model for[br]dictionaries. 0:02:01.590,0:02:03.080 And so dictionaries 0:02:03.080,0:02:05.890 have to have a label because the stuff is[br]not in order. 0:02:05.890,0:02:07.890 There's no such thing as the third thing. 0:02:07.890,0:02:09.590 There is the thing with the label perfume. 0:02:09.590,0:02:11.110 There's the thing with the label candy. 0:02:11.110,0:02:14.180 There's the thing with the label money. 0:02:14.180,0:02:17.100 And so there's the value, the thing, the money. 0:02:17.100,0:02:19.290 And then there's always also the label. 0:02:19.290,0:02:22.810 We also call these key/value. 0:02:24.820,0:02:28.970 The key is the label and the value is[br]whatever. 0:02:28.970,0:02:31.220 And so these pink things are all labels for 0:02:31.220,0:02:33.240 various things you could put in your purse. 0:02:33.240,0:02:36.053 So you could say to your purse, "hey purse,[br]give me my tissues." 0:02:36.053,0:02:38.500 "Hey purse, give me my money." 
0:02:38.500,0:02:40.430 And it, it's in there somewhere and the[br]purse sort of 0:02:40.430,0:02:43.428 gives you back the tissues or the money. 0:02:43.428,0:02:48.980 And it's, Python's most powerful data[br]collection is the dictionary. 0:02:48.980,0:02:50.190 And it's when 0:02:50.190,0:02:52.280 you get used to wielding them you'll say,[br]like, 0:02:52.280,0:02:54.130 whoa, I can do so much with these things. 0:02:54.130,0:02:55.980 And at the beginning you're just sort of 0:02:55.980,0:02:59.600 learning sort of how to use them without[br]hurting yourself. 0:02:59.600,0:03:00.780 But they're very powerful. 0:03:00.780,0:03:01.840 It's like a database. 0:03:01.840,0:03:06.940 It's, it allows you to store very arbitrary[br]data organized in however you feel like 0:03:06.940,0:03:11.010 organizing it, in a way that advances the[br]cause of the program that you're writing. 0:03:11.010,0:03:15.940 And we're still kind of at the very[br]beginning, but as you learn more, 0:03:15.940,0:03:17.880 these will become a very powerful[br]tool for you. 0:03:19.920,0:03:23.130 They, dictionaries have different names in[br]different languages. 0:03:24.680,0:03:27.340 Perl or PHP would call them associative[br]arrays. 0:03:28.900,0:03:32.000 Java would call them a PropertyMap or a[br]HashMap. 0:03:32.000,0:03:35.390 And C# might call them a property bag or[br]an attribute bag. 0:03:35.390,0:03:38.030 And so they're, they're just the same[br]concept. 0:03:38.030,0:03:42.300 It's keys and values is the concept that's[br]across all these languages. 0:03:42.300,0:03:43.560 Just are very powerful. 0:03:43.560,0:03:44.950 And if you look at the Wikipedia entry 0:03:44.950,0:03:45.960 that I have here 0:03:45.960,0:03:48.620 you can see that it's just, it's a concept 0:03:48.620,0:03:52.670 that we give different names in different[br]languages. Same concept, different names. 
0:03:53.900,0:03:58.196 So like I said, the difference between a[br]list and a dictionary, they both can store 0:03:58.196,0:04:00.910 multiple values. The question is how we[br]label them, 0:04:00.910,0:04:03.280 how we store them, and how we retrieve[br]them. 0:04:03.280,0:04:07.430 So here's an example use of a dictionary.[br]I'm going to make a thing called purse. 0:04:07.430,0:04:10.750 And I'm going to store in purse, this is[br]an assignment statement, 0:04:10.750,0:04:14.000 purse sub money.[br]So this isn't like sub zero. 0:04:14.000,0:04:14.960 This is sub money. 0:04:14.960,0:04:18.220 So I'm actually using a string as the[br]place. 0:04:18.220,0:04:21.050 And, so I'm going to say stick 12[br]in my purse 0:04:21.050,0:04:24.100 and stick a Post-it note that says[br]that's my money. 0:04:24.100,0:04:26.310 Candy is 3. Tissues is 75. 0:04:26.310,0:04:31.590 And if I look at that, it's not just the[br]numbers 12, 3, and 75 as it 0:04:31.590,0:04:36.650 would be in a list. It is the connection[br]between money and 12, 0:04:36.650,0:04:41.550 tissues is 75, candy is 3.[br]And in the key/value, that's the 0:04:41.550,0:04:47.470 key and that's the value.[br]So candy is the key and 3 is the value. 0:04:47.470,0:04:51.840 Now I can look things up by their name,[br]print purse sub candy. 0:04:51.840,0:04:56.770 Well it goes and finds it, asking hey purse,[br]give me back candy, and it 0:04:56.770,0:05:00.220 goes and finds the value, which is 3, and[br]so out comes a 3. 0:05:00.220,0:05:02.810 We can also put it 0:05:02.810,0:05:05.560 on the right-hand side of an[br]assignment statement, 0:05:05.560,0:05:07.270 so purse sub candy says give me[br]the old version of candy, 0:05:07.270,0:05:10.040 and then add 2 to it, which 0:05:10.040,0:05:14.180 gives me 5, and then store it back[br]in that purse 0:05:14.180,0:05:15.930 under the label candy. 0:05:15.930,0:05:19.260 So we see candy changing to 5. 
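The purse example walked through above can be written out as a short sketch. It assumes Python 3 (so `print` is a function); the names `purse`, `money`, `candy`, and `tissues` are the ones used in the lecture.

```python
# Create an empty dictionary and store three values under string keys.
purse = dict()
purse['money'] = 12
purse['candy'] = 3
purse['tissues'] = 75

# Look a value up by its key ("purse sub candy").
print(purse['candy'])   # 3

# Read the old value, add 2, and store the result back under the same key.
purse['candy'] = purse['candy'] + 2
print(purse['candy'])   # 5
```

The same index expression, `purse['candy']`, appears on both sides of the assignment: on the right it reads the old value, on the left it names the place to store the new one.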
0:05:19.260,0:05:21.410 And so, this is a place, and you could 0:05:21.410,0:05:23.280 do this with a list except these would be[br]numbers. 0:05:23.280,0:05:27.970 You could say purse sub two is equal to[br]purse sub two plus two, or whatever. 0:05:27.970,0:05:31.500 But in dictionaries, there are labels. 0:05:31.500,0:05:32.940 Now, they're not strings. 0:05:32.940,0:05:35.280 Strings is a very common label in[br]dictionaries, but 0:05:35.280,0:05:37.530 it's not always strings, you can use other[br]things. 0:05:37.530,0:05:39.950 In this chapter we'll pretty much focus on[br]strings. 0:05:39.950,0:05:43.910 You can even use numbers and then you[br]would get a little confused. 0:05:43.910,0:05:44.940 But you can. 0:05:44.940,0:05:48.130 So here's sort of a picture of how this[br]works. 0:05:48.130,0:05:52.570 So, if we take a look at this line purse[br]sub money equals 12, 0:05:52.570,0:05:57.670 it's like we were putting a key/value[br]connection, money is the label for 12. 0:05:57.670,0:06:00.730 And then we sort of move that in. 0:06:00.730,0:06:04.340 And it's up to the purse to decide[br]where things live. 0:06:04.340,0:06:09.910 If we look at the next line, we're going to[br]put the value in with a 0:06:09.910,0:06:11.790 3 in with the label candy, and we're[br]going to put 0:06:11.790,0:06:14.530 the value 75 in with the label of tissues. 0:06:14.530,0:06:17.610 And when we say hey purse, print yourself[br]out, it just 0:06:17.610,0:06:21.060 goes and pulls these things back out and[br]hands them to us. 0:06:21.060,0:06:24.690 And what it's really, it's giving us both the[br]label and the value and it's necessary 0:06:24.690,0:06:26.320 to do that cause they're just like 12, 0:06:26.320,0:06:28.990 75, and 3. What exactly is that? 0:06:28.990,0:06:31.440 And so this syntax with the curly braces 0:06:31.440,0:06:34.860 is what happens when you print a[br]dictionary out. 
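As a small illustration of the two points above, here is a sketch of what printing a dictionary looks like and of using a key that is not a string. The variable `mixed` and the integer key `7` are made up for this example and do not come from the lecture.

```python
purse = dict()
purse['money'] = 12
purse['candy'] = 3
purse['tissues'] = 75

# Printing shows every label together with its value, inside curly braces.
print(purse)

# Keys are usually strings in this chapter, but an integer works as a key too.
mixed = dict()
mixed['candy'] = 3
mixed[7] = 'seven'
print(mixed[7])   # seven
```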
0:06:34.860,0:06:39.360 The same thing happens when we're sort of[br]printing purse sub candy, right? 0:06:39.360,0:06:40.300 Purse sub candy, 0:06:42.380,0:06:45.240 it's like dear purse, go and find the candy[br]thing. 0:06:45.240,0:06:46.320 Look at that one, look at that one. 0:06:46.320,0:06:48.330 Oh, yep, yep, this is candy. 0:06:48.330,0:06:50.190 But what we're looking for is the value, 0:06:50.190,0:06:52.620 and so that's why 3 is coming out here. 0:06:52.620,0:06:57.250 So go look up under candy, and tell me[br]what's stored under candy. 0:06:57.250,0:06:58.930 These can be actually more complex things, 0:06:58.930,0:07:00.560 I'm just keeping it simple for this[br]lecture. 0:07:02.900,0:07:07.570 And then, when we say purse sub candy[br]equals purse sub candy plus 2, well it 0:07:07.570,0:07:14.220 pulls the 3 out, looking at the label[br]candy, then adds 3 plus 2 and makes 5, 0:07:14.220,0:07:20.030 and then it assigns it back in, and then[br]that says, oh, go, go place this number 5 0:07:20.030,0:07:26.035 in the purse with the label of candy,[br]which then replaces the 3 with a 5. 0:07:26.035,0:07:26.630 Okay? 0:07:28.280,0:07:30.080 And if we print it out, we see that the 0:07:30.080,0:07:34.990 new variable, or the new candy entry,[br]is now 5. 0:07:34.990,0:07:35.590 Okay? 0:07:36.880,0:07:40.930 So if we just sort of put these things[br]side by side, we create 0:07:40.930,0:07:43.860 them sort of both the same way and we make[br]an empty list, and an empty 0:07:43.860,0:07:46.880 dictionary, we call the append method[br]because 0:07:46.880,0:07:48.660 we're sort of just putting these things in 0:07:48.660,0:07:52.142 order. You gotta put the first one in[br]first. So it's not telling you where. 0:07:52.142,0:07:53.467 You kind of know that this 0:07:53.467,0:07:55.117 will be the first one, cause we're[br]starting with an empty one, 0:07:55.117,0:07:56.552 and this will be the second one. 
0:07:56.552,0:08:02.002 We put in the values 21 and 183, and then[br]we print it out, and it's like okay, you gave 0:08:02.002,0:08:04.437 me the values 21 and 183, I will maintain[br]the order for you, 0:08:04.437,0:08:07.617 there's no keys other than their position. 0:08:07.617,0:08:12.437 The position is the key, as it were, so if[br]I want to change the first one to 23, 0:08:12.437,0:08:17.415 well, I say list sub zero, which is this,[br]and then change that to 23. 0:08:17.415,0:08:19.546 So this is sort of used as a lookup to 0:08:19.546,0:08:22.573 find something. It can be used on either the[br]right-hand side or the 0:08:22.573,0:08:24.728 left-hand side of an assignment statement. 0:08:24.728,0:08:27.691 Comparing that to dictionaries, I want to[br]put a 21 in there 0:08:27.691,0:08:30.078 and I want to put it with the label age. 0:08:30.078,0:08:33.001 I'm going to put 182, put that in with the[br]label course. 0:08:33.001,0:08:36.787 So we don't have to like, make an entry. 0:08:36.787,0:08:38.317 The fact that the entry doesn't exist, 0:08:38.317,0:08:41.712 it creates the age entry and sticks 21 into it, 0:08:41.712,0:08:44.152 creates the course entry, sticks 182 into it. 0:08:44.152,0:08:48.572 We print it out and it says, oh, course[br]is 182 and age is 21. 0:08:48.572,0:08:55.062 This emphasizes that order is not[br]preserved in dictionaries. 0:08:56.062,0:08:58.478 I won't go into like great detail as to[br]why that is. 0:08:58.478,0:09:01.233 It turns out that that's a compromise that 0:09:01.233,0:09:04.524 makes them fast using a technique called[br]hashing. 0:09:04.524,0:09:08.887 It's how it actually works internally,[br]go Wikipedia hashing and 0:09:08.887,0:09:09.717 take a look. 0:09:09.717,0:09:13.740 But, the thing that matters to us as[br]programmers primarily 0:09:13.740,0:09:19.537 is that lists maintain order and[br]dictionaries do not maintain order. 
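The side-by-side comparison above can be sketched as follows, using the names `lst` and `ddd` and assuming Python 3. One caveat that is not from the lecture: since Python 3.7 a dictionary keeps its keys in insertion order, so on a current interpreter the last printout shows `age` before `course`. The point that a list is looked up by position and a dictionary by label is unchanged.

```python
# A list keeps values by position; append adds each value at the end.
lst = list()
lst.append(21)
lst.append(183)
print(lst)        # [21, 183]
lst[0] = 23       # position 0 plays the role of the key
print(lst)        # [23, 183]

# A dictionary keeps values by label; assigning to a new key creates the entry.
ddd = dict()
ddd['age'] = 21
ddd['course'] = 182
print(ddd)
```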
0:09:19.537,0:09:23.992 They, dictionaries give us power[br]that we don't have in lists. 0:09:23.992,0:09:25.792 I mean they're very complementary. 0:09:25.792,0:09:27.622 Now there's not this one that's better[br]than the other. 0:09:27.622,0:09:29.097 They're very complementary. 0:09:29.097,0:09:31.987 Different kinds of data is either better[br]represented as a list 0:09:31.987,0:09:33.202 or as a dictionary, depending on the 0:09:33.202,0:09:34.717 problem you're trying to solve. 0:09:34.717,0:09:38.997 And in a moment we'll, we'll be writing[br]programs that are using both. 0:09:38.997,0:09:40.998 So if we come down here and I say, 0:09:40.998,0:09:46.963 okay, stick 23 into, assignment statement,[br]into ddd sub age, 0:09:46.963,0:09:50.958 well that will change this 21 to 23,[br]so when we print it out. 0:09:50.958,0:09:53.311 So you can, this part, where you look[br]something up and 0:09:53.311,0:09:55.689 change the value, you can do either way. 0:09:55.689,0:09:57.922 It's just how you do it here 0:09:57.922,0:10:00.066 is a little bit different, okay? 0:10:00.066,0:10:03.570 So let's look through this code again. 0:10:03.570,0:10:06.825 And so I like, I like to use the word key[br]and value. 0:10:06.825,0:10:09.404 Key is the way we look the thing up,[br]and in lists 0:10:09.404,0:10:13.016 keys are numbers starting at[br]zero and with no gaps. 0:10:13.016,0:10:15.024 In dictionaries keys are whatever we want[br]them to be, 0:10:15.024,0:10:17.662 in this case I'm using strings. 0:10:17.662,0:10:21.187 And then the value is the number we're[br]storing in it. 0:10:21.187,0:10:25.137 So we create this kind of a list with that[br]kind, those 0:10:25.137,0:10:26.187 kinds of statements. 0:10:26.187,0:10:29.187 This statement creates this kind of a thing. 0:10:29.187,0:10:33.687 Now, if we, if we think of this assignment[br]statement as moving data 0:10:33.687,0:10:37.475 into a new, into a place, a new item of[br]data into a place. 
0:10:41.440,0:10:43.280 It's looking at this thing right here. 0:10:43.280,0:10:45.330 Right? It's like, that's where I want to[br]move it. 0:10:45.330,0:10:48.370 And so it hunts, and says, look the key up. 0:10:48.370,0:10:49.710 And that's the one that I'm going to change. 0:10:49.710,0:10:52.300 And then once it knows which it's going to[br]change, 0:10:52.300,0:10:57.230 then it's going to take the 23, and it's[br]going to put the 23 into that location. 0:10:57.230,0:11:01.300 And so that's how this changes from that[br]to that. 0:11:01.300,0:11:06.550 Similarly when we get down to here, we're[br]going to stick 23 somewhere and 0:11:06.550,0:11:10.120 this is, this expression, this lookup[br]expression, the index 0:11:10.120,0:11:13.410 expression ddd sub age, is where we're[br]going to put it. 0:11:13.410,0:11:16.340 So, we're looking here, where is that thing? 0:11:16.340,0:11:19.900 Well, that thing is this entry 0:11:19.900,0:11:23.120 in the dictionary. And so now when we're[br]going to store the 23, 0:11:23.120,0:11:24.380 we know where the 23 is going to go. 0:11:24.380,0:11:27.240 It's going to overwrite the 21 and so the[br]21 is 0:11:27.240,0:11:31.440 going to change to 23, okay? So they're[br]kind of similar. 0:11:31.440,0:11:34.340 There are things that work similar in them 0:11:34.340,0:11:36.170 and then there are things that work[br]differently in them. 0:11:37.550,0:11:41.000 We can make literals, constants, with 0:11:41.000,0:11:43.400 curly braces. And they look just like the print. 0:11:43.400,0:11:44.760 That's one nice thing about Python. 0:11:44.760,0:11:48.880 When you print something out it's showing[br]you how you can make a literal, and 0:11:48.880,0:11:56.120 basically you just open with a curly brace[br]and say chuck colon 1, fred 42, jan 100. 0:11:56.120,0:11:57.580 And we're making connections. 0:11:58.200,0:12:02.000 key/value pair, key/value pair.[br]We print it out and 0:12:04.560,0:12:06.270 No order. 
They don't maintain order. 0:12:06.270,0:12:08.760 Now they might come out in the same order,[br]but that's just lucky. 0:12:08.760,0:12:09.180 Right? 0:12:09.180,0:12:10.550 All the ones I've shown you so far don't 0:12:10.550,0:12:12.650 come out in the same order, which is good[br]to demonstrate it. 0:12:12.650,0:12:16.000 If it one time came out in the same order[br]that wouldn't be broken. 0:12:16.000,0:12:18.500 It's not like it doesn't want to come out[br]in the same order. 0:12:18.500,0:12:22.090 It's just, you don't, it's not internally[br]stored, and you 0:12:22.090,0:12:23.859 add an element and it may reorder them. 0:12:25.110,0:12:28.030 You can do an empty dictionary with just a[br]curly brace, curly brace. 0:12:33.330,0:12:37.400 So, I'm going to give you another example. 0:12:37.400,0:12:40.120 And I'm going to show you a series of[br]names. 0:12:40.120,0:12:45.810 And I want you to figure out what the most[br]common name is 0:12:45.810,0:12:48.240 and how many times each name appears. 0:12:48.240,0:12:51.726 Now these are real people.[br]They actually work on the Sakai project. 0:12:51.726,0:12:58.540 Steven, Zhen, and Chen, and me.[br]So these are people that are actually 0:12:58.540,0:13:00.710 in the data that we use in this course. 0:13:00.710,0:13:04.450 Okay? And so I think I'll show you about[br]fifteen names 0:13:04.450,0:13:06.925 and you're to come up with a way, I'm[br]going to 0:13:06.925,0:13:11.270 show them to you one at a time, you need to[br]come up with a way to keep track of these. 0:13:11.270,0:13:12.390 Okay? 0:13:12.390,0:13:15.611 So I'll just, with no further ado I will show[br]you the names. 0:13:15.611,0:13:25.611 [BLANK_AUDIO] 0:13:53.752,0:13:57.510 Okay, so that's all the names.[br]Did you get it? 0:13:57.510,0:14:00.160 You might have to go back and do it again. 0:14:01.000,0:14:03.520 How did you solve the problem? 0:14:03.520,0:14:08.300 What kind of a data structure did you[br]build to solve the problem? 
0:14:08.300,0:14:10.630 Or did you just say wow that's painful, I 0:14:10.630,0:14:14.890 think I will learn Python instead, in[br]solving that problem. 0:14:14.890,0:14:15.524 Okay? 0:14:15.524,0:14:19.880 So pause the, pause the video if you want and 0:14:19.880,0:14:23.250 write down or go back, write down what you[br]think the 0:14:23.250,0:14:28.070 number of the most common name is and how[br]many times. 0:14:30.200,0:14:32.080 Okay. Now I'll show you. 0:14:32.080,0:14:35.180 So here is the whole list.[br]It's all of them. 0:14:35.180,0:14:38.730 And now that we see all of them, we[br]use our amazing human 0:14:38.730,0:14:42.720 mind and we scan around, and look at[br]purpleness and, and all that stuff. 0:14:42.720,0:14:44.320 And then we go like, oh, this is a so 0:14:44.320,0:14:46.190 much easier problem when I'm looking[br]at the whole thing. 0:14:47.990,0:14:51.590 And I think that the most common person is[br]Zhen, and 0:14:54.310,0:14:58.770 I think we see Zhen, I think we see Zhen[br]five times. 0:15:00.760,0:15:06.550 And I think csev is one, two, three and[br]Chen Wen is one, two. 0:15:06.550,0:15:08.980 And Steve Marquard is one, two, three. 0:15:08.980,0:15:12.530 So the question is, what is an effective[br]data structure if you're going to see 0:15:12.530,0:15:15.510 a million of these, what kind of data[br]structure would you have to produce? 0:15:15.510,0:15:16.720 Because you can't keep it in your head 0:15:16.720,0:15:19.510 even, even this number of people, you can't 0:15:19.510,0:15:22.400 even this amount of data, no way you can[br]keep it in your head. You have to come 0:15:22.400,0:15:24.970 up with some kind of a variable, as it were, 0:15:24.970,0:15:28.230 just like largest so far was the variable. 0:15:28.230,0:15:29.800 Some kind of variable that gets you to 0:15:29.800,0:15:31.450 the point where you understand what's[br]going on. 
0:15:31.450,0:15:35.080 And so this is the most common technique[br]to solve this 0:15:35.080,0:15:39.040 problem where you keep a running total of[br]each of the names. 0:15:39.040,0:15:42.500 And if you see a new name, you add them to[br]the list. 0:15:42.500,0:15:45.090 So csev and then you give him a one, 0:15:45.090,0:15:47.410 and then you see Zhen and you give her a[br]one, 0:15:47.410,0:15:49.620 and then you see Chen and you give her a[br]one. 0:15:49.620,0:15:51.670 And then you see csev again and you give[br]him a two. 0:15:51.670,0:15:54.825 And you see a two, and a two, and a one[br]right? 0:15:54.825,0:15:57.050 [COUGH] 0:15:57.050,0:16:02.760 And so then when you're all done you have[br]the mapping, right, of these things 0:16:02.760,0:16:06.100 and you go oh, okay, let me look through[br]here and find the largest one. 0:16:06.100,0:16:09.960 That's the largest one and so that must be[br]the person who is the most. 0:16:09.960,0:16:12.170 So you need a scratch area, 0:16:12.170,0:16:14.710 a data structure or a piece of paper as[br]it were, 0:16:14.710,0:16:19.030 and so that's what, exactly what[br]dictionaries are really good at. 0:16:19.030,0:16:23.910 You could think of this as like a[br]histogram. You know, it's, 0:16:23.910,0:16:27.840 it's a bunch of counters, but counters[br]that are indexed by a string. 0:16:27.840,0:16:29.450 So we use a lot of this. 0:16:29.450,0:16:34.130 And so this is a pattern of many counters[br]with a dictionary, simultaneous counters. 0:16:34.130,0:16:35.390 We're counting a bunch of, we're looking 0:16:35.390,0:16:39.430 at a series of things, and we're going to[br]simultaneously keep track 0:16:39.430,0:16:42.530 of a large number of counters, rather than[br]just one counter. 0:16:42.530,0:16:46.950 How many names did you see total? Whatever,[br]12. But how many of each name 0:16:46.950,0:16:50.480 did you see is a bunch of counters, so[br]it's a bunch of simultaneous counters. 
0:16:51.850,0:16:56.890 So a dictionary is, is great for this,[br]a dictionary is great for this. 0:16:56.890,0:16:58.520 We, when we see somebody for the first 0:16:58.520,0:17:00.440 time, we can add an entry to the[br]dictionary, 0:17:00.440,0:17:03.940 which is kind of like going oh,[br]csev one, 0:17:03.940,0:17:07.970 and then Chen Wen one. Now these don't[br]exist yet. 0:17:07.970,0:17:10.480 Right? So we've got csev one and Chen[br]Wen one, so 0:17:10.480,0:17:13.359 that creates an entry and sticks a one in[br]it and the 0:17:13.359,0:17:17.119 mapping between the key csev and the value[br]one, the key Chen Wen 0:17:17.119,0:17:19.740 and the value one and then we say, hey[br]what's in there? 0:17:19.740,0:17:22.740 Oh, we've got a csev is one and[br]Chen Wen is one. 0:17:22.740,0:17:25.550 And then we see Chen Wen a second time, 0:17:25.550,0:17:27.450 so we'd add another number right there. 0:17:27.450,0:17:30.690 So this old number is one, we add one to[br]it and we get 0:17:30.690,0:17:35.370 two and then we stick that back in and[br]then we do the calculations. 0:17:35.370,0:17:39.100 We do a dump and say oh there's two in[br]Chen Wen and one in csev. 0:17:40.130,0:17:40.630 Okay? 0:17:41.630,0:17:46.300 So this is a great data structure for the[br]simultaneous counters like what's 0:17:46.300,0:17:49.940 the most common word, who had the most[br]commits, da, da, da, da, da. 0:17:51.090,0:17:54.220 Now, everything we do we have to figure[br]out 0:17:54.220,0:17:55.990 like, when you're going to get in trouble[br]with Python. 0:17:55.990,0:18:00.250 When Python's going to give you the old[br]thumbs down and say oh, you went too far. 0:18:00.250,0:18:06.360 So one thing Python does not like is if[br]you reference a key before it exists. 0:18:06.360,0:18:09.900 We'll, we'll talk in a second how to[br]work around this. 
But if you simply 0:18:09.900,0:18:11.600 create a dictionary and say, oh, print out 0:18:11.600,0:18:15.090 what's under csev, it gives you a[br]traceback. 0:18:15.090,0:18:15.710 It's like, 0:18:15.710,0:18:17.940 I'm going to inform you that that's not[br]there. 0:18:17.940,0:18:20.490 And it says key error, csev. 0:18:20.490,0:18:24.810 Now, the thing that allows us to solve[br]this is the in operator. 0:18:24.810,0:18:28.140 We've used the in operator to see if a[br]substring was in a string. 0:18:28.140,0:18:30.120 Or if a number was in a list. 0:18:30.120,0:18:37.090 So, so this in operator says, in operator[br]says, hey, ask a question. 0:18:37.090,0:18:42.140 Is the string csev a current key in the[br]dictionary ccc? 0:18:43.210,0:18:46.460 Is the string csev a current key in the[br]dictionary ccc? 0:18:46.460,0:18:47.750 And it says, False. 0:18:49.090,0:18:52.240 So now we have something that doesn't give[br]a traceback 0:18:52.240,0:18:55.290 that can tell us whether or not the key is[br]there. 0:18:55.290,0:18:57.480 So if you remember the algorithm, the[br]first time you see it, you 0:18:57.480,0:19:01.270 set them to one, and every other time, you[br]add one to them. 0:19:02.520,0:19:04.030 So this is how we do that in Python. 0:19:05.150,0:19:08.220 So here's how we implement that program[br]that I just gave you 0:19:08.220,0:19:12.080 in Python. So, here's our names. 0:19:12.080,0:19:14.760 It's shorter so my slide works better. 0:19:14.760,0:19:17.470 Here's a variable, our iteration variable,[br]it's going to, you know, 0:19:17.470,0:19:20.570 go through all five of these one at a time. 0:19:20.570,0:19:24.553 And within the body of the[br]loop we have an if statement. 0:19:24.553,0:19:26.793 If the name is not currently in the 0:19:26.793,0:19:30.929 counts dictionary, counts is the name of[br]my dictionary. 0:19:30.929,0:19:33.617 If the name is not currently in the[br]counts dictionary, 0:19:33.617,0:19:35.210 I say counts sub name equals one. 
0:19:36.440,0:19:39.680 else, that must mean it's already there[br]which means 0:19:39.680,0:19:42.886 it's okay to retrieve it, counts sub name[br]plus 1. 0:19:42.886,0:19:46.590 We're going to add a 1 to it and stick it[br]back in, okay? 0:19:46.590,0:19:49.350 And so when this finishes it's going to[br]add 0:19:49.350,0:19:52.730 entries and then add one to entries that[br]already exist. 0:19:52.730,0:19:57.370 And not traceback at all. And when we[br]print it out we're going to see the counts. 0:19:57.370,0:19:58.720 And literally this could have gone 0:19:58.720,0:20:02.400 a million times and it would just be fine[br]and it would just keep expanding. 0:20:02.400,0:20:02.900 Okay? 0:20:05.260,0:20:07.270 So this pattern of checking to see if a key 0:20:07.270,0:20:10.690 is in a dictionary, setting it to some[br]number, or 0:20:11.750,0:20:14.770 adding one to it is a really, really common[br]pattern. 0:20:16.030,0:20:19.550 It's so common, as a matter of fact, that[br]there is a 0:20:19.550,0:20:24.580 a special thing built into dictionaries[br]that does this for us, okay? 0:20:24.580,0:20:26.700 And there is this method called get. 0:20:27.960,0:20:30.490 And so, counts is the name of the[br]dictionary, 0:20:30.490,0:20:34.120 get is a built-in capability of[br]dictionaries. 0:20:34.120,0:20:35.630 And it takes two parameters. 0:20:35.630,0:20:43.110 The first parameter is a key name, like a[br]string, like csev or chen wen or marquard. 0:20:43.110,0:20:50.880 And then the second parameter is a value[br]to give back if this doesn't exist. 0:20:50.880,0:20:54.300 It's a default value if the key does not[br]exist. 0:20:54.300,0:20:55.850 And there's no traceback. 0:20:55.850,0:21:00.710 So this way you can encapsulate, in effect,[br]an if-then-else. 0:21:00.710,0:21:06.160 If the name parameter is in the counts,[br]print the thing out, otherwise print zero. 
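The counting loop and the `get` method described above can be sketched like this, assuming Python 3. The five entries in `names` are sample data chosen for illustration, and `'nobody'` is a made-up key that is deliberately absent.

```python
# The in operator asks whether a key exists, without causing a traceback.
ccc = dict()
print('csev' in ccc)   # False

counts = dict()
names = ['csev', 'cwen', 'csev', 'zqian', 'cwen']  # sample data, assumed for illustration

for name in names:
    if name not in counts:
        # First time this name is seen: create the entry.
        counts[name] = 1
    else:
        # The key already exists, so it is safe to read it and add one.
        counts[name] = counts[name] + 1

print(counts)

# get returns the stored value, or the given default when the key is missing.
print(counts.get('csev', 0))     # 2
print(counts.get('nobody', 0))   # 0
```

Reading `counts['nobody']` directly would raise a `KeyError`, which is the traceback the lecture warns about; `get` with a default avoids it.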
0:21:06.160,0:21:11.490 So this expression will either get the[br]number 0:21:11.490,0:21:16.810 if it exists or it will give me back a[br]zero if it doesn't exist. 0:21:16.810,0:21:18.770 So this is really valuable. 0:21:18.770,0:21:21.080 Right? This is really valuable. 0:21:21.080,0:21:22.630 That's a really bad smiley face. 0:21:22.630,0:21:28.590 So this is really valuable because it,[br]once, once we understand the idiom, 0:21:28.590,0:21:32.520 it really takes four lines of code and[br]turns it into one line of code. 0:21:32.520,0:21:34.620 Because we're going to be doing this[br]if-then-else all the time. 0:21:35.800,0:21:39.060 Now, and so we can reconstruct that loop 0:21:39.060,0:21:44.010 a lot easier and a lot more cleanly using this[br]idiom, right? 0:21:44.010,0:21:46.160 It's something that looks kind of complex[br]but you'll 0:21:46.160,0:21:49.140 get used to it really fast, okay? 0:21:49.140,0:21:51.530 So we have, everything here is the same, 0:21:51.530,0:21:53.780 we create an empty dictionary, we have five[br]names to 0:21:53.780,0:21:55.760 go through, we're going to write a[br]for loop 0:21:55.760,0:21:58.320 and it's going to go through each of[br]those. 0:21:58.320,0:22:04.550 And then we're going to say counts sub name[br]equals counts dot get the value stored 0:22:04.550,0:22:08.120 at name, and if you don't find it, give me[br]back a zero. 0:22:08.120,0:22:11.550 And then whatever comes back, either the[br]old value or 0:22:11.550,0:22:16.760 the zero, add 1 to that and then take that[br]sum and stick it in counts name. 0:22:17.870,0:22:19.530 Okay? So this is either 0:22:21.650,0:22:22.790 going to create, 0:22:26.170,0:22:29.740 or it's going to update. 0:22:30.070,0:22:32.990 If there is no entry, it's going to create[br]it and set it to one. 0:22:32.990,0:22:36.520 If there is an entry it's going to add one to[br]the current entry. 0:22:37.530,0:22:39.240 Okay? So this is, 0:22:42.770,0:22:44.660 this line is kind of an idiom. 
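Here is a minimal sketch of the one-line idiom described above, assuming Python 3. The contents of `names` are sample data chosen for illustration.

```python
counts = dict()
names = ['csev', 'cwen', 'csev', 'zqian', 'cwen']  # sample data, assumed for illustration

for name in names:
    # Missing key: get returns 0, so the entry is created with the value 1.
    # Existing key: get returns the old count, which is then increased by 1.
    counts[name] = counts.get(name, 0) + 1

print(counts)
```

The single assignment replaces the four-line if/else: the default value passed to `get` handles the "first time seen" case.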
0:22:46.510,0:22:48.420 Read about it in the book, figure it out, 0:22:48.420,0:22:50.340 get used to the notion of what this is doing. 0:22:50.340,0:22:53.370 Understand what that is doing, okay? 0:22:54.430,0:22:57.320 Because I'm going to start using it as if[br]you understand it. 0:22:58.490,0:23:05.300 So, the next problem is a problem of[br]finding the most common word. 0:23:05.300,0:23:07.910 So, finding the most common, the top 0:23:07.910,0:23:12.330 five, is often a, a trigger that says, use 0:23:12.330,0:23:14.390 dictionaries because if you're going to[br]have to count things up, 0:23:14.390,0:23:15.990 you're going to, you know, you don't 0:23:15.990,0:23:18.000 know what the most common thing is at the[br]beginning. 0:23:18.000,0:23:22.220 First you have to count everything up, and[br]dictionaries are a great way to count. 0:23:22.220,0:23:25.220 So here's a little problem and I would[br]like you to read 0:23:25.220,0:23:29.490 this text and find me the most common word[br]in the text. 0:23:29.490,0:23:32.960 And tell me what the most common word is[br]and how many times 0:23:34.550,0:23:36.520 it occurs. Ready? 0:23:36.520,0:23:39.800 I'm going to give you a thousandth of a[br]second, just like I would give a computer. 0:23:39.800,0:23:41.975 I would expect it'd be able to do this in[br]a thousandth of a second. 0:23:41.975,0:23:43.149 [SOUND] There you go. 0:23:43.149,0:23:45.978 [BLANK_AUDIO] 0:23:45.978,0:23:48.040 Okay, I gave you five seconds.[br]Time's up. 0:23:48.040,0:23:48.580 Did you get it? 0:23:49.580,0:23:52.620 Or did you say to yourself, you know what,[br]I hate 0:23:52.620,0:23:55.840 that, it's no good, I think I'll write a[br]Python program instead. 0:23:55.840,0:23:59.200 And he'll probably show me a Python[br]program if I wait long enough. 0:23:59.200,0:24:02.800 So here's a slightly easier problem from[br]the first lecture. 0:24:02.800,0:24:04.030 Ready? 0:24:04.030,0:24:04.936 It's the same problem. 
0:24:04.936,0:24:07.915 Find the most common word and how many[br]times the word occurs. 0:24:07.915,0:24:12.171 [BLANK AUDIO] 0:24:12.171,0:24:34.171 [MUSIC] 0:24:35.437,0:24:40.190 Did you get it?[br]I believe the answer is, and I could look 0:24:40.190,0:24:45.900 really dumb here, oops, the answer is the,[br]and I think it's seven times. 0:24:45.900,0:24:48.310 So, that's the right answer. Okay? 0:24:48.310,0:24:50.160 Again, things humans are not so good at. 0:24:51.430,0:24:54.760 So, here's a piece of code that's starting[br]to combine some 0:24:54.760,0:24:57.690 of the things we've been doing in the past[br]few chapters all together. 0:24:57.690,0:25:01.110 We are going to read a line of text, 0:25:01.110,0:25:05.940 split it into words, count the occurrence, [br]how many times 0:25:05.940,0:25:10.070 each word occurs, and then print out a map. 0:25:10.070,0:25:14.580 So, so here's what we're going to do,[br]we're going to say okay, start 0:25:14.580,0:25:18.998 a dictionary, an empty dictionary, read[br]the line of input. 0:25:20.460,0:25:27.160 Then split it, remember, the split takes a[br]string and produces a list. 0:25:27.160,0:25:31.900 So words is a list, line is a string, and[br]then we'll print that out. 0:25:31.900,0:25:34.260 Then we're going to write a for loop[br]that's going to go 0:25:34.260,0:25:37.520 through each of the words, and[br]then create, use this idiom 0:25:37.520,0:25:42.180 counts sub word equals counts.get word, 0 + 1. 0:25:42.180,0:25:45.270 So this is going to do exactly what we talked[br]about in the previous 0:25:45.270,0:25:51.210 couple slides back, either create the[br]entries or add to those entries, okay? 0:25:51.210,0:25:52.383 And then we're going to print 0:25:52.383,0:25:52.860 them out. 0:25:52.860,0:25:55.620 So here's what that program does when it[br]prints out. 0:25:56.630,0:25:58.860 Now this is actually one long line I'm 0:25:58.860,0:26:00.820 just cutting it so you can see it. 
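The read, split, and count steps just described might look roughly like this. It is a sketch under assumptions: the line of text is a made-up sample assigned directly to a variable, where the lecture's program reads it from the user.

```python
# Assumed sample input; the lecture's program reads this line from the user.
line = 'the clown ran after the car and the car ran into the tent'

# split() takes one string and produces a list of strings, dropping the blanks.
words = line.split()
print('Words:', words)

# Accumulate a running total for each word with the get() idiom.
counts = dict()
for word in words:
    counts[word] = counts.get(word, 0) + 1

print('Counts', counts)
```

The program stops at printing the dictionary; finding the largest count is still done by eye at this stage.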
0:26:00.820,0:26:05.390 Here's this line we enter, and the words[br]the, there's seven of them. 0:26:05.390,0:26:08.390 Then it takes this line and splits it into a 0:26:08.390,0:26:11.240 list, and there is the beginning and end[br]of the list. 0:26:11.240,0:26:13.680 The list maintains the order, so the 0:26:13.680,0:26:17.690 list simply breaks all these words into[br]separate 0:26:17.690,0:26:21.620 words in a list of strings.[br]From one string 0:26:22.770,0:26:29.120 to many strings. This is many strings.[br]And so the, and the spaces are gone. 0:26:29.120,0:26:31.040 And so now here's this list. 0:26:31.040,0:26:33.820 And then what we're going to do is we're[br]going to run through the list. 0:26:35.470,0:26:39.030 And we're going to keep running totals of[br]each of the words in the list. 0:26:39.030,0:26:40.180 And then when we're done with the list, 0:26:40.180,0:26:43.890 we're going to print out the contents of[br]that dictionary. 0:26:43.890,0:26:45.050 And we can inspect it and 0:26:45.050,0:26:47.480 go like, let's look for the biggest one,[br]na, na, na, na, na. 0:26:47.480,0:26:47.990 It's kind of like 0:26:47.990,0:26:50.510 looking for the largest, like, oh,[br]seven. 0:26:50.510,0:26:54.010 That's the largest and the largest word is[br]the. 0:26:54.010,0:26:54.730 Okay? 0:26:54.730,0:26:59.210 So that's how the program runs, it[br]reads a line, 0:26:59.210,0:27:01.640 splits it into a list of words, and then 0:27:01.640,0:27:05.090 accumulates a running total for each word,[br]and then we 0:27:05.090,0:27:08.930 hand inspect to see what the most common[br]word is. 0:27:08.930,0:27:09.430 Okay? 0:27:10.870,0:27:13.220 Oh no, no, I don't want that song again. 0:27:13.220,0:27:14.190 There we go. 0:27:14.190,0:27:18.280 And so and so here we have the, in it's[br]kind of a smaller fashion. 0:27:19.350,0:27:23.660 We make a dictionary.[br]This entering a line of text is here. 0:27:23.660,0:27:25.150 It's all one line. 
0:27:25.150,0:27:27.170 We do the split and then we print the[br]words out. 0:27:29.160,0:27:32.500 And so that split creates a list of[br]strings from a single 0:27:32.500,0:27:37.100 string based on where the blanks are at,[br]chop, chop, chop, chop. 0:27:37.100,0:27:38.450 And then here 0:27:38.450,0:27:39.150 at counting, 0:27:41.180,0:27:45.510 we're going to loop through each of the[br]words one at a time and use this idiom, 0:27:45.510,0:27:52.710 counts sub word equals counts.get word, 0 + 1,[br]which is going to create and/or update. 0:27:52.710,0:27:54.960 And then we print the counts out and that[br]comes out there. 0:27:56.110,0:27:56.610 Okay? 0:27:57.710,0:27:59.610 So, again, this is the new thing that[br]we've done. 0:27:59.610,0:28:01.710 Everything else we've kind of seen before. 0:28:04.750,0:28:08.429 Now we can also loop through dictionaries[br]with for loops. 0:28:12.550,0:28:15.320 The for loop, we've been, put all kinds of[br]things over here. 0:28:15.320,0:28:18.890 We've put strings over here, we've put[br]lists of numbers over here. 0:28:18.890,0:28:21.110 We've put files over here. 0:28:21.110,0:28:23.470 And basically what it really says is you 0:28:23.470,0:28:26.360 know, if this is a collection of things, 0:28:26.360,0:28:28.340 run this little indent code once for[br]each item in 0:28:28.340,0:28:32.850 the collection, and key then becomes our[br]iteration variable. 0:28:32.850,0:28:35.150 And key is very mnemonic here. 0:28:35.150,0:28:37.200 It doesn't know that they are keys. 0:28:37.200,0:28:39.480 And so, keys. 0:28:39.480,0:28:44.680 The key here is that, there's a bit, the[br]important 0:28:44.680,0:28:50.180 concept here is that dictionaries are [br]key/value pairs and so this is 0:28:50.180,0:28:52.900 only one variable and so it actually[br]decides that, they've decided that 0:28:52.900,0:28:56.140 it goes through the keys, which is[br]actually quite useful. 
0:28:56.140,0:29:00.700 So key is going to take on the successive[br]values of the labels. 0:29:00.700,0:29:02.400 Not the successive values of 0:29:02.400,0:29:04.060 the values stored at the labels. 0:29:04.060,0:29:10.250 But it's really easy for us to retrieve[br]the contents at that label counts sub key. 0:29:10.250,0:29:15.080 So we're going to use the key 'chuck',[br]'fred', 'jan', to look up the 1, 42, 100. 0:29:15.080,0:29:17.900 And so it prints out the key, 0:29:17.900,0:29:22.180 and then the value at it, the key, and the[br]value at it, and the key, and the value. 0:29:22.180,0:29:25.050 And so we're able to sort of go through 0:29:25.050,0:29:27.330 the dictionary and look at all the[br]key/value pairs, 0:29:27.330,0:29:29.900 which is the common thing that you really[br]want to do. 0:29:31.000,0:29:31.500 Okay? 0:29:35.240,0:29:38.400 Now there's some methods inside of[br]dictionaries that allow 0:29:38.400,0:29:42.620 us to convert dictionaries into lists[br]of things. 0:29:42.620,0:29:47.140 And so if you simply take a dictionary, so[br]here's a little dictionary with 0:29:47.140,0:29:51.750 three items in it, and we can say list sub[br]and then give a dictionary name 0:29:51.750,0:29:54.060 right there, and then that converts it[br]into a 0:29:54.060,0:29:56.640 list. But it's just a list of the keys. 0:29:57.680,0:30:01.320 We can also say jjj dot keys, kind of do[br]the same thing. 0:30:01.320,0:30:05.170 Say give me a list consisting of the keys. 0:30:05.170,0:30:10.150 And then jjj dot values gives you a list[br]of the values, 1, 42, and 100. 0:30:10.150,0:30:12.810 Of course they're not in the same order. 
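Looping over the keys and converting a dictionary to lists can be sketched as below, assuming Python 3. In Python 3, keys() and values() return view objects, not plain lists, so this sketch wraps them in list() to get the lists the lecture talks about.

```python
# The three-item dictionary used in the lecture's example.
jjj = {'chuck': 1, 'fred': 42, 'jan': 100}

# A for loop over a dictionary goes through the keys, not the values.
for key in jjj:
    # Use the key to look up the value stored at that label.
    print(key, jjj[key])

# Converting the dictionary itself to a list gives a list of the keys.
print(list(jjj))

# In Python 3 these methods return views; list() turns them into lists.
print(list(jjj.keys()))
print(list(jjj.values()))
```

Python 3.7 and later preserve insertion order, so the keys come out in the order they were added. Code written in the lecture's style works either way, because it never relies on a particular order.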
0:30:12.810,0:30:16.060 Now interestingly, as long as you don't[br]modify the dictionary, 0:30:16.060,0:30:19.510 the order of these two things corresponds[br]as long as 0:30:19.510,0:30:23.050 in between here you're not changing it.[br]So the first jan maps to 100, 0:30:23.050,0:30:25.420 chuck maps to 1, 0:30:25.420,0:30:27.680 and fred maps to 42. 0:30:27.680,0:30:30.200 So the order, you can't predict the order[br]they're 0:30:30.200,0:30:32.170 going to come out but these two things[br]will 0:30:32.170,0:30:34.550 come out in the same order, whatever that[br]order 0:30:34.550,0:30:38.110 happens to be. Okay, and so there's one[br]more thing. 0:30:39.220,0:30:44.190 So we've got the keys, we've got the[br]values, and we've got a thing called items. 0:30:44.190,0:30:50.460 items also returns a list, it's a list.[br]But it's a list of 0:30:50.460,0:30:54.920 what Python calls tuples.[br]That's what the next chapter is about. 0:30:54.920,0:30:56.700 We'll talk more about tuples in the next[br]chapter. 0:30:57.910,0:31:01.160 A tuple is a key/value pair. 0:31:01.160,0:31:05.970 So this list has three things in it.[br]One, two, three. 0:31:05.970,0:31:10.240 The first one jan maps to 100, the[br]second is chuck maps to 1, the 0:31:10.240,0:31:15.570 third one is fred maps to 42. So,[br]just kind of bear with me for a second. 0:31:15.570,0:31:17.520 We'll hit this a little harder in the next[br]chapter. 0:31:18.920,0:31:20.850 But the place that this, the idiom where 0:31:20.850,0:31:23.930 this works very beautifully is on a for[br]loop. 0:31:23.930,0:31:26.720 Now, for those of you who have programmed[br]in other languages, this will be 0:31:26.720,0:31:29.700 kind of weird because other languages have 0:31:29.700,0:31:33.680 iterations but they don't have two[br]iteration variables. 0:31:33.680,0:31:35.770 Python has two iteration variables. 
0:31:35.770,0:31:37.480 It can be used for many things but one of the 0:31:37.480,0:31:41.090 things that it's used for that's really[br]quite nice is 0:31:41.090,0:31:46.110 we can have two iteration variables.[br]This jjj dot items returns pairs of 0:31:46.110,0:31:51.200 things and then aaa and bbb are iteration[br]variables that sort of 0:31:51.200,0:31:56.580 move in synchronized, move, are[br]synchronized as they move through. 0:31:56.580,0:32:01.250 So aaa takes on the value of the key. 0:32:01.250,0:32:05.670 bbb takes on the value of the, the[br]value. 0:32:05.670,0:32:09.110 And then the loop runs once. 0:32:09.110,0:32:13.090 Then aaa is advanced to the next key. 0:32:13.090,0:32:17.410 And bbb is advanced to the next value[br]simultaneously, synchronized. 0:32:17.410,0:32:19.910 Then they print that out, then it advances[br]to the 0:32:19.910,0:32:22.700 next one, and the next one, and they print[br]that out. 0:32:22.700,0:32:27.210 So they are moving in a synchronized way. 0:32:27.210,0:32:31.050 Now again, the order jan, chuck, fred is not[br]the same. 0:32:31.050,0:32:33.360 But the correspondence between jan 100, 0:32:33.360,0:32:37.090 chuck 1, and fred,[br]that's going to, that's going to work. 0:32:37.090,0:32:40.680 And so basically, as these things go, they[br]work 0:32:40.680,0:32:43.960 their way through whatever order they're[br]stored in the dictionary. 0:32:43.960,0:32:45.440 So this is quite nice. 0:32:45.440,0:32:48.870 Two iteration variables going through[br]key/value. 0:32:48.870,0:32:53.850 Now if I was making these names mnemonic,[br]and they made more sense, 0:32:53.850,0:32:57.200 I would call this the key variable and[br]that would be the value variable. 0:32:58.440,0:33:00.590 But for now I'm just using kind of silly[br]names 0:33:00.590,0:33:02.910 to show you that key and value are not[br]special. 0:33:02.910,0:33:05.580 They're not Python reserved words in any[br]way.
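The two-iteration-variable loop over items() can be sketched like this, assuming Python 3. The pairs list is an addition for illustration only, to show that each key stays matched with its own value.

```python
# The same three-item dictionary from the lecture's example.
jjj = {'chuck': 1, 'fred': 42, 'jan': 100}

pairs = []  # illustration only: remember each (key, value) pair we visit

# items() yields (key, value) tuples; aaa and bbb unpack each tuple.
# The names are deliberately silly: key and value are not reserved words.
for aaa, bbb in jjj.items():
    print(aaa, bbb)
    pairs.append((aaa, bbb))

# keys() and values() come out in corresponding order as long as the
# dictionary is not modified in between, so zipping them rebuilds items().
print(list(zip(jjj.keys(), jjj.values())) == list(jjj.items()))
```

With mnemonic names the loop header would read `for key, value in jjj.items():`, which behaves identically.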
0:33:05.580,0:33:09.215 They're just a good way to name these[br]things, key/value pairs. 0:33:09.215,0:33:09.715 Okay? 0:33:12.020,0:33:13.360 Okay. 0:33:13.360,0:33:16.920 Now we're going to circle all the way[br]back to the beginning. 0:33:16.920,0:33:18.500 All the way back to the first lecture. 0:33:18.500,0:33:24.050 And I gave you this program, and I said[br]don't worry about it. 0:33:24.050,0:33:27.660 We'll learn about it later.[br]Well, now later. 0:33:27.660,0:33:32.030 At this point you should be able to[br]understand every line of this program. 0:33:33.490,0:33:38.280 This is the program that's going to count[br]the most common word in a file. 0:33:38.280,0:33:39.000 Okay? 0:33:39.000,0:33:41.190 So let's walk through what it does and[br]hopefully 0:33:41.190,0:33:44.550 by now this will make a lot of sense. 0:33:45.610,0:33:47.910 Okay? So we're going to start out, we're[br]going to ask 0:33:47.910,0:33:51.070 for a file name, we're going to open that[br]file for read. 0:33:52.140,0:33:54.710 Then, because we know it's not a very large 0:33:54.710,0:33:56.760 file, we're going to read it all in one go. 0:33:56.760,0:34:00.480 So handle dot read says read the whole[br]file, newlines and all, 0:34:00.480,0:34:03.580 blanks, newlines, whatever,[br]and put it in 0:34:03.580,0:34:07.530 the variable called text, it's just[br]mnemonic. Remember I'm, in this one 0:34:07.530,0:34:12.650 I'm using the mnemonic variable names.[br]Then go through that whole 0:34:12.650,0:34:16.630 string, which was the whole file, go[br]through and split it all. 0:34:16.630,0:34:19.840 Newlines don't hurt it.[br]Newlines are treated like blanks. 0:34:19.840,0:34:21.540 And it understands all that. 0:34:21.540,0:34:23.420 It throws the newlines away and the[br]blanks away 0:34:23.420,0:34:27.040 and splits it into a beautiful list of[br]just words with no blanks. 0:34:28.540,0:34:33.070 And the list of the words in that file[br]ends up in the variable words. 
0:34:33.070,0:34:36.009 words is a list, text is a string, words[br]is a list. 0:34:37.090,0:34:41.600 Then what I do is the pattern of[br]accumulating counters in a dictionary. 0:34:41.600,0:34:43.790 I make an empty dictionary. 0:34:43.790,0:34:47.500 I have the word variable that goes through[br]all the words 0:34:47.500,0:34:52.830 and then I just say, counts sub word equals[br]counts dot get(word,0) + 1, 0:34:53.920,0:34:56.659 and that, like we just got done saying,[br]it both creates 0:34:56.659,0:35:02.020 and/or increments the value in the[br]dictionary as needed. 0:35:02.020,0:35:03.610 So now at the end of the, at the, at this 0:35:03.610,0:35:11.650 point in the program, we have a full[br]dictionary with the word:count. 0:35:11.650,0:35:12.460 Okay? 0:35:12.460,0:35:15.040 And there's many of them.[br]You know, all the words, all the counts. 0:35:15.040,0:35:17.280 They're not in any particular order. So now what 0:35:17.280,0:35:21.800 we're going to do is we're going to write[br]a largest loop, find the largest. 0:35:21.800,0:35:23.680 Which is another thing that we've done. 0:35:23.800,0:35:27.010 So not only do I need to now know what[br]largest count I've seen so far, 0:35:27.010,0:35:29.640 I need to know what word that is. 0:35:29.640,0:35:32.870 So I'm going to set the largest count[br]we've seen so far to None, set 0:35:32.870,0:35:36.780 the largest word we've seen so far[br]to None, and then I'm going to use this 0:35:36.780,0:35:38.740 two-iteration variable pattern to say 0:35:38.740,0:35:44.230 go through the key/value pairs word and[br]count in counts.items. 0:35:44.230,0:35:44.920 So it's just going to 0:35:44.920,0:35:47.120 go through [SOUND] all of them. 0:35:47.120,0:35:52.930 And then I'm going to ask if the largest[br]number I've seen so far is None or 0:35:52.930,0:35:56.410 the current count that I'm looking at is[br]greater than the largest I've seen so far, 0:35:59.280,0:36:03.260 keep them.
Take the current word, stick it[br]in biggest word so far, 0:36:03.260,0:36:07.180 take the current count, stick it in[br]the biggest count so far. 0:36:07.180,0:36:09.670 So this is going to run through all of the 0:36:09.670,0:36:14.290 word.count pairs, word.count key/value pairs. 0:36:14.290,0:36:16.640 And then when it comes out, it's going to[br]print out 0:36:16.640,0:36:19.430 the word that's the most common and how[br]many times. 0:36:20.680,0:36:24.290 So if we feed in that clown text, it will[br]run all this stuff, and print out 0:36:24.290,0:36:29.170 oh, the is the most common word, and it[br]appeared seven times. 0:36:29.170,0:36:33.540 Or if I print the stuff that was two[br]slides back, words.txt, from the actual 0:36:33.540,0:36:37.790 textbook, then it says the word to is the[br]most common and it happened 16 times. 0:36:37.790,0:36:43.380 So I could easily, you know, throw 10[br]million, 10 million 0:36:43.380,0:36:46.380 words through this thing, and it would[br]just be totally happy. 0:36:46.380,0:36:49.370 Right? And so, this is not that complex 0:36:49.370,0:36:52.700 of a problem, but it's using a whole bunch[br]of idioms that we've been using. 0:36:52.700,0:36:57.380 The splitting of words, the accumulation[br]of multiple counters in a dictionary. 0:36:57.380,0:37:02.110 And so, it sort of is the beginning of[br]doing some kind of data 0:37:02.110,0:37:06.040 analysis that's hard for humans to do, and[br]error-prone for humans to do. 0:37:06.040,0:37:08.500 And so this is, we're reviewing collections. 0:37:08.500,0:37:10.510 We've introduced dictionaries. 0:37:10.510,0:37:13.310 We've done the most common word pattern,[br]talked about that. 0:37:13.310,0:37:14.300 The lack of order, and 0:37:14.310,0:37:16.270 I did that a bunch of times. 0:37:16.270,0:37:19.750 And we've looked ahead at tuples,[br]which is the next, 0:37:19.750,0:37:22.210 the third kind of collection that we're[br]going to talk about.
0:37:22.210,0:37:25.890 And they're actually in some ways a little[br]simpler than dictionaries. 0:37:25.890,0:37:27.150 And simpler than lists. 0:37:27.150,0:37:33.000 So, see you in the next lecture, Chapter[br]10, tuples.
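The most-common-word program walked through in this lecture can be sketched roughly as follows. This is a reconstruction under assumptions, not the slide verbatim: the function name most_common_word and the variable names bigword and bigcount are chosen here for illustration, and the text is passed in as a string, where the lecture's version asks for a file name, opens the file, and uses handle.read().

```python
def most_common_word(text):
    """Return (word, count) for the most frequent word in text.

    Hypothetical helper for illustration; returns (None, None) if
    the text contains no words.
    """
    # split() treats newlines like blanks and returns a list of words.
    words = text.split()

    # Accumulate one counter per word with the get() idiom.
    counts = dict()
    for word in words:
        counts[word] = counts.get(word, 0) + 1

    # "Find the largest" loop, tracking both the count and its word.
    bigcount = None
    bigword = None
    for word, count in counts.items():
        if bigcount is None or count > bigcount:
            bigword = word
            bigcount = count

    return bigword, bigcount


# Assumed sample text standing in for the contents of a file.
sample = 'the clown ran after the car and the car ran into the tent'
print(most_common_word(sample))
```

To run it against a file as in the lecture, the string passed in would come from something like `open(name).read()`, where name is the file name entered by the user. Because the comparison is a strict greater-than, ties go to whichever word the loop reaches first.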