Python for Informatics - Chapter 9 - Dictionaries
-
0:00 - 0:05Hello again, and welcome to Chapter Nine
of Python, Dictionaries. -
0:05 - 0:09As always, this lecture is copyright
Creative Commons Attribution. -
0:09 - 0:14That means the audio, the video, the
slides, and even my scribbles. -
0:14 - 0:18You can use them any way you like, as long
as you attribute them. -
0:18 - 0:20Okay, so this is the second chapter
-
0:20 - 0:22where we're talking about collections, and
the collections -
0:22 - 0:26are like a piece of luggage in that you
can put multiple things in them. -
0:28 - 0:30Variables that we've talked about sort of
starting in -
0:30 - 0:35Chapter Two and Chapter Three were simple
variables, scalar. -
0:35 - 0:37They're just kind of one thing, and as
soon as you, -
0:37 - 0:41like, put another thing in there, it
overwrites the first thing. -
0:41 - 0:46And so if you look at the code, you know,
x = 2 and x = 4, -
0:46 - 0:51the question is, you know, where did
the 2 go? -
0:51 - 0:53Right? The 2 was there, x was there,
-
0:53 - 0:57there was a 2 in there, and then we cross
it out and put 4 in there. -
0:57 - 1:01This is sort of the basic operation, the
assignment statement, it's a replacement. -
1:01 - 1:04But a dictionary allows us to put more
than one thing. -
1:04 - 1:06Not using this syntax, but it allows us to
-
1:06 - 1:09have a variable that's really an aggregate
of many values. -
1:09 - 1:12And the difference between a list and a
dictionary -
1:12 - 1:16is how the values are structured within
that single variable. -
1:16 - 1:18The list is a linear collection,
-
1:18 - 1:21indexed by integers 0, 1, 2, 3.
-
1:21 - 1:24If there's five of them, it's 0 through 4,
very much like a -
1:24 - 1:28Pringles can here, where they're just
stacked nicely on top of each other. -
1:28 - 1:33Everything's kind of organized. We talked
about it in the last, in the last lecture. -
1:33 - 1:36This lecture we're talking about dictionaries.
-
1:36 - 1:38A dictionary's very powerful.
-
1:38 - 1:42Its power comes from a different
way of organizing itself internally. -
1:42 - 1:44It's a bag of values,
-
1:44 - 1:48like a just sort of, just stuff's
in it, it's not in any order. -
1:48 - 1:49Big stuff, little stuff.
-
1:49 - 1:51Things have labels.
-
1:51 - 1:52You can also think of it like a purse with
-
1:52 - 1:55just things in it that's like, it's not
like stacked. -
1:55 - 1:58It's just, stuff moves around as you're going
-
1:58 - 2:01and that's, that's a very good model for
dictionaries. -
2:02 - 2:03And so dictionaries
-
2:03 - 2:06have to have a label because the stuff is
not in order. -
2:06 - 2:08There's no such thing as the third thing.
-
2:08 - 2:10There is the thing with the label perfume.
-
2:10 - 2:11There's the thing with the label candy.
-
2:11 - 2:14There's the thing with the label money.
-
2:14 - 2:17And so there's the value, the thing, the money.
-
2:17 - 2:19And then there's always also the label.
-
2:19 - 2:23We also call these key/value.
-
2:25 - 2:29The key is the label and the value is
whatever. -
2:29 - 2:31And so these pink things are all labels for
-
2:31 - 2:33various things you could put in your purse.
-
2:33 - 2:36So you could say to your purse, "hey purse,
give me my tissues." -
2:36 - 2:38"Hey purse, give me my money."
-
2:38 - 2:40And it, it's in there somewhere and the
purse sort of -
2:40 - 2:43gives you back the tissues or the money.
-
2:43 - 2:49And Python's most powerful data
collection is the dictionary. -
2:49 - 2:50And when
-
2:50 - 2:52you get used to wielding them you'll say,
like, -
2:52 - 2:54whoa, I can do so much with these things.
-
2:54 - 2:56And at the beginning you're just sort of
-
2:56 - 3:00learning sort of how to use them without
hurting yourself. -
3:00 - 3:01But they're very powerful.
-
3:01 - 3:02It's like a database.
-
3:02 - 3:07It's, it allows you to store very arbitrary
data organized in however you feel like -
3:07 - 3:11organizing it, in a way that advances the
cause of the program that you're writing. -
3:11 - 3:16And we're still kind of at the very
beginning, but as you learn more, -
3:16 - 3:18these will become a very powerful
tool for you. -
3:20 - 3:23They, dictionaries have different names in
different languages. -
3:25 - 3:27Perl or PHP would call them associative
arrays. -
3:29 - 3:32Java would call them Properties, a Map, or a
HashMap. -
3:32 - 3:35And C# might call them a property bag or
an attribute bag. -
3:35 - 3:38And so they're, they're just the same
concept. -
3:38 - 3:42It's keys and values is the concept that's
across all these languages. -
3:42 - 3:44They're just very powerful.
-
3:44 - 3:45And if you look at the Wikipedia entry
-
3:45 - 3:46that I have here
-
3:46 - 3:49you can see that it's just, it's a concept
-
3:49 - 3:53that we give different names in different
languages. Same concept, different names. -
3:54 - 3:58So like I said, the difference between a
list and a dictionary, they both can store -
3:58 - 4:01multiple values. The question is how we
label them, -
4:01 - 4:03how we store them, and how we retrieve
them. -
4:03 - 4:07So here's an example use of a dictionary.
I'm going to make a thing called purse. -
4:07 - 4:11And I'm going to store in purse, this is
an assignment statement, -
4:11 - 4:14purse sub money.
So this isn't like sub zero. -
4:14 - 4:15This is sub money.
-
4:15 - 4:18So I'm actually using a string as the
place. -
4:18 - 4:21And, so I'm going to say stick 12
in my purse -
4:21 - 4:24and stick a Post-it note that says
that's my money. -
4:24 - 4:26Candy is 3. Tissues is 75.
-
4:26 - 4:32And if I look at that, it's not just the
numbers 12, 3, and 75 as it -
4:32 - 4:37would be in a list. It is the connection
between money and 12, -
4:37 - 4:42tissues is 75, candy is 3.
And in the key/value, that's the -
4:42 - 4:47key and that's the value.
So candy is the key and 3 is the value. -
4:47 - 4:52Now I can look things up by their name,
print purse sub candy. -
4:52 - 4:57Well it goes and finds it, asking hey purse,
give me back candy, and it -
4:57 - 5:00goes and finds the value, which is 3, and
so out comes a 3. -
5:00 - 5:03We can also put it
-
5:03 - 5:06on the right-hand side of an
assignment statement, -
5:06 - 5:07so purse sub candy says give me
the old version of candy, -
5:07 - 5:10and then add 2 to it, which
-
5:10 - 5:14gives me 5, and then store it back
in that purse -
5:14 - 5:16under the label candy.
-
5:16 - 5:19So we see candy changing to 5.
-
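The purse walkthrough above can be sketched in code as follows. This is a minimal reconstruction from the spoken description, not a copy of the slide: creating the empty dictionary with `dict()` is an assumption, and `print` is written as a Python 3 function call.

```python
# Start with an empty dictionary (assumed; the lecture only says
# "I'm going to make a thing called purse").
purse = dict()

# Each assignment stores a value under a string label (the key).
purse['money'] = 12
purse['candy'] = 3
purse['tissues'] = 75

# Look a value up by its key.
print(purse['candy'])   # 3

# Read the old value, add 2, and store the result back under the same key.
purse['candy'] = purse['candy'] + 2
print(purse)
```

After the last assignment the entry under `'candy'` holds 5, while the other two entries are unchanged.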
5:19 - 5:21And so, this is a place, and you could
-
5:21 - 5:23do this with a list except these would be
numbers. -
5:23 - 5:28You could say purse sub two is equal to
purse sub two plus two, or whatever. -
5:28 - 5:32But in dictionaries, there are labels.
-
5:32 - 5:33Now, they're not necessarily strings.
-
5:33 - 5:35Strings are a very common label in
dictionaries, but -
5:35 - 5:38it's not always strings, you can use other
things. -
5:38 - 5:40In this chapter we'll pretty much focus on
strings. -
5:40 - 5:44You can even use numbers and then you
would get a little confused. -
5:44 - 5:45But you can.
-
5:45 - 5:48So here's sort of a picture of how this
works. -
5:48 - 5:53So, if we take a look at this line purse
sub money equals 12, -
5:53 - 5:58it's like we were putting a key/value
connection, money is the label for 12. -
5:58 - 6:01And then we sort of move that in.
-
6:01 - 6:04And it's up to the purse to decide
where things live. -
6:04 - 6:10If we look at the next line, we're going to
put the value 3 -
6:10 - 6:12in with the label candy, and we're
going to put -
6:12 - 6:15the value 75 in with the label of tissues.
-
6:15 - 6:18And when we say hey purse, print yourself
out, it just -
6:18 - 6:21goes and pulls these things back out and
hands them to us. -
6:21 - 6:25And really, it's giving us both the
label and the value, and it's necessary -
6:25 - 6:26to do that, because otherwise it's just 12, 75,
-
6:26 - 6:29and 3. What exactly is that?
-
6:29 - 6:31And so this syntax with the curly braces
-
6:31 - 6:35is what happens when you print a
dictionary out. -
6:35 - 6:39The same thing happens when we're sort of
printing purse sub candy, right? -
6:39 - 6:40Purse sub candy,
-
6:42 - 6:45it's like dear purse, go and find the candy
thing. -
6:45 - 6:46Look at that one, look at that one.
-
6:46 - 6:48Oh, yep, yep, this is candy.
-
6:48 - 6:50But what we're looking for is the value,
-
6:50 - 6:53and so that's why 3 is coming out here.
-
6:53 - 6:57So go look up under candy, and tell me
what's stored under candy. -
6:57 - 6:59These can be actually more complex things,
-
6:59 - 7:01I'm just keeping it simple for this
lecture. -
7:03 - 7:08And then, when we say purse sub candy
equals purse sub candy plus 2, well it -
7:08 - 7:14pulls the 3 out, looking at the label
candy, then adds 3 plus 2 and makes 5, -
7:14 - 7:20and then it assigns it back in, and then
that says, oh, go, go place this number 5 -
7:20 - 7:26in the purse with the label of candy,
which then replaces the 3 with a 5. -
7:26 - 7:27Okay?
-
7:28 - 7:30And if we print it out, we see that the
-
7:30 - 7:35new variable, or the new candy entry,
is now 5. -
7:35 - 7:36Okay?
-
7:37 - 7:41So if we just sort of put these things
side by side, we create -
7:41 - 7:44them sort of both the same way and we make
an empty list, and an empty -
7:44 - 7:47dictionary, we call the append method
because -
7:47 - 7:49we're sort of just putting these things in
-
7:49 - 7:52order. You gotta put the first one in
first. So it's not telling you where. -
7:52 - 7:53You kind of know that this
-
7:53 - 7:55will be the first one, cause we're
starting with an empty one, -
7:55 - 7:57and this will be the second one.
-
7:57 - 8:02We put in the values 21 and 183, and then
we print it out, and it's like okay, you gave -
8:02 - 8:04me the values 21 and 183, I will maintain
the order for you, -
8:04 - 8:08there's no keys other than their position.
-
8:08 - 8:12The position is the key, as it were, so if
I want to change the first one to 23, -
8:12 - 8:17well, I say list sub zero, which is this,
and then change that to 23. -
8:17 - 8:20So this is sort of used as a lookup to
-
8:20 - 8:23find something. It can be used on either the
right-hand side or the -
8:23 - 8:25left-hand side of an assignment statement.
-
8:25 - 8:28Comparing that to dictionaries, I want to
put a 21 in there -
8:28 - 8:30and I want to put it with the label age.
-
8:30 - 8:33I'm going to put 182, put that in with the
label course. -
8:33 - 8:37So we don't have to like, make an entry.
-
8:37 - 8:38The fact that the entry doesn't exist,
-
8:38 - 8:42it creates the age entry and sticks 21 into it,
-
8:42 - 8:44creates the course entry, sticks 182 into it.
-
8:44 - 8:49We print it out and it says, oh, course
is 182 and age is 21. -
8:49 - 8:55This emphasizes that order is not
preserved in dictionaries. -
8:56 - 8:58I won't go into like great detail as to
why that is. -
8:58 - 9:01It turns out that that's a compromise that
-
9:01 - 9:05makes them fast using a technique called
hashing. -
9:05 - 9:09It's how it actually works internally,
go Wikipedia hashing and -
9:09 - 9:10take a look.
-
9:10 - 9:14But, the thing that matters to us as
programmers primarily -
9:14 - 9:20is that lists maintain order and
dictionaries do not maintain order. -
9:20 - 9:24They, dictionaries give us power
that we don't have in lists. -
9:24 - 9:26I mean they're very complementary.
-
9:26 - 9:28Now there's not this one that's better
than the other. -
9:28 - 9:29They're very complementary.
-
9:29 - 9:32Different kinds of data are either better
represented as a list -
9:32 - 9:33or as a dictionary, depending on the
-
9:33 - 9:35problem you're trying to solve.
-
9:35 - 9:39And in a moment we'll, we'll be writing
programs that are using both. -
9:39 - 9:41So if we come down here and I say,
-
9:41 - 9:47okay, stick 23 into, assignment statement,
into ddd sub age, -
9:47 - 9:51well that will change this 21 to 23,
so when we print it out. -
9:51 - 9:53So you can, this part, where you look
something up and -
9:53 - 9:56change the value, you can do either way.
-
9:56 - 9:58It's just how you do it here
-
9:58 - 10:00is a little bit different, okay?
-
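The side-by-side comparison of a list and a dictionary might look like the sketch below. The values 21, 183, 182 and 23 and the name `ddd` come from the lecture; the list name `lst` is an assumption, chosen so the built-in name `list` is not overwritten.

```python
# List: values are kept in order and looked up by position.
lst = list()
lst.append(21)
lst.append(183)
print(lst)            # [21, 183]
lst[0] = 23           # position 0 acts as the "key"
print(lst)            # [23, 183]

# Dictionary: values are looked up by a label we choose.
ddd = dict()
ddd['age'] = 21       # the assignment itself creates the entry
ddd['course'] = 182
print(ddd)
ddd['age'] = 23       # overwrite the value stored under 'age'
print(ddd)
```

One caveat when running this today: Python 3.7 and later keep dictionary entries in insertion order, so a current interpreter prints `age` before `course`. The lecture describes the older behavior, where the printed order could differ.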
10:00 - 10:04So let's look through this code again.
-
10:04 - 10:07And so I like, I like to use the word key
and value. -
10:07 - 10:09Key is the way we look the thing up,
and in lists -
10:09 - 10:13keys are numbers starting at
zero and with no gaps. -
10:13 - 10:15In dictionaries keys are whatever we want
them to be, -
10:15 - 10:18in this case I'm using strings.
-
10:18 - 10:21And then the value is the number we're
storing in it. -
10:21 - 10:25So we create this kind of a list with that
kind, those -
10:25 - 10:26kinds of statements.
-
10:26 - 10:29This statement creates this kind of a thing.
-
10:29 - 10:34Now, if we, if we think of this assignment
statement as moving data -
10:34 - 10:37into a new, into a place, a new item of
data into a place. -
10:41 - 10:43It's looking at this thing right here.
-
10:43 - 10:45Right? It's like, that's where I want to
move it. -
10:45 - 10:48And so it hunts, and says, look the key up.
-
10:48 - 10:50And that's the one that I'm going to change.
-
10:50 - 10:52And then once it knows which it's going to
change, -
10:52 - 10:57then it's going to take the 23, and it's
going to put the 23 into that location. -
10:57 - 11:01And so that's how this changes from that
to that. -
11:01 - 11:07Similarly when we get down to here, we're
going to stick 23 somewhere and -
11:07 - 11:10this is, this expression, this lookup
expression, the index -
11:10 - 11:13expression ddd sub age, is where we're
going to put it. -
11:13 - 11:16So, we're looking here, where is that thing?
-
11:16 - 11:20Well, that thing is this entry
-
11:20 - 11:23in the dictionary. And so now when we're
going to store the 23, -
11:23 - 11:24we know where the 23 is going to go.
-
11:24 - 11:27It's going to overwrite the 21 and so the
21 is -
11:27 - 11:31going to change to 23, okay? So they're
kind of similar. -
11:31 - 11:34There are things that work similar in them
-
11:34 - 11:36and then there are things that work
differently in them. -
11:38 - 11:41We can make literals, constants, with
-
11:41 - 11:43curly braces. And they look just like the print.
-
11:43 - 11:45That's one nice thing about Python.
-
11:45 - 11:49When you print something out it's showing
you how you can make a literal, and -
11:49 - 11:56basically you just open with a curly brace
and say chuck colon 1, fred 42, jan 100. -
11:56 - 11:58And we're making connections.
-
11:58 - 12:02key/value pair, key/value pair.
We print it out and -
12:05 - 12:06No order. They don't maintain order.
-
12:06 - 12:09Now they might come out in the same order,
but that's just lucky. -
12:09 - 12:09Right?
-
12:09 - 12:11All the ones I've shown you so far don't
-
12:11 - 12:13come out in the same order, which is good
to demonstrate it. -
12:13 - 12:16If it one time came out in the same order
that wouldn't be broken. -
12:16 - 12:18It's not like it doesn't want to come out
in the same order. -
12:18 - 12:22It's just, you don't, it's not internally
stored, and you -
12:22 - 12:24add an element and it may reorder them.
-
12:25 - 12:28You can do an empty dictionary with just a
curly brace, curly brace. -
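As a small illustration of dictionary literals, the sketch below uses the keys and values mentioned in the lecture (chuck, fred, jan). The variable names `jjj` and `ooo` are placeholders assumed for this example.

```python
# A dictionary literal: key/value pairs separated by colons, inside curly braces.
jjj = {'chuck': 1, 'fred': 42, 'jan': 100}
print(jjj)

# An empty dictionary is just an open and close curly brace.
ooo = {}
print(ooo)   # {}
```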
12:33 - 12:37So, I'm going give you another example.
-
12:37 - 12:40And I'm going to show you a series of
names. -
12:40 - 12:46And I want you to figure out what the most
common name is -
12:46 - 12:48and how many times each name appears.
-
12:48 - 12:52Now these are real people.
They actually work on the Sakai project. -
12:52 - 12:59Steven, Zhen, and Chen, and me.
So these are people that are actually -
12:59 - 13:01in the data that we use in this course.
-
13:01 - 13:04Okay? And so I think I'll show you about
fifteen names -
13:04 - 13:07and you're to come up with a way, I'm
going to -
13:07 - 13:11show them to you one at a time, you need to
come up with a way to keep track of these. -
13:11 - 13:12Okay?
-
13:12 - 13:16So I'll just, with no further ado I will show
you the names. -
13:16 - 13:26[BLANK_AUDIO]
-
13:54 - 13:58Okay, so that's all the names.
Did you get it? -
13:58 - 14:00You might have to go back and do it again.
-
14:01 - 14:04How did you solve the problem?
-
14:04 - 14:08What kind of a data structure did you
build to solve the problem? -
14:08 - 14:11Or did you just say wow that's painful, I
-
14:11 - 14:15think I will learn Python instead, in
solving that problem. -
14:15 - 14:16Okay?
-
14:16 - 14:20So pause the, pause the video if you want and
-
14:20 - 14:23write down or go back, write down what you
think the -
14:23 - 14:28most common name is and how
many times it appears. -
14:30 - 14:32Okay. Now I'll show you.
-
14:32 - 14:35So here is the whole list.
It's all of them. -
14:35 - 14:39And now that we see all of them, we
use our amazing human -
14:39 - 14:43mind and we scan around, and look at
purpleness and, and all that stuff. -
14:43 - 14:44And then we go like, oh, this is a so
-
14:44 - 14:46much easier problem when I'm looking
at the whole thing. -
14:48 - 14:52And I think that the most common person is
Zhen, and -
14:54 - 14:59I think we see Zhen, I think we see Zhen
five times. -
15:01 - 15:07And I think csev is one, two, three and
Chen Wen is one, two. -
15:07 - 15:09And Steve Marquard is one, two, three.
-
15:09 - 15:13So the question is, what is an effective
data structure if you're going to see -
15:13 - 15:16a million of these, what kind of data
structure would you have to produce? -
15:16 - 15:17Because you can't keep it in your head
-
15:17 - 15:20even, even this number of people, you can't
-
15:20 - 15:22even this amount of data, no way you can
keep it in your head. You have to come -
15:22 - 15:25up with some kind of a variable, as it were,
-
15:25 - 15:28just like largest so far was the variable.
-
15:28 - 15:30Some kind of variable that gets you to
-
15:30 - 15:31the point where you understand what's
going on. -
15:31 - 15:35And so this is the most common technique
to solve this -
15:35 - 15:39problem where you keep a running total of
each of the names. -
15:39 - 15:42And if you see a new name, you add them to
the list. -
15:42 - 15:45So csev and then you give him a one,
-
15:45 - 15:47and then you see Zhen and you give her a
one, -
15:47 - 15:50and then you see Chen and you give her a
one. -
15:50 - 15:52And then you see csev again and you give
him a two. -
15:52 - 15:55And you see a two, and a two, and a one
right? -
15:55 - 15:57[COUGH]
-
15:57 - 16:03And so then when you're all done you have
the mapping, right, of these things -
16:03 - 16:06and you go oh, okay, let me look through
here and find the largest one. -
16:06 - 16:10That's the largest one and so that must be
the person who appears the most. -
16:10 - 16:12So you need a scratch area,
-
16:12 - 16:15a data structure or a piece of paper as
it were, -
16:15 - 16:19and so that's what, exactly what
dictionaries are really good at. -
16:19 - 16:24You could think of this as like a
histogram. You know, it's, -
16:24 - 16:28it's a bunch of counters, but counters
that are indexed by a string. -
16:28 - 16:29So we use a lot of this.
-
16:29 - 16:34And so this is a pattern of many counters
with a dictionary, simultaneous counters. -
16:34 - 16:35We're counting a bunch of, we're looking
-
16:35 - 16:39at a series of things, and we're going to
simultaneously keep track -
16:39 - 16:43of a large number of counters, rather than
just one counter. -
16:43 - 16:47How many names did you see total? Whatever,
12. But how many of each name -
16:47 - 16:50did you see is a bunch of counters, so
it's a bunch of simultaneous counters. -
16:52 - 16:57So a dictionary is, is great for this,
a dictionary is great for this. -
16:57 - 16:59We, when we see somebody for the first
-
16:59 - 17:00time, we can add an entry to the
dictionary, -
17:00 - 17:04which is kind of like going oh,
csev one, -
17:04 - 17:08and then Chen Wen one. Now these don't
exist yet. -
17:08 - 17:10Right? So we've got csev one and Chen
Wen one, so -
17:10 - 17:13that creates an entry and sticks a one in
it and the -
17:13 - 17:17mapping between the key csev and the value
one, the key Chen Wen -
17:17 - 17:20and the value one and then we say, hey
what's in there? -
17:20 - 17:23Oh, we've got a csev is one and
Chen Wen is one. -
17:23 - 17:26And then we see Chen Wen a second time,
-
17:26 - 17:27so we'd add another number right there.
-
17:27 - 17:31So this old number is one, we add one to
it and we get -
17:31 - 17:35two and then we stick that back in and
then we do the calculations. -
17:35 - 17:39We do a dump and say oh there's two in
Chen Wen and one in csev. -
17:40 - 17:41Okay?
-
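The two-counter example just described could be written roughly like this. The dictionary name `ccc` appears later in the lecture; the key `'cwen'` standing in for Chen Wen is an assumption made for this sketch.

```python
ccc = dict()

# First sighting of each name: create the entry and set it to 1.
ccc['csev'] = 1
ccc['cwen'] = 1      # 'cwen' is an assumed key for "Chen Wen"
print(ccc)

# Second sighting of Chen Wen: read the old count, add 1, store it back.
ccc['cwen'] = ccc['cwen'] + 1
print(ccc)
```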
17:42 - 17:46So this is a great data structure for the
simultaneous counters like what's -
17:46 - 17:50the most common word, who had the most
commits, da, da, da, da, da. -
17:51 - 17:54Now, everything we do we have to figure
out -
17:54 - 17:56like, when you're going to get in trouble
with Python. -
17:56 - 18:00When Python's going to give you the old
thumbs down and say oh, you went too far. -
18:00 - 18:06So one thing Python does not like is if
you reference a key before it exists. -
18:06 - 18:10We'll, we'll talk in a second how to
work around this. But if you simply -
18:10 - 18:12create a dictionary and say, oh, print out
-
18:12 - 18:15what's under csev, it gives you a
traceback. -
18:15 - 18:16It's like,
-
18:16 - 18:18I'm going to inform you that that's not
there. -
18:18 - 18:20And it says key error, csev.
-
18:20 - 18:25Now, the thing that allows us to solve
this is the in operator. -
18:25 - 18:28We've used the in operator to see if a
substring was in a string. -
18:28 - 18:30Or if a number was in a list.
-
18:30 - 18:37So, so this in operator says, in operator
says, hey, ask a question. -
18:37 - 18:42Is the string csev a current key in the
dictionary ccc? -
18:43 - 18:46Is the string csev a current key in the
dictionary ccc? -
18:46 - 18:48And it says, False.
-
18:49 - 18:52So now we have something that doesn't give
a traceback -
18:52 - 18:55that can tell us whether or not the key is
there. -
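A short sketch of both behaviors follows: the failing lookup and the `in` test. The `try`/`except` wrapper is an addition for illustration only, so the example keeps running after the error; in the lecture the lookup simply produces a traceback.

```python
ccc = dict()

# Referencing a key that does not exist raises KeyError.
# It is caught here only so the rest of the sketch can run.
try:
    print(ccc['csev'])
except KeyError as err:
    print('KeyError:', err)

# The in operator asks the same question without raising an error.
print('csev' in ccc)   # False
```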
18:55 - 18:57So if you remember the algorithm, the
first time you see it, you -
18:57 - 19:01set them to one, and every other time, you
add one to them. -
19:03 - 19:04So this is how we do that in Python.
-
19:05 - 19:08So here's how we implement that program
that I just gave you -
19:08 - 19:12in Python. So, here's our names.
-
19:12 - 19:15It's shorter so my slide works better.
-
19:15 - 19:17Here's a variable, our iteration variable,
it's going to, you know, -
19:17 - 19:21go through all five of these one at a time.
-
19:21 - 19:25And within the body of the
loop we have an if statement. -
19:25 - 19:27If the name is not currently in the
-
19:27 - 19:31counts dictionary, counts is the name of
my dictionary. -
19:31 - 19:34If the name is not currently in the
counts dictionary, -
19:34 - 19:35I say counts sub name equals one.
-
19:36 - 19:40else, that must mean it's already there
which means -
19:40 - 19:43it's okay to retrieve it, counts sub name
plus 1. -
19:43 - 19:47We're going to add a 1 to it and stick it
back in, okay? -
19:47 - 19:49And so when this finishes it's going to
add -
19:49 - 19:53entries and then add one to entries that
already exist. -
19:53 - 19:57And not traceback at all. And when we
print it out we're going to see the counts. -
19:57 - 19:59And literally this could have gone
-
19:59 - 20:02a million times and it would just be fine
and it would just keep expanding. -
20:02 - 20:03Okay?
-
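The loop described above can be sketched as follows. The lecture uses a list of five names; the particular names in this list are assumed for illustration.

```python
counts = dict()
names = ['csev', 'cwen', 'csev', 'zqian', 'cwen']   # assumed sample data

for name in names:
    if name not in counts:
        # First time we see this name: create the entry.
        counts[name] = 1
    else:
        # The key already exists, so it is safe to read it and add 1.
        counts[name] = counts[name] + 1

print(counts)
```

Because the `if` guards every lookup, the loop never references a missing key, no matter how many names it processes.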
20:05 - 20:07So this pattern of checking to see if a key
-
20:07 - 20:11is in a dictionary, setting it to some
number, or -
20:12 - 20:15adding one to it is a really, really common
pattern. -
20:16 - 20:20It's so common, as a matter of fact, that
there is -
20:20 - 20:25a special thing built into dictionaries
that does this for us, okay? -
20:25 - 20:27And there is this method called get.
-
20:28 - 20:30And so, counts is the name of the
dictionary, -
20:30 - 20:34get is a built-in capability of
dictionaries. -
20:34 - 20:36And it takes two parameters.
-
20:36 - 20:43The first parameter is a key name, like a
string, like csev or chen wen or marquard. -
20:43 - 20:51And then the second parameter is a value
to give back if this doesn't exist. -
20:51 - 20:54It's a default value if the key does not
exist. -
20:54 - 20:56And there's no traceback.
-
20:56 - 21:01So this way you can encapsulate, in effect,
an if-then-else. -
21:01 - 21:06If the name parameter is in the counts,
print the thing out, otherwise print zero. -
21:06 - 21:11So this expression will either get the
number -
21:11 - 21:17if it exists or it will give me back a
zero if it doesn't exist. -
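To make the equivalence concrete, here is a hedged sketch comparing the explicit if-then-else with a single `get` call. The sample dictionary contents and the variable names `x` and `y` are assumptions for illustration.

```python
counts = {'csev': 2, 'cwen': 1}   # assumed sample data
name = 'marquard'                 # a key that is not in the dictionary

# The long way: check first, then look up or fall back to 0.
if name in counts:
    x = counts[name]
else:
    x = 0

# The short way: get returns the stored value, or the default if the key is missing.
y = counts.get(name, 0)

print(x, y)
```

Note that `get` only reads; it does not add the missing key to the dictionary.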
21:17 - 21:19So this is really valuable.
-
21:19 - 21:21Right? This is really valuable.
-
21:21 - 21:23That's a really bad smiley face.
-
21:23 - 21:29So this is really valuable because it,
once, once we understand the idiom, -
21:29 - 21:33it really takes four lines of code and
turns it into one line of code. -
21:33 - 21:35Because we're going to be doing this
if-then-else all the time. -
21:36 - 21:39Now, and so we can reconstruct that loop
-
21:39 - 21:44a lot easier and a lot more cleanly using this
idiom, right? -
21:44 - 21:46It's something that looks kind of complex
but you'll -
21:46 - 21:49get used to it really fast, okay?
-
21:49 - 21:52So we have, everything here is the same,
-
21:52 - 21:54we create an empty dictionary, we have five
names to -
21:54 - 21:56go through, we're going to write a
for loop -
21:56 - 21:58and it's going to go through each of
those. -
21:58 - 22:05And then we're going to say counts sub name
equals counts dot get the value stored -
22:05 - 22:08at name, and if you don't find it, give me
back a zero. -
22:08 - 22:12And then whatever comes back, either the
old value or -
22:12 - 22:17the zero, add 1 to that and then take that
sum and stick it in counts name. -
22:18 - 22:20Okay? So this is either
-
22:22 - 22:23going to create,
-
22:26 - 22:30or it's going to update.
-
22:30 - 22:33If there is no entry, it's going to create
it and set it to one. -
22:33 - 22:37If there is an entry it's going to add one to
the current entry. -
22:38 - 22:39Okay? So this is,
-
22:43 - 22:45this line is kind of an idiom.
-
22:47 - 22:48Read about it in the book, figure it out,
-
22:48 - 22:50get used to the notion of what this is doing.
-
22:50 - 22:53Understand what that is doing, okay?
-
22:54 - 22:57Because I'm going to start using it as if
you understand it. -
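The one-line idiom discussed above, placed inside the counting loop, might look like this. As before, the five names are assumed sample data.

```python
counts = dict()
names = ['csev', 'cwen', 'csev', 'zqian', 'cwen']   # assumed sample data

for name in names:
    # Missing key: get returns 0, so the entry is created with 1.
    # Existing key: get returns the old count, which is increased by 1.
    counts[name] = counts.get(name, 0) + 1

print(counts)
```

This produces the same counts as the four-line if/else version, with the create-or-update decision folded into the default value passed to `get`.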
22:58 - 23:05So, the next problem is a problem of
finding the most common word. -
23:05 - 23:08So, finding the most common, the top
-
23:08 - 23:12five, is often a, a trigger that says, use
-
23:12 - 23:14dictionaries because if you're going to
have to count things up, -
23:14 - 23:16you're going to, you know, you don't
-
23:16 - 23:18know what the most common thing is at the
beginning. -
23:18 - 23:22First you have to count everything up, and
dictionaries are a great way to count. -
23:22 - 23:25So here's a little problem and I would
like you to read -
23:25 - 23:29this text and find me the most common word
in the text. -
23:29 - 23:33And tell me what the most common word is
and how many times -
23:35 - 23:37it occurs. Ready?
-
23:37 - 23:40I'm going to give you a thousandth of a
second, just like I would give a computer. -
23:40 - 23:42I would expect it'd be able to do this in
a thousandth of a second. -
23:42 - 23:43[SOUND] There you go.
-
23:43 - 23:46[BLANK_AUDIO]
-
23:46 - 23:48Okay, I gave you five seconds.
Time's up. -
23:48 - 23:49Did you get it?
-
23:50 - 23:53Or did you say to yourself, you know what,
I hate -
23:53 - 23:56that, it's no good, I think I'll write a
Python program instead. -
23:56 - 23:59And he'll probably show me a Python
program if I wait long enough. -
23:59 - 24:03So here's a slightly easier problem from
the first lecture. -
24:03 - 24:04Ready?
-
24:04 - 24:05It's the same problem.
-
24:05 - 24:08Find the most common word and how many
times the word occurs. -
24:08 - 24:12[BLANK_AUDIO]
-
24:12 - 24:34[MUSIC]
-
24:35 - 24:40Did you get it?
I believe the answer is, and I could look -
24:40 - 24:46really dumb here, oops, the answer is the,
and I think it's seven times. -
24:46 - 24:48So, that's the right answer. Okay?
-
24:48 - 24:50Again, things humans are not so good at.
-
24:51 - 24:55So, here's a piece of code that's starting
to combine some -
24:55 - 24:58of the things we've been doing in the past
few chapters all together. -
24:58 - 25:01We are going to read a line of text,
-
25:01 - 25:06split it into words, count the occurrence,
how many times -
25:06 - 25:10each word occurs, and then print out a map.
-
25:10 - 25:15So, so here's what we're going to do,
we're going to say okay, start -
25:15 - 25:19a dictionary, an empty dictionary, read
the line of input. -
25:20 - 25:27Then split it, remember, the split takes a
string and produces a list. -
25:27 - 25:32So words is a list, line is a string, and
then we'll print that out. -
25:32 - 25:34Then we're going to write a for loop
that's going to go -
25:34 - 25:38through each of the words, and
then create, use this idiom -
25:38 - 25:42counts sub word equals counts.get word, 0 + 1.
-
25:42 - 25:45So this is going to do exactly what we talked
about in the previous -
25:45 - 25:51couple slides back, either create the
entries or add to those entries, okay? -
25:51 - 25:52And then we're going to print
-
25:52 - 25:53them out.
-
25:53 - 25:56So here's what that program does when it
prints out. -
25:57 - 25:59Now this is actually one long line I'm
-
25:59 - 26:01just cutting it so you can see it.
-
26:01 - 26:05Here's this line we enter, and the word
the, there's seven of them. -
26:05 - 26:08Then it takes this line and splits it into a
-
26:08 - 26:11list, and there is the beginning and end
of the list. -
26:11 - 26:14The list maintains the order, so the
-
26:14 - 26:18list simply breaks all these words into
separate -
26:18 - 26:22words in a list of strings.
From one string -
26:23 - 26:29to many strings. This is many strings.
And so the, and the spaces are gone. -
26:29 - 26:31And so now here's this list.
-
26:31 - 26:34And then what we're going to do is we're
going to run through the list. -
26:35 - 26:39And we're going to keep running totals of
each of the words in the list. -
26:39 - 26:40And then when we're done with the list,
-
26:40 - 26:44we're going to print out the contents of
that dictionary. -
26:44 - 26:45And we can inspect it and
-
26:45 - 26:47go like, let's look for the biggest one,
na, na, na, na, na. -
26:47 - 26:48It's kind of like
-
26:48 - 26:51looking for the largest, like, oh,
seven. -
26:51 - 26:54That's the largest, and the most common word is
the. -
26:54 - 26:55Okay?
-
26:55 - 26:59So that's how the program runs, it
reads a line, -
26:59 - 27:02splits it into a list of words, and then
-
27:02 - 27:05accumulates a running total for each word,
and then we -
27:05 - 27:09hand inspect to see what the most common
word is. -
27:09 - 27:09Okay?
-
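The word-counting program walked through above can be sketched as follows. In the lecture the line of text is typed in by the user; here a fixed sample sentence is assumed instead so the example runs without input, and it is shorter than the text on the slide.

```python
counts = dict()

# Assumed sample line; the lecture reads this from the user instead.
line = 'the clown ran after the car and the car ran into the tent'

# split turns one string into a list of strings, dropping the spaces.
words = line.split()
print('Words:', words)

# Keep a running total for each word using the get idiom.
for word in words:
    counts[word] = counts.get(word, 0) + 1

print('Counts', counts)
```

As in the lecture, finding the most common word is still done by inspecting the printed counts by eye.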
27:11 - 27:13Oh no, no, I don't want that song again.
-
27:13 - 27:14There we go.
-
27:14 - 27:18And so here we have it in
kind of a smaller fashion. -
27:19 - 27:24We make a dictionary.
This entering a line of text is here. -
27:24 - 27:25It's all one line.
-
27:25 - 27:27We do the split and then we print the
words out. -
27:29 - 27:32And so that split creates a list of
strings from a single -
27:32 - 27:37string based on where the blanks are at,
chop, chop, chop, chop. -
27:37 - 27:38And then here
-
27:38 - 27:39at counting,
-
27:41 - 27:46we're going to loop through each of the
words one at a time and use this idiom, -
27:46 - 27:53counts sub word equals counts.get word, 0 + 1,
which is going to create and/or update. -
27:53 - 27:55And then we print the counts out and that
comes out there. -
27:56 - 27:57Okay?
-
27:58 - 28:00So, again, this is the new thing that
we've done. -
28:00 - 28:02Everything else we've kind of seen before.
-
28:05 - 28:08Now we can also loop through dictionaries
with for loops. -
28:13 - 28:15The for loop, we've put all kinds of
things over here. -
28:15 - 28:19We've put strings over here, we've put
lists of numbers over here. -
28:19 - 28:21We've put files over here.
-
28:21 - 28:23And basically what it really says is you
-
28:23 - 28:26know, if this is a collection of things,
-
28:26 - 28:28run this little indented code once for
each item in -
28:28 - 28:33the collection, and key then becomes our
iteration variable. -
28:33 - 28:35And key is very mnemonic here.
-
28:35 - 28:37It doesn't know that they are keys.
-
28:37 - 28:39And so, keys.
-
28:39 - 28:45The key here is, the
important -
28:45 - 28:50concept here is that dictionaries are
key/value pairs and so this is -
28:50 - 28:53only one variable, and so
they've decided that -
28:53 - 28:56it goes through the keys, which is
actually quite useful. -
28:56 - 29:01So key is going to take on the successive
values of the labels. -
29:01 - 29:02Not the successive values of
-
29:02 - 29:04the values stored at the labels.
-
29:04 - 29:10But it's really easy for us to retrieve
the contents at that label counts sub key. -
29:10 - 29:15So we're going to use the key 'chuck',
'fred', 'jan', to look up the 1, 42, 100. -
29:15 - 29:18And so it prints out the key,
-
29:18 - 29:22and then the value at it, the key, and the
value at it, and the key, and the value. -
29:22 - 29:25And so we're able to sort of go through
-
29:25 - 29:27the dictionary and look at all the
key/value pairs, -
29:27 - 29:30which is the common thing that you really
want to do. -
29:31 - 29:32Okay?
-
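The loop described above can be sketched like this, using the same three keys and values mentioned in the lecture. The iteration variable is named key only because that is mnemonic.

```python
counts = {"chuck": 1, "fred": 42, "jan": 100}

# A for loop over a dictionary visits the keys, not the values.
for key in counts:
    # Use the key to look up the value stored at that label.
    print(key, counts[key])
```

In the Python the lecture was recorded with, the order of the keys was unpredictable. In Python 3.7 and later, dictionaries keep insertion order, but code that does not depend on the order works either way.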
29:35 - 29:38Now there's some methods inside of
dictionaries that allow -
29:38 - 29:43us to convert dictionaries into lists
of things. -
29:43 - 29:47And so if you simply take a dictionary, so
here's a little dictionary with -
29:47 - 29:52three items in it, and we can call list()
and then give a dictionary name -
29:52 - 29:54right there, and then that converts it
into a -
29:54 - 29:57list. But it's just a list of the keys.
-
29:58 - 30:01We can also say jjj dot keys, kind of do
the same thing. -
30:01 - 30:05Say give me a list consisting of the keys.
-
30:05 - 30:10And then jjj dot values gives you a list
of the values, 1, 42, and 100. -
30:10 - 30:13Of course they're not in the same order.
-
30:13 - 30:16Now interestingly, as long as you don't
modify the dictionary, -
30:16 - 30:20the order of these two things corresponds
as long as -
30:20 - 30:23in between here you're not changing it.
So the first jan maps to 100, -
30:23 - 30:25chuck maps to 1,
-
30:25 - 30:28and fred maps to 42.
-
30:28 - 30:30So the order, you can't predict the order
they're -
30:30 - 30:32going to come out but these two things
will -
30:32 - 30:35come out in the same order, whatever that
order -
30:35 - 30:38happens to be. Okay, and so there's one
more thing. -
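The conversions described so far can be sketched as follows. One assumption to flag: in Python 3, keys() and values() return view objects rather than lists, so this sketch wraps them in list() to get the lists the lecture shows.

```python
jjj = {"chuck": 1, "fred": 42, "jan": 100}

# list() on a dictionary gives a list of its keys.
print(list(jjj))

# keys() and values() are views in Python 3; list() turns them into lists.
key_list = list(jjj.keys())
value_list = list(jjj.values())
print(key_list)
print(value_list)

# As long as the dictionary is not modified in between, position i in
# key_list corresponds to position i in value_list.
for i in range(len(key_list)):
    print(key_list[i], "maps to", value_list[i])
```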
30:39 - 30:44So we've got the keys, we've got the
values, and we've got a thing called items. -
30:44 - 30:50items also returns a list, it's a list.
But it's a list of -
30:50 - 30:55what Python calls tuples.
That's what the next chapter is about. -
30:55 - 30:57We'll talk more about tuples in the next
chapter. -
30:58 - 31:01Each tuple here is a key/value pair.
-
31:01 - 31:06So this list has three things in it.
One, two, three. -
31:06 - 31:10The first one jan maps to 100, the
second is chuck maps to 1, the -
31:10 - 31:16third one is fred maps to 42. So,
just kind of bear with me for a second. -
31:16 - 31:18We'll hit this a little harder in the next
chapter. -
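As a small preview, here is roughly what items looks like. As with keys and values, Python 3 returns a view from items(), so this sketch converts it with list() to get an actual list of tuples.

```python
jjj = {"chuck": 1, "fred": 42, "jan": 100}

# Each element of the list is a two-item tuple: (key, value).
pairs = list(jjj.items())
print(pairs)

# Pull one pair apart to see its two halves.
first = pairs[0]
print(first[0], "maps to", first[1])
```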
31:19 - 31:21But the place that this, the idiom where
-
31:21 - 31:24this works very beautifully is on a for
loop. -
31:24 - 31:27Now, for those of you who have programmed
in other languages, this will be -
31:27 - 31:30kind of weird because other languages have
-
31:30 - 31:34iterations but they don't have two
iteration variables. -
31:34 - 31:36Python can have two iteration variables.
-
31:36 - 31:37It can be used for many things but one of the
-
31:37 - 31:41things that it's used for that's really
quite nice is -
31:41 - 31:46we can have two iteration variables.
This jjj.items() returns pairs of -
31:46 - 31:51things and then aaa and bbb are iteration
variables that sort of -
31:51 - 31:57move in a synchronized way as they
move through. -
31:57 - 32:01So aaa takes on the value of the key.
-
32:01 - 32:06bbb takes on the value of the, the
value. -
32:06 - 32:09And then the loop runs once.
-
32:09 - 32:13Then aaa is advanced to the next key.
-
32:13 - 32:17And bbb is advanced to the next value
simultaneously, synchronized. -
32:17 - 32:20Then they print that out, then it advances
to the -
32:20 - 32:23next one, and the next one, and they print
that out. -
32:23 - 32:27So they are moving in a synchronized way.
-
32:27 - 32:31Now again, the order jan, chuck, fred is not
the same. -
32:31 - 32:33But the correspondence between jan 100,
-
32:33 - 32:37chuck 1, and fred 42,
that's going to work. -
32:37 - 32:41And so basically, as these things go, they
work -
32:41 - 32:44their way through whatever order they're
stored in the dictionary. -
32:44 - 32:45So this is quite nice.
-
32:45 - 32:49Two iteration variables going through
key/value. -
32:49 - 32:54Now if I was making these names mnemonic,
and they made more sense, -
32:54 - 32:57I would call this the key variable and
that would be the value variable. -
32:58 - 33:01But for now I'm just using kind of silly
names -
33:01 - 33:03to show you that key and value are not
special. -
33:03 - 33:06They're not Python reserved words in any
way. -
33:06 - 33:09They're just a good way to name these
things, key/value pairs. -
33:09 - 33:10Okay?
-
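The two-iteration-variable loop can be sketched like this, first with the deliberately silly names from the lecture and then with mnemonic ones. Both loops do exactly the same thing, since the names are not special to Python.

```python
jjj = {"chuck": 1, "fred": 42, "jan": 100}

# aaa gets the key and bbb gets the value of each pair, in step.
for aaa, bbb in jjj.items():
    print(aaa, bbb)

# The same loop with mnemonic names; key and value are ordinary variables.
for key, value in jjj.items():
    print(key, value)
```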
33:12 - 33:13Okay.
-
33:13 - 33:17Now we're going to circle all the way
back to the beginning. -
33:17 - 33:18All the way back to the first lecture.
-
33:18 - 33:24And I gave you this program, and I said
don't worry about it. -
33:24 - 33:28We'll learn about it later.
Well, now is later. -
33:28 - 33:32At this point you should be able to
understand every line of this program. -
33:33 - 33:38This is the program that's going to find
the most common word in a file. -
33:38 - 33:39Okay?
-
33:39 - 33:41So let's walk through what it does and
hopefully -
33:41 - 33:45by now this will make a lot of sense.
-
33:46 - 33:48Okay? So we're going to start out, we're
going to ask -
33:48 - 33:51for a file name, we're going to open that
file for read. -
33:52 - 33:55Then, because we know it's not a very large
-
33:55 - 33:57file, we're going to read it all in one go.
-
33:57 - 34:00So handle dot read says read the whole
file, newlines and all, -
34:00 - 34:04blanks, newlines, whatever,
and put it in -
34:04 - 34:08the variable called text, it's just
mnemonic. Remember I'm, in this one -
34:08 - 34:13I'm using the mnemonic variable names.
Then go through that whole -
34:13 - 34:17string, which was the whole file, go
through and split it all. -
34:17 - 34:20Newlines don't hurt it.
Newlines are treated like blanks. -
34:20 - 34:22And it understands all that.
-
34:22 - 34:23It throws the newlines away and the
blanks away -
34:23 - 34:27and splits it into a beautiful list of
just words with no blanks. -
34:29 - 34:33And the list of the words in that file
ends up in the variable words. -
34:33 - 34:36words is a list, text is a string, words
is a list. -
34:37 - 34:42Then what I do is the pattern of
accumulating counters in a dictionary. -
34:42 - 34:44I make an empty dictionary.
-
34:44 - 34:48I have the word variable that goes through
all the words -
34:48 - 34:53and then I just say, counts sub word equals
counts dot get(word,0) + 1, -
34:54 - 34:57and that, like we just got done saying,
it both creates -
34:57 - 35:02and/or increments the value in the
dictionary as needed. -
35:02 - 35:04So now, at this
-
35:04 - 35:12point in the program, we have a full
dictionary with the word:count pairs. -
35:12 - 35:12Okay?
-
35:12 - 35:15And there's many of them.
You know, all the words, all the counts. -
35:15 - 35:17They're not in any particular order. So now what
-
35:17 - 35:22we're going to do is we're going to write
a largest loop, find the largest. -
35:22 - 35:24Which is another thing that we've done.
-
35:24 - 35:27So not only do I need to now know what
largest count I've seen so far, -
35:27 - 35:30I need to know what word that is.
-
35:30 - 35:33So I'm going to set the largest count
we've seen so far to None, set -
35:33 - 35:37the largest word we've seen so far
to None, and then I'm going to use this -
35:37 - 35:39two-iteration variable pattern to say
-
35:39 - 35:44go through the key/value pairs word and
count in counts.items. -
35:44 - 35:45So it's just going to
-
35:45 - 35:47go through [SOUND] all of them.
-
35:47 - 35:53And then I'm going to ask if the largest
number I've seen so far is None or -
35:53 - 35:56the current count that I'm looking at is
greater than the largest I've seen so far, -
35:59 - 36:03keep them. Take the current word, stick it
in biggest word so far, -
36:03 - 36:07take the current count, stick it in
the biggest count so far. -
36:07 - 36:10So this is going to run through all of the
-
36:10 - 36:14word:count pairs, word:count key/value pairs.
-
36:14 - 36:17And then when it comes out, it's going to
print out -
36:17 - 36:19the word that's the most common and how
many times. -
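The walk-through above can be sketched as one function. This is not the slide's program verbatim: the lecture's version asks for a file name and calls handle.read(), while this sketch takes the text as a parameter so it can run without a file. The names most_common_word, bigword, bigcount, and sample are assumptions made for illustration.

```python
def most_common_word(text):
    """Return (word, count) for the most frequent word in text."""
    # split() treats newlines like blanks, so the whole file's text
    # becomes one list of words.
    words = text.split()

    # Accumulate one counter per word in a dictionary.
    counts = dict()
    for word in words:
        counts[word] = counts.get(word, 0) + 1

    # Largest loop: remember both the biggest count and its word.
    bigcount = None
    bigword = None
    for word, count in counts.items():
        if bigcount is None or count > bigcount:
            bigword = word
            bigcount = count
    return bigword, bigcount


# A made-up two-line sample standing in for the contents of a file.
sample = "the quick fox and the lazy dog and the cat\nsat by the door"
print(most_common_word(sample))
```

To run it on a real file, one option is to pass in open(name).read(), where name is whatever file name the user supplies.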
36:21 - 36:24So if we feed in that clown text, it will
run all this stuff, and print out -
36:24 - 36:29oh, 'the' is the most common word, and it
appeared seven times. -
36:29 - 36:34Or if I print the stuff that was two
slides back, words.txt, from the actual -
36:34 - 36:38textbook, then it says the word 'to' is the
most common and it happened 16 times. -
36:38 - 36:43So I could easily, you know, throw 10
million, 10 million -
36:43 - 36:46words through this thing, and it would
just be totally happy. -
36:46 - 36:49Right? And so, this is not that complex
-
36:49 - 36:53of a problem, but it's using a whole bunch
of idioms that we've been using. -
36:53 - 36:57The splitting of words, the accumulation
of multiple counters in a dictionary. -
36:57 - 37:02And so, it sort of is the beginning of
doing some kind of data -
37:02 - 37:06analysis that's hard for humans to do, and
error-prone for humans to do. -
37:06 - 37:08And so, we've reviewed collections.
-
37:08 - 37:11We've introduced dictionaries.
-
37:11 - 37:13We've done the most common word pattern,
talked about that. -
37:13 - 37:14The lack of order, and
-
37:14 - 37:16I did that a bunch of times.
-
37:16 - 37:20And we've looked ahead at tuples,
which is the next, -
37:20 - 37:22the third kind of collection that we're
going to talk about. -
37:22 - 37:26And they're actually in some ways a little
simpler than dictionaries. -
37:26 - 37:27And simpler than lists.
-
37:27 - 37:33So, see you in the next lecture, Chapter
10, tuples.
- Title:
- Python for Informatics - Chapter 9 - Dictionaries
- Description:
-
This is Chapter 9 - Dictionaries from Python for Informatics - Exploring Information. www.pythonlearn.com
All Lectures: http://www.youtube.com/playlist?list=PLlRFEj9H3Oj4JXIwMwN1_ss1Tk8wZShEJ - Video Language:
- English
- Duration:
- 37:34