Hello again, and welcome to Chapter Nine
of Python, Dictionaries.
As always, this lecture is copyright
Creative Commons Attribution.
That means the audio, the video, the
slides, and even my scribbles.
You can use them any way you like, as long
as you attribute them.
Okay, so this is the second chapter
where we're talking about collections, and
the collections
are like a piece of luggage in that you
can put multiple things in them.
Variables that we've talked about sort of
starting in
Chapter Two and Chapter Three were simple
variables, scalar.
They're just kind of one thing, and as
soon as you,
like, put another thing in there, it
overwrites the first thing.
And so if you look at the code, you know,
x = 2 and x = 4,
the question is, you know, where did
the 2 go?
Right? The 2 was there, x was there,
there was a 2 in there, and then we cross
it out and put 4 in there.
This is sort of the basic operation, the
assignment statement, it's a replacement.
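As a small sketch of that replacement idea (the print calls are only there to show what x holds at each step):

```python
# A simple (scalar) variable holds one value at a time.
x = 2
print(x)   # x currently refers to 2

# Assigning again replaces the old value; the 2 is no longer reachable via x.
x = 4
print(x)
```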
But a dictionary allows us to put more
than one thing.
Not using this syntax, but it allows us to
have a variable that's really an aggregate
of many values.
And the difference between a list and a
dictionary
is how the values are structured within
that single variable.
The list is a linear collection,
indexed by integers 0, 1, 2, 3.
If there's five of them, it's 0 through 4,
very much like a
Pringles can here, where they're just
stacked nicely on top of each other.
Everything's kind of organized. We talked
about it in the last lecture.
This lecture we're talking about dictionaries.
A dictionary's very powerful.
Its power comes from a different
way of organizing itself internally.
It's a bag of values,
where stuff is just sort of
in it, and it's not in any order.
Big stuff, little stuff.
Things have labels.
You can also think of it like a purse with
just things in it, where it's not
stacked.
Stuff just moves around as you're going,
and that's a very good model for
dictionaries.
And so dictionaries
have to have a label because the stuff is
not in order.
There's no such thing as the third thing.
There is the thing with the label perfume.
There's the thing with the label candy.
There's the thing with the label money.
And so there's the value, the thing, the money.
And then there's always also the label.
We also call these key/value.
The key is the label and the value is
whatever.
And so these pink things are all labels for
various things you could put in your purse.
So you could say to your purse, "hey purse,
give me my tissues."
"Hey purse, give me my money."
And it, it's in there somewhere and the
purse sort of
gives you back the tissues or the money.
Python's most powerful data
collection is the dictionary.
And when
you get used to wielding them you'll say,
like,
whoa, I can do so much with these things.
And at the beginning you're just sort of
learning how to use them without
hurting yourself.
But they're very powerful.
It's like a database.
It allows you to store arbitrary
data organized however you feel like
organizing it, in a way that advances the
cause of the program that you're writing.
And we're still kind of at the very
beginning, but as you learn more,
these will become a very powerful
tool for you.
Dictionaries have different names in
different languages.
Perl or PHP would call them associative
arrays.
Java would call them Properties or a
HashMap.
And C# might call them a property bag or
an attribute bag.
And so they're just the same
concept.
Keys and values is the concept that's
across all these languages.
They're just very powerful.
And if you look at the Wikipedia entry
that I have here
you can see that it's just, it's a concept
that we give different names in different
languages. Same concept, different names.
So like I said, the difference between a
list and a dictionary, they both can store
multiple values. The question is how we
label them,
how we store them, and how we retrieve
them.
So here's an example use of a dictionary.
I'm going to make a thing called purse.
And I'm going to store in purse, this is
an assignment statement,
purse sub money.
So this isn't like sub zero.
This is sub money.
So I'm actually using a string as the
place.
And, so I'm going to say stick 12
in my purse
and stick a Post-it note that says
that's my money.
Candy is 3. Tissues is 75.
And if I look at that, it's not just the
numbers 12, 3, and 75 as it
would be in a list. It is the connection
between money and 12,
tissues is 75, candy is 3.
And in the key/value, that's the
key and that's the value.
So candy is the key and 3 is the value.
Now I can look things up by their name,
print purse sub candy.
Well it goes and finds it, asking hey purse,
give me back candy, and it
goes and finds the value, which is 3, and
so out comes a 3.
We can also put it
on the right-hand side of an
assignment statement,
so purse sub candy says give me
the old version of candy,
and then add 2 to it, which
gives me 5, and then store it back
in that purse
under the label candy.
So we see candy changing to 5.
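Written out as code, the purse example might look like the following sketch. It uses Python 3 print calls, which is an assumption on my part, since the lecture only reads the statements aloud.

```python
# Start with an empty dictionary and store values under string labels (keys).
purse = dict()
purse['money'] = 12
purse['candy'] = 3
purse['tissues'] = 75
print(purse)            # shows every key together with its value

# Look a value up by its label.
print(purse['candy'])

# Read the old value, add 2, and store it back under the same label.
purse['candy'] = purse['candy'] + 2
print(purse)
```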
And so, this is a place, and you could
do this with a list except these would be
numbers.
You could say purse sub two is equal to
purse sub two plus two, or whatever.
But in dictionaries, there are labels.
Now, they don't have to be strings.
Strings are a very common label in
dictionaries, but
it's not always strings, you can use other
things.
In this chapter we'll pretty much focus on
strings.
You can even use numbers and then you
would get a little confused.
But you can.
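A quick sketch of keys that are not strings. The dictionary name mixed and its contents are made up for illustration; they are not from the lecture.

```python
# Keys do not have to be strings; numbers work as labels too.
mixed = dict()
mixed['count'] = 1          # a string key
mixed[42] = 'forty-two'     # an integer key
mixed[3.5] = 'a float key'  # a float key

# This looks like list indexing, but 42 is a label here, not a position.
print(mixed[42])
print(len(mixed))
```

This is where the confusion he mentions can come in: mixed[42] reads like "the 43rd item", but there are only three entries.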
So here's sort of a picture of how this
works.
So, if we take a look at this line purse
sub money equals 12,
it's like we were putting a key/value
connection, money is the label for 12.
And then we sort of move that in.
And it's up to the purse to decide
where things live.
If we look at the next line, we're going to
put the value in with a
3 in with the label candy, and we're
going to put
the value 75 in with the label of tissues.
And when we say hey purse, print yourself
out, it just
goes and pulls these things back out and
hands them to us.
And what it's really doing is giving us both the
label and the value, and it's necessary
to do that, because otherwise it's just 12,
75, and 3. What exactly is that?
And so this syntax with the curly braces
is what happens when you print a
dictionary out.
The same thing happens when we're sort of
printing purse sub candy, right?
Purse sub candy,
it's like dear purse, go and find the candy
thing.
Look at that one, look at that one.
Oh, yep, yep, this is candy.
But what we're looking for is the value,
and so that's why 3 is coming out here.
So go look up under candy, and tell me
what's stored under candy.
These can be actually more complex things,
I'm just keeping it simple for this
lecture.
And then, when we say purse sub candy
equals purse sub candy plus 2, well it
pulls the 3 out, looking at the label
candy, then adds 3 plus 2 and makes 5,
and then it assigns it back in, and then
that says, oh, go, go place this number 5
in the purse with the label of candy,
which then replaces the 3 with a 5.
Okay?
And if we print it out, we see that the
new variable, or the new candy entry,
is now 5.
Okay?
So if we put lists and dictionaries
side by side, we create
them both the same way: we make
an empty list and an empty
dictionary. With the list, we call the append method
because
we're just putting these things in
order. You gotta put the first one in
first. So we're not telling it where.
You kind of know that this
will be the first one, cause we're
starting with an empty one,
and this will be the second one.
We put in the values 21 and 183, and then
we print it out, and it's like okay, you gave
me the values 21 and 183, I will maintain
the order for you,
there's no keys other than their position.
The position is the key, as it were, so if
I want to change the first one to 23,
well, I say list sub zero, which is this,
and then change that to 23.
So this is sort of used as a lookup to
find something. It can be used on either the
right-hand side or the
left-hand side of an assignment statement.
Comparing that to dictionaries, I want to
put a 21 in there
and I want to put it with the label age.
I'm going to put 182, put that in with the
label course.
So we don't have to, like, make an entry first.
If the entry doesn't exist, the assignment
creates the age entry and sticks 21 into it,
creates the course entry, sticks 182 into it.
We print it out and it says, oh, course
is 182 and age is 21.
This emphasizes that order is not
preserved in dictionaries.
I won't go into like great detail as to
why that is.
It turns out that that's a compromise that
makes them fast using a technique called
hashing.
That's how it actually works internally.
Go look up hashing on Wikipedia and
take a look.
But, the thing that matters to us as
programmers primarily
is that lists maintain order and
dictionaries do not maintain order.
Dictionaries give us power
that we don't have in lists.
I mean, they're very complementary.
It's not that one is better
than the other.
They're very complementary.
Different kinds of data are either better
represented as a list
or as a dictionary, depending on the
problem you're trying to solve.
And in a moment we'll, we'll be writing
programs that are using both.
So if we come down here and I say,
okay, stick 23, with an assignment statement,
into ddd sub age,
well, that will change this 21 to 23,
and we see that when we print it out.
So this part, where you look
something up and
change the value, you can do either way.
It's just how you do it here
is a little bit different, okay?
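Here is the side-by-side comparison as a runnable sketch. The name lst is assumed for the list; ddd is the dictionary name used in the lecture. One caution when you try it: Python 3.7 and later keep dictionary entries in insertion order, while older versions made no such guarantee, so what you see printed may differ from the slide. Either way, you look dictionary entries up by key, not by position.

```python
# A list: values are kept in order and found by position.
lst = list()
lst.append(21)
lst.append(183)
print(lst)
lst[0] = 23          # position 0 is the "key"
print(lst)

# A dictionary: values are found by label.
ddd = dict()
ddd['age'] = 21      # the entry does not exist yet, so this creates it
ddd['course'] = 182
print(ddd)
ddd['age'] = 23      # the entry exists now, so this overwrites the 21
print(ddd)
```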
So let's look through this code again.
And so I like, I like to use the word key
and value.
Key is the way we look the thing up,
and in lists
keys are numbers starting at
zero and with no gaps.
In dictionaries keys are whatever we want
them to be,
in this case I'm using strings.
And then the value is the number we're
storing in it.
So we create this kind of a list with those
kinds of statements.
This statement creates this kind of a thing.
Now, think of this assignment
statement as moving a new item of
data into a place.
It's looking at this thing right here.
Right? It's like, that's where I want to
move it.
And so it hunts, and says, look the key up.
And that's the one that I'm going to change.
And then once it knows which it's going to
change,
then it's going to take the 23, and it's
going to put the 23 into that location.
And so that's how this changes from that
to that.
Similarly when we get down to here, we're
going to stick 23 somewhere and
this is, this expression, this lookup
expression, the index
expression ddd sub age, is where we're
going to put it.
So, we're looking here, where is that thing?
Well, that thing is this entry
in the dictionary. And so now when we're
going to store the 23,
we know where the 23 is going to go.
It's going to overwrite the 21 and so the
21 is
going to change to 23, okay? So they're
kind of similar.
There are things that work similarly in them
and then there are things that work
differently in them.
We can make literals, constants, with
curly braces. And they look just like the printed output.
That's one nice thing about Python.
When you print something out it's showing
you how you can make a literal, and
basically you just open with a curly brace
and say chuck colon 1, fred 42, jan 100.
And we're making connections.
key/value pair, key/value pair.
We print it out and
there's no order. They don't maintain order.
Now they might come out in the same order,
but that's just lucky.
Right?
All the ones I've shown you so far don't
come out in the same order, which is good
to demonstrate it.
If one time it came out in the same order,
that wouldn't be broken.
It's not like it doesn't want to come out
in the same order.
It's just that the order is not
stored internally, and if you
add an element it may reorder them.
You can do an empty dictionary with just a
curly brace, curly brace.
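A minimal sketch of both literal forms. The names jjj and ooo are assumed here; the lecture only reads the key/value pairs aloud.

```python
# A dictionary literal: key, colon, value, separated by commas.
jjj = {'chuck': 1, 'fred': 42, 'jan': 100}
print(jjj)

# An empty dictionary is just a pair of curly braces.
ooo = {}
print(ooo)
```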
So, I'm going to give you another example.
And I'm going to show you a series of
names.
And I want you to figure out what the most
common name is
and how many times each name appears.
Now these are real people.
They actually work on the Sakai project.
Steven, Zhen, and Chen, and me.
So these are people that are actually
in the data that we use in this course.
Okay? And so I think I'll show you about
fifteen names.
I'm
going to
show them to you one at a time, and you need to
come up with a way to keep track of these.
Okay?
So with no further ado I will show
you the names.
[BLANK_AUDIO]
Okay, so that's all the names.
Did you get it?
You might have to go back and do it again.
How did you solve the problem?
What kind of a data structure did you
build to solve the problem?
Or did you just say wow that's painful, I
think I will learn Python instead, in
solving that problem.
Okay?
So pause the video if you want,
or go back, and write down what you
think the
most common name is and how
many times it appears.
Okay. Now I'll show you.
So here is the whole list.
It's all of them.
And now that we see all of them, we
use our amazing human
mind and we scan around, and look at
purpleness and, and all that stuff.
And then we go like, oh, this is a
much easier problem when I'm looking
at the whole thing.
And I think that the most common person is
Zhen, and
I think we see Zhen, I think we see Zhen
five times.
And I think csev is one, two, three and
Chen Wen is one, two.
And Steve Marquard is one, two, three.
So the question is, what is an effective
data structure if you're going to see
a million of these? What kind of data
structure would you have to produce?
Because you can't keep it in your head.
Even with this number of people,
even this amount of data, there's no way you can
keep it in your head. You have to come
up with some kind of a variable, as it were,
just like largest so far was the variable.
Some kind of variable that gets you to
the point where you understand what's
going on.
And so this is the most common technique
to solve this
problem where you keep a running total of
each of the names.
And if you see a new name, you add them to
the list.
So csev and then you give him a one,
and then you see Zhen and you give her a
one,
and then you see Chen and you give her a
one.
And then you see csev again and you give
him a two.
And you see a two, and a two, and a one
right?
And so then when you're all done you have
the mapping, right, of these things
and you go oh, okay, let me look through
here and find the largest one.
That's the largest one and so that must be
the person who appears the most.
So you need a scratch area,
a data structure or a piece of paper as
it were,
and that's exactly what
dictionaries are really good at.
You could think of this as like a
histogram. You know, it's,
it's a bunch of counters, but counters
that are indexed by a string.
So we use a lot of this.
And so this is a pattern of many counters
with a dictionary, simultaneous counters.
We're looking
at a series of things, and we're going to
simultaneously keep track
of a large number of counters, rather than
just one counter.
How many names did you see total? Whatever,
12. But how many of each name
did you see is a bunch of counters, so
it's a bunch of simultaneous counters.
So a dictionary is great for this.
When we see somebody for the first
time, we can add an entry to the
dictionary,
which is kind of like going oh,
csev one,
and then Chen Wen one. Now these don't
exist yet.
Right? So we've got csev one and Chen
Wen one, so
that creates an entry and sticks a one in
it and the
mapping between the key csev and the value
one, the key Chen Wen
and the value one and then we say, hey
what's in there?
Oh, we've got a csev is one and
Chen Wen is one.
And then we see Chen Wen a second time,
so we'd add another number right there.
So this old number is one, we add one to
it and we get
two and then we stick that back in and
then we do the calculations.
We do a dump and say oh there's two in
Chen Wen and one in csev.
Okay?
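Those steps, done by hand, can be sketched like this. The key 'cwen' is an assumed short form for Chen Wen; ccc is the dictionary name used a little later in the lecture.

```python
# Counters indexed by a string: one entry per name we have seen.
ccc = dict()
ccc['csev'] = 1     # first time we see csev
ccc['cwen'] = 1     # first time we see Chen Wen
print(ccc)

# Second sighting of Chen Wen: take the old count, add one, store it back.
ccc['cwen'] = ccc['cwen'] + 1
print(ccc)
```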
So this is a great data structure for the
simultaneous counters like what's
the most common word, who had the most
commits, da, da, da, da, da.
Now, with everything we do, we have to figure
out
when you're going to get in trouble
with Python.
When Python's going to give you the old
thumbs down and say oh, you went too far.
So one thing Python does not like is if
you reference a key before it exists.
We'll talk in a second about how to
work around this. But if you simply
create a dictionary and say, oh, print out
what's under csev, it gives you a
traceback.
It's like,
I'm going to inform you that that's not
there.
And it says key error, csev.
Now, the thing that allows us to solve
this is the in operator.
We've used the in operator to see if a
substring was in a string.
Or if a number was in a list.
So this in operator
asks a question.
Is the string csev a current key in the
dictionary ccc?
And it says, False.
So now we have something that doesn't give
a traceback
that can tell us whether or not the key is
there.
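To see both behaviors without the program stopping, here is a sketch that catches the error with try/except. The try/except wrapper and the variable names got_error and is_there are my additions; in the lecture the lookup simply produces a traceback.

```python
ccc = dict()

# Looking up a key that does not exist raises KeyError.
got_error = False
try:
    print(ccc['csev'])
except KeyError as err:
    got_error = True
    print('KeyError:', err)

# The in operator asks the question safely and gives back True or False.
is_there = 'csev' in ccc
print(is_there)
```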
So if you remember the algorithm, the
first time you see it, you
set them to one, and every other time, you
add one to them.
So this is how we do that in Python.
So here's how we implement that program
that I just gave you
in Python. So, here's our names.
It's shorter so my slide works better.
Here's a variable, our iteration variable,
it's going to, you know,
go through all five of these one at a time.
And within the body of the
loop we have an if statement.
If the name is not currently in the
counts dictionary, counts is the name of
my dictionary.
If the name is not currently in the
counts dictionary,
I say counts sub name equals one.
else, that must mean it's already there
which means
it's okay to retrieve it, counts sub name
plus 1.
We're going to add a 1 to it and stick it
back in, okay?
And so when this finishes it's going to
add
entries and then add one to entries that
already exist.
And not traceback at all. And when we
print it out we're going to see the counts.
And literally this could have gone
a million times and it would just be fine
and it would just keep expanding.
Okay?
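The loop he just walked through might look like the sketch below. The five names in the list are assumed sample data standing in for the ones on the slide.

```python
counts = dict()
names = ['csev', 'cwen', 'csev', 'zqian', 'cwen']   # assumed sample names

for name in names:
    if name not in counts:
        # First time we see this name: create the entry.
        counts[name] = 1
    else:
        # Already there, so it is safe to read it and add one.
        counts[name] = counts[name] + 1

print(counts)
```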
So this pattern of checking to see if a key
is in a dictionary, setting it to some
number, or
adding one to it is a really, really common
pattern.
It's so common, as a matter of fact, that
there is
a special thing built into dictionaries
that does this for us, okay?
And there is this method called get.
And so, counts is the name of the
dictionary,
get is a built-in capability of
dictionaries.
And it takes two parameters.
The first parameter is a key name, like a
string, like csev or chen wen or marquard.
And then the second parameter is a value
to give back if this doesn't exist.
It's a default value if the key does not
exist.
And there's no traceback.
So this way you can encapsulate, in effect,
an if-then-else.
If the name parameter is in the counts,
print the thing out, otherwise print zero.
So this expression will either get the
number
if it exists or it will give me back a
zero if it doesn't exist.
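A sketch comparing the if-then-else with the get method. The starting counts and the variables x and y are made up for illustration.

```python
counts = {'csev': 2, 'cwen': 2}   # assumed starting data
name = 'zqian'                    # a key that is not in the dictionary

# The long way: check first, then look up or fall back to 0.
if name in counts:
    x = counts[name]
else:
    x = 0

# The short way: get returns the value, or the default if the key is missing.
y = counts.get(name, 0)

print(x, y)
print(counts.get('csev', 0))      # key exists, so the stored value comes back
```

Note that get only reads; it does not add 'zqian' to the dictionary.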
So this is really valuable.
Right? This is really valuable.
That's a really bad smiley face.
So this is really valuable because,
once we understand the idiom,
it really takes four lines of code and
turns it into one line of code.
Because we're going to be doing this
if-then-else all the time.
Now, and so we can reconstruct that loop
a lot easier and a lot more cleanly using this
idiom, right?
It's something that looks kind of complex
but you'll
get used to it really fast, okay?
So we have, everything here is the same,
we create an empty dictionary, we have five
names to
go through, we're going to write a
for loop
and it's going to go through each of
those.
And then we're going to say counts sub name
equals counts dot get the value stored
at name, and if you don't find it, give me
back a zero.
And then whatever comes back, either the
old value or
the zero, add 1 to that and then take that
sum and stick it in counts name.
Okay? So this is either
going to create,
or it's going to update.
If there is no entry, it's going to create
it and set it to one.
If there is an entry it's going to add one to
the current entry.
Okay? So this is,
this line is kind of an idiom.
Read about it in the book, figure it out,
get used to the notion of what this is doing.
Understand what that is doing, okay?
Because I'm going to start using it as if
you understand it.
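The same counting loop rewritten with the get idiom, again with assumed sample names:

```python
counts = dict()
names = ['csev', 'cwen', 'csev', 'zqian', 'cwen']   # assumed sample names

for name in names:
    # get gives the old count, or 0 if the name is new; then add 1 and store.
    counts[name] = counts.get(name, 0) + 1

print(counts)
```

The one line inside the loop both creates new entries and updates existing ones.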
So, the next problem is a problem of
finding the most common word.
So, finding the most common, or the top
five, is often a trigger that says, use
dictionaries, because you're going to
have to count things up.
You don't
know what the most common thing is at the
beginning.
First you have to count everything up, and
dictionaries are a great way to count.
So here's a little problem and I would
like you to read
this text and find me the most common word
in the text.
And tell me what the most common word is
and how many times
it occurs. Ready?
I'm going to give you a thousandth of a
second, just like I would give a computer.
I would expect it'd be able to do this in
a thousandth of a second.
[SOUND] There you go.
[BLANK_AUDIO]
Okay, I gave you five seconds.
Time's up.
Did you get it?
Or did you say to yourself, you know what,
I hate
that, it's no good, I think I'll write a
Python program instead.
And he'll probably show me a Python
program if I wait long enough.
So here's a slightly easier problem from
the first lecture.
Ready?
It's the same problem.
Find the most common word and how many
times the word occurs.
[BLANK AUDIO]
[MUSIC]
Did you get it?
I believe the answer is, and I could look
really dumb here, oops, the answer is the,
and I think it's seven times.
So, that's the right answer. Okay?
Again, things humans are not so good at.
So, here's a piece of code that's starting
to combine some
of the things we've been doing in the past
few chapters all together.
We are going to read a line of text,
split it into words, count the occurrence,
how many times
each word occurs, and then print out a map.
So, so here's what we're going to do,
we're going to say okay, start
a dictionary, an empty dictionary, read
the line of input.
Then split it, remember, the split takes a
string and produces a list.
So words is a list, line is a string, and
then we'll print that out.
Then we're going to write a for loop
that's going to go
through each of the words, and
then use this idiom:
counts sub word equals counts.get(word, 0) + 1.
So this is going to do exactly what we talked
about a
couple of slides back, either create the
entries or add to those entries, okay?
And then we're going to print
them out.
So here's what that program does when it
prints out.
Now this is actually one long line I'm
just cutting it so you can see it.
Here's the line we enter, and the word
the, there's seven of them.
Then it takes this line and splits it into a
list, and there is the beginning and end
of the list.
The list maintains the order, so the
list simply breaks all these words into
separate
words in a list of strings.
From one string
to many strings. This is many strings.
And the spaces are gone.
And so now here's this list.
And then what we're going to do is we're
going to run through the list.
And we're going to keep running totals of
each of the words in the list.
And then when we're done with the list,
we're going to print out the contents of
that dictionary.
And we can inspect it and
go like, let's look for the biggest one,
na, na, na, na, na.
It's kind of like
looking for the largest, like, oh,
seven.
That's the largest and the largest word is
the.
Okay?
So that's how the program runs, it
reads a line,
splits it into a list of words, and then
accumulates a running total for each word,
and then we
hand inspect to see what the most common
word is.
Okay?
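A sketch of that program follows. In the lecture the line is typed in by the user; here it is a fixed string so the example runs on its own, and the clown sentence is assumed sample text.

```python
counts = dict()

# In the lecture this line comes from the keyboard; a fixed string is used
# here (an assumption) so the sketch runs without any typing.
line = ('the clown ran after the car and the car ran into the tent '
        'and the tent fell down on the clown and the car')

# split turns one string into a list of strings, breaking on blanks.
words = line.split()
print('Words:', words)

print('Counting...')
for word in words:
    counts[word] = counts.get(word, 0) + 1

print('Counts', counts)
```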
Oh no, no, I don't want that song again.
There we go.
And so here we have it in
kind of a smaller fashion.
We make a dictionary.
This entering a line of text is here.
It's all one line.
We do the split and then we print the
words out.
And so that split creates a list of
strings from a single
string based on where the blanks are at,
chop, chop, chop, chop.
And then here
at counting,
we're going to loop through each of the
words one at a time and use this idiom,
counts sub word equals counts.get(word, 0) + 1,
which is going to create and/or update.
And then we print the counts out and that
comes out there.
Okay?
So, again, this is the new thing that
we've done.
Everything else we've kind of seen before.
Now we can also loop through dictionaries
with for loops.
With the for loop, we've put all kinds of
things over here.
We've put strings over here, we've put
lists of numbers over here.
We've put files over here.
And basically what it really says is you
know, if this is a collection of things,
run this little indented code once for
each item in
the collection, and key then becomes our
iteration variable.
And key is very mnemonic here.
Python doesn't know that they are keys.
The important
concept here is that dictionaries are
key/value pairs, and this is
only one variable, so they've decided that
it goes through the keys, which is
actually quite useful.
So key is going to take on the successive
values of the labels.
Not the successive values of
the values stored at the labels.
But it's really easy for us to retrieve
the contents at that label counts sub key.
So we're going to use the key 'chuck',
'fred', 'jan', to look up the 1, 42, 100.
And so it prints out the key,
and then the value at it, the key, and the
value at it, and the key, and the value.
And so we're able to sort of go through
the dictionary and look at all the
key/value pairs,
which is the common thing that you really
want to do.
Okay?
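As a sketch, with the chuck, fred, and jan data from the earlier literal example:

```python
counts = {'chuck': 1, 'fred': 42, 'jan': 100}

# A for loop over a dictionary hands us the keys, one at a time.
for key in counts:
    # Use each key to look up the value stored under it.
    print(key, counts[key])
```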
Now there's some methods inside of
dictionaries that allow
us to convert dictionaries into lists
of things.
And so if you simply take a dictionary, so
here's a little dictionary with
three items in it, and we can say list
and then give a dictionary name in parentheses
right there, and then that converts it
into a
list. But it's just a list of the keys.
We can also say jjj dot keys, kind of do
the same thing.
Say give me a list consisting of the keys.
And then jjj dot values gives you a list
of the values, 1, 42, and 100.
Of course they're not in the same order.
Now interestingly, as long as you don't
modify the dictionary
in between,
the order of these two things corresponds.
So the first jan maps to 100,
chuck maps to 1,
and fred maps to 42.
So the order, you can't predict the order
they're
going to come out but these two things
will
come out in the same order, whatever that
order
happens to be. Okay, and so there's one
more thing.
So we've got the keys, we've got the
values, and we've got a thing called items.
items also returns a list.
But it's a list of
what Python calls tuples.
That's what the next chapter is about.
We'll talk more about tuples in the next
chapter.
Here each tuple is a key/value pair.
So this list has three things in it.
One, two, three.
The first one jan maps to 100, the
second is chuck maps to 1, the
third one is fred maps to 42. So,
just kind of bear with me for a second.
We'll hit this a little harder in the next
chapter.
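Here is a sketch of those three methods. One hedge: in Python 3, keys, values, and items give back view objects rather than plain lists, so the sketch wraps each one in list() to get the lists the lecture describes.

```python
jjj = {'chuck': 1, 'fred': 42, 'jan': 100}

# Converting a dictionary to a list gives just the keys.
print(list(jjj))

# keys and values come out in matching order.
print(list(jjj.keys()))
print(list(jjj.values()))

# items gives (key, value) tuples, one per entry.
print(list(jjj.items()))
```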
But the idiom where
this works very beautifully is in a for
loop.
Now, for those of you who have programmed
in other languages, this will be
kind of weird because other languages have
iterations but they don't have two
iteration variables.
Python can have two iteration variables.
It can be used for many things, but one of the
things that it's used for that's really
quite nice is going through a dictionary.
jjj.items returns pairs of
things, and then aaa and bbb are iteration
variables that
are synchronized as they move through.
So aaa takes on the value of the key.
bbb takes on the value of the, the
value.
And then the loop runs once.
Then aaa is advanced to the next key.
And bbb is advanced to the next value
simultaneously, synchronized.
Then they print that out, then it advances
to the
next one, and the next one, and they print
that out.
So they are moving in a synchronized way.
Now again, the order jan, chuck, fred is not
the same as the order we wrote them in.
But the correspondence between jan and 100,
chuck and 1, and fred and 42,
that's going to work.
And so basically, as these things go, they
work
their way through whatever order they're
stored in the dictionary.
So this is quite nice.
Two iteration variables going through
key/value.
Now if I was making these names mnemonic,
and they made more sense,
I would call this the key variable and
that would be the value variable.
But for now I'm just using kind of silly
names
to show you that key and value are not
special.
They're not Python reserved words in any
way.
They're just a good way to name these
things, key/value pairs.
Okay?
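A sketch of the two-variable loop, keeping the deliberately silly names aaa and bbb. The pairs list is my addition, just to record what the loop saw.

```python
jjj = {'chuck': 1, 'fred': 42, 'jan': 100}

pairs = []
# Each trip through the loop, aaa gets a key and bbb gets its value.
for aaa, bbb in jjj.items():
    print(aaa, bbb)
    pairs.append((aaa, bbb))
```

With mnemonic names the same loop would read for key, value in jjj.items().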
Okay.
Now we're going to circle all the way
back to the beginning.
All the way back to the first lecture.
And I gave you this program, and I said
don't worry about it.
We'll learn about it later.
Well, now is later.
At this point you should be able to
understand every line of this program.
This is the program that's going to count
the most common word in a file.
Okay?
So let's walk through what it does and
hopefully
by now this will make a lot of sense.
Okay? So we're going to start out, we're
going to ask
for a file name, we're going to open that
file for read.
Then, because we know it's not a very large
file, we're going to read it all in one go.
So handle dot read says read the whole
file, newlines and all,
blanks, newlines, whatever,
and put it in
the variable called text, it's just
mnemonic. Remember I'm, in this one
I'm using the mnemonic variable names.
Then go through that whole
string, which was the whole file, go
through and split it all.
Newlines don't hurt it.
Newlines are treated like blanks.
And it understands all that.
It throws the newlines away and the
blanks away
and splits it into a beautiful list of
just words with no blanks.
And the list of the words in that file
ends up in the variable words.
words is a list, text is a string, words
is a list.
Then what I do is the pattern of
accumulating counters in a dictionary.
I make an empty dictionary.
I have the word variable that goes through
all the words
and then I just say, counts sub word equals
counts dot get(word,0) + 1,
and that, like we just got done saying,
it both creates
and/or increments the value in the
dictionary as needed.
So now at this
point in the program, we have a full
dictionary of word/count pairs.
Okay?
And there's many of them.
You know, all the words, all the counts.
They're not in any particular order. So now what
we're going to do is we're going to write
a largest loop, find the largest.
Which is another thing that we've done.
So not only do I need to now know what
largest count I've seen so far,
I need to know what word that is.
So I'm going to set the largest count
we've seen so far to None, set
the largest word we've seen so far
to None, and then I'm going to use this
two-iteration variable pattern to say
go through the key/value pairs word and
count in counts.items.
So it's just going to
go through all of them.
And then I'm going to ask if the largest
number I've seen so far is None or
the current count that I'm looking at is
greater than the largest I've seen so far,
keep them. Take the current word, stick it
in biggest word so far,
take the current count, stick it in
the biggest count so far.
So this is going to run through all of the
word/count key/value pairs.
And then when it comes out, it's going to
print out
the word that's the most common and how
many times.
So if we feed in that clown text, it will
run all this stuff, and print out
oh, the is the most common word, and it
appeared seven times.
Or if I feed in the stuff that was two
slides back, words.txt, from the actual
textbook, then it says the word to is the
most common and it happened 16 times.
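Putting the whole walkthrough together, here is a sketch of the most-common-word program. Two things are assumptions made so it runs on its own: the logic is wrapped in a function called most_common_word, and it is fed a sample string instead of a file the user names. The variable names bigword and bigcount are also assumed.

```python
def most_common_word(text):
    """Return (word, count) for the most frequent word in text."""
    # split treats newlines like blanks, so the whole file can be one string.
    words = text.split()

    # Accumulate one counter per word.
    counts = dict()
    for word in words:
        counts[word] = counts.get(word, 0) + 1

    # The "largest so far" loop, tracking both the count and its word.
    bigcount = None
    bigword = None
    for word, count in counts.items():
        if bigcount is None or count > bigcount:
            bigword = word
            bigcount = count
    return bigword, bigcount


# In the lecture the text comes from a file, roughly:
#     handle = open(name)      # name is typed in by the user
#     text = handle.read()
sample = ('the clown ran after the car and the car ran into the tent\n'
          'and the tent fell down on the clown and the car\n')
bigword, bigcount = most_common_word(sample)
print(bigword, bigcount)
```

If two words tie for the highest count, this keeps whichever one the loop reaches first.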
So I could easily, you know, throw 10
million, 10 million
words through this thing, and it would
just be totally happy.
Right? And so, this is not that complex
of a problem, but it's using a whole bunch
of idioms that we've been using.
The splitting of words, the accumulation
of multiple counters in a dictionary.
And so, it sort of is the beginning of
doing some kind of data
analysis that's hard for humans to do, and
error-prone for humans to do.
And so, to review, we've talked about collections.
We've introduced dictionaries.
We've done the most common word pattern,
talked about that.
The lack of order, and
I did that a bunch of times.
And we've looked ahead at tuples,
which is the next,
the third kind of collection that we're
going to talk about.
And they're actually in some ways a little
simpler than dictionaries.
And simpler than lists.
So, see you in the next lecture, Chapter
10, tuples.