Hello, and welcome to Chapter Eight:
Python Lists.
So now we're sort of going to start taking
care of business.
We're going to make lists and
dictionaries and tuples and really start
manipulating this data,
and doing real data analysis,
or at least
laying the groundwork for real data
analysis.
As always, these lectures, audio, video,
slides,
and even book are copyright Creative Commons
Attribution.
So, lists, dictionaries, and tuples, the
next real three big topics we're going to
talk about, are collections.
And we've been doing lists already, right?
We've been doing lists when we were doing
for loops.
A list in Python is something that is written with
square brackets.
This is a constant list.
Now, when I first talked to you
about variables, I sort of oversimplified
things.
I said
if you put like x equals two, and then put
x equals four, the four
overwrites the two.
A collection is where you can put a bunch
of things in the same variable.
Now, I have to have a way to find those
things.
But it allows us to put
more than one thing in
the variable.
So, here we have friends, that has three
strings, Joseph, Glenn, and Sally.
And we have carryon
that has socks, shirt, and perfume.
So that's the basic idea.
So what's not a collection?
Well, simple variables.
Simple variables are not collections, just
like this example.
I say x equals 2, x equals 4, and print x,
and the 4's in there and the 2 is somehow
gone.
It was there for a moment, and then it's
gone.
And so that's a normal variable.
They're not collections.
You can't put more than one thing in it.
But when you put more than one thing in
it, then you
have to have a way to find the things that
are in there.
We'll, we'll get to that.
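To make the contrast concrete, here is a small sketch of the two cases just described. The variable names and values come from the slides as read aloud; nothing else is assumed.

```python
# A simple variable holds one value: the second assignment replaces the first.
x = 2
x = 4
print(x)  # the 2 is gone, only the 4 remains

# A collection holds several values in one variable.
friends = ['Joseph', 'Glenn', 'Sally']
carryon = ['socks', 'shirt', 'perfume']
print(friends)
print(carryon)
```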
So, we've been using list constants for
the last couple
of chapters just because we have to use
list constants.
You know, so we used, in the for loop
chapter, we did lists of numbers.
We have done lists of strings, that's
strings, red, yellow, and blue.
And you don't have to necessarily, you
don't necessarily
have to have things all of the same type.
This is a three-item list that has
a string, red,
the integer 24, and 98.6, which is
a floating point number.
And here's an interesting thing, just as a
side note.
This shows that floating point numbers are
not always perfectly represented inside of
the computer.
It's sort of an artifact of how they work.
And this is an example: 98.6 is really stored as
98 point
na, na, na, na, na.
So when you see something
like that, don't freak out.
Floating point numbers are the ones that
show this behavior.
So, interestingly,
although we won't put a lot of energy into
this, you can also have an element of a
list be a list itself.
So this is an outer list that's got three
elements.
1, 7, and then
a list that's 5 and 6.
So, if you look at the length of this,
there are three things in it.
Not four, three.
Because the outer list has 1, 2, 3 things
in it.
And an empty list is bracket, bracket.
Okay?
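Here is a rough sketch of the list constants just described. The position of the inner list inside the outer one is an assumption, and the format call is only one way to peek at how 98.6 is stored; it is not from the slides.

```python
colors = ['red', 'yellow', 'blue']   # a list of strings
mixed = ['red', 24, 98.6]            # elements do not have to share a type
nested = [1, [5, 6], 7]              # assumed layout: the inner list counts as one element
empty = []                           # an empty list

print(len(nested))  # 3

# Asking for 17 significant digits shows that 98.6 is stored as an approximation.
print(format(98.6, '.17g'))
```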
Like I said, we have been going through
lists all along.
We have an iteration variable, for i in,
and this is a list.
We've been using it all along.
Similarly, definite loops are a
great way to go through lists. For friend
in friends: the loop
goes through three times, and out come
three lines, with the
variable friend advancing through the
three successive items in the list.
And away we go.
So, again, lists are not completely
foreign to us.
Now,
just like in a string, we can use the
index operator,
the square bracket operator, and
we can look up items in the list.
Sub one, friends, sub one.
Not surprisingly, using the European
elevator rule,
the first item in a list is sub zero,
the second
item is sub one and the third one is sub
two.
So here when I print friends sub one I
get Glenn.
Which is the second element.
Just like strings.
So once you kind of know it for strings,
lists
and the rest of these things make a lot
more sense.
Just, remember that we're in Europe, and
things start with zero.
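A minimal sketch of the index operator on the friends list from earlier:

```python
friends = ['Joseph', 'Glenn', 'Sally']

first = friends[0]    # positions start at zero
second = friends[1]
third = friends[2]

print(second)  # Glenn
```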
Some of the data items that we
work with are not mutable.
So for example, strings, when we ask for a
lower case
version of a string, we're given a copy of
that string.
And that's because strings are not
mutable, and we can see this
by doing something like saying fruit
sub 0 equals lowercase b.
Now you'd think that that would just
change this
to be a lower case b, but it doesn't,
okay?
It says string object does not support
item assignment
which means that you're not allowed to
assign to a position inside a string.
You can make a new string and put
different things in
that new string, but once the strings are
made, they're not changeable.
And that's why when we call fruit.lower, we
get a copy of it in lower case.
And so x is a copy of the original
string, but
the original string, once we assign it
into fruit, is unchanged.
It can't be changed.
Lists, on the other hand, can be changed,
and we
can change them in the middle.
This is one of the things we like about
them.
So here we have a list: 2, 14, 26, 41, and
63.
Then we say lotto sub two.
Of course, that's going to be the third
item.
Lotto sub two is equal to 28.
Then we print it and we see the new number
there.
So all this is saying is that we can
change them, right?
Strings no, and lists yes.
You can change lists, but you can't change
strings.
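The difference can be sketched like this. The starting value 'Banana' is an assumption (the lecture only says fruit and lowercase b); the lotto numbers are the ones read out.

```python
fruit = 'Banana'  # assumed starting value
try:
    fruit[0] = 'b'            # strings do not allow item assignment
except TypeError as err:
    print('Not allowed:', err)

x = fruit.lower()             # lower() hands back a new string
print(fruit, x)               # the original is unchanged

lotto = [2, 14, 26, 41, 63]
lotto[2] = 28                 # lists can be changed in the middle
print(lotto)
```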
So the len function, we've used it for
several
things. It's
used for strings and it's used for
lists as well.
So the same function knows
what kind of thing its
parameter is. When its
parameter is a string,
it gives us the number of characters
in the string.
And when it is a list, it gives us
the number of elements in the list.
And just because one of them is a string,
it's still one element from the point
of view of this list.
So it has one, two, three, four - four
items in the list, okay?
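A small sketch of len on both kinds of value. The specific string and list contents are assumptions chosen to match the description: a four-item list where one item is a string.

```python
greet = 'Hello Bob'          # assumed example string
x = [1, 2, 'joe', 99]        # assumed example list; 'joe' is still one element

chars = len(greet)           # number of characters in the string
items = len(x)               # number of elements in the list
print(chars, items)
```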
So, the range function is a special
function.
It's probably about time to talk about the
range function.
The range function is a function that
generates a list of numbers
and gives it back to us.
And so you give the range function a
parameter, how many items you want, and
the range
function creates and gives us back a list.
So range of four
is four numbers starting at zero, which is
zero
up to, but not including four.
Sound familiar?
Yeah.
Zero up to, but
not including four.
And so the same thing is true here.
So, we can combine the len and the range.
len of
friends, that's three
items, and range of len of friends is 0, 1, 2.
And those numbers
correspond exactly to the positions of these items.
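A sketch of range and len together. One caveat that is not in the lecture: in Python 3, range gives back a range object rather than a list, so the sketch wraps it in list() to show the numbers.

```python
numbers = list(range(4))
print(numbers)  # [0, 1, 2, 3]

friends = ['Joseph', 'Glenn', 'Sally']
positions = list(range(len(friends)))
print(positions)  # [0, 1, 2]
```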
So we can actually use this
to construct loops to go through a list.
We already have a basic for loop, right?
We have a for loop
that says for each friend in
friends, print Happy New Year.
And out comes Happy New Year for Joseph, Glenn, and
Sally.
If we also want to know where, what
position we're at as
the loop progresses, we can rewrite the
exact same loop a different way.
And make i be our iteration variable.
And say i in range(len(friends)), that
turns this into zero, one, two.
And then i goes zero, one, two.
And then, in the loop, we can look up the
particular friend that
we are interested in,
using the index operator, friends sub i.
And then print Happy New Year.
So these two loops
are equivalent.
The first loop, for friend in friends, is
preferred, unless you happen to need the
value i, which tells you where you're at.
In case maybe you're going to change
something, you're
going to look through something and then
change it.
But for what I've written here,
they're exactly equivalent.
Prefer the simpler one, unless you need
the more complex one.
They both produce the same kind of output.
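The two loops can be sketched side by side. The exact greeting text is an assumption, and the two result lists exist only so the outputs can be compared; the lecture version just prints.

```python
friends = ['Joseph', 'Glenn', 'Sally']

# Simpler form: the iteration variable is the item itself.
by_item = []
for friend in friends:
    line = 'Happy New Year: ' + friend
    print(line)
    by_item.append(line)

# Index form: i runs 0, 1, 2 and we look each friend up with friends[i].
by_index = []
for i in range(len(friends)):
    friend = friends[i]
    line = 'Happy New Year: ' + friend
    print(line)
    by_index.append(line)
```

Both loops print the same three lines; the second one is only worth the extra code when the position i is actually needed.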
We can concatenate lists, much like we
concatenate strings, with plus.
And you can think of the plus operator as
looking to its right and to its left and
saying oh, those are both lists, I know
what
to do with lists, I'm going to put those
together.
And so two three-long
lists become a six-long
list, with the first one followed by
the second one.
It didn't hurt the original, a. c is a new
list, basically.
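A quick sketch of list concatenation. The names a and c come from the lecture; b and the numbers themselves are assumptions.

```python
a = [1, 2, 3]
b = [4, 5, 6]

c = a + b        # a new six-item list: a's items followed by b's
print(c)         # [1, 2, 3, 4, 5, 6]
print(a)         # [1, 2, 3], the original is untouched
```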
We can also slice lists.
Feels a lot like strings, right?
Everything's kind of like strings.
For loops like strings, concatenation like
strings, and now slicing like strings.
And it is exactly the same.
So one up to, but not including three.
Just remember, up to, but not including.
The second parameter is up to but not
including, so that starts at sub one,
which is the second item, and goes up to but not
including sub three.
That's positions one and two, so that's 41 comma 12.
Starting at sub one, up to but not
including sub three.
We can similarly leave off the first number,
so that's from the start up to but not including sub
four.
Starting at zero, one, two, three, but not
including four.
So that's this one.
If we go three to the end, and again,
remember that we're starting at 0, so 3
to the end is 0, 1, 2, 3 to the end.
The fact that the item at sub three is the number 3 is just a coincidence.
So that's 3, 74, 15.
And the
whole thing, that's the whole thing, so
these two things are the same.
So slicing works like strings, starting
and up
to but not including is the second
parameter.
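The slices walked through above can be sketched as follows. The list contents are an assumption, chosen to be consistent with the values read out (41, 3, 74, 15, and a total of 154 later on).

```python
t = [9, 41, 12, 3, 74, 15]   # assumed list

middle = t[1:3]     # positions 1 and 2
start = t[:4]       # positions 0 through 3
tail = t[3:]        # position 3 to the end
everything = t[:]   # a copy of the whole list

print(middle)       # [41, 12]
print(start)        # [9, 41, 12, 3]
print(tail)         # [3, 74, 15]
print(everything == t)  # True
```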
There are some methods, and you can
read about these online in the Python
documentation.
We can use the built-in dir function.
It doesn't have a lot of use
when we're running programs, but
it's kind of useful.
I like it when I'm typing
interactively. Like, what can this thing do?
So I make a list, and a list is its own type, and
with dir I say, what can we do with it?
Well, we can append, we can count, extend,
index, insert, pop, remove, reverse
and sort. And then you can sort of read up
on all these things.
I'll show you just a couple.
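A sketch of asking a list what it can do. Filtering out the names that start with an underscore is my addition to keep the output short; the exact set of methods printed depends on the Python version.

```python
x = list()
print(type(x))

# dir() lists everything the object supports; keep only the ordinary method names.
methods = [name for name in dir(x) if not name.startswith('_')]
print(methods)
```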
We can build a list with the append method.
So this syntax here,
stuff equals list, that's called a
constructor
which says give me an empty list.
You could also say bracket, bracket for an
empty list.
Whatever, you gotta make an empty list and
then you call the append.
Remember that lists are mutable, so it's
okay to change it.
So we're saying, okay, we started with an
empty list.
Now append to the end of that, the word
book.
And then append to that, 99.
Wait a sec.
That's a mistake.
That's a mistake.
So I have to fix this mistake.
So watch me fix the mistake.
Poof.
Now my thing is magically fixed.
Isn't that amazing.
I have magic powers when it comes to slide
fixing.
I just snap my fingers and the slides are
fixed.
So here we go.
We append the 99, and we print it out.
And it's got book and 99, emphasizing the
fact that they don't
have to be the exact same kind of thing in
a list.
Then later we append cookie and then it's
book, 99, cookie.
Okay? So this append, we won't do it in line
like this so often. We'll tend to do it in
a loop as we're building up a
list, but that's the way you start with
an empty list and then
programmatically grow it.
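The append steps described above, as a short sketch:

```python
stuff = list()          # an empty list; stuff = [] would do the same
stuff.append('book')
stuff.append(99)
print(stuff)            # ['book', 99]

stuff.append('cookie')
print(stuff)            # ['book', 99, 'cookie']
```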
We can ask, much like we do in a string,
we can ask if an item is in a list.
So here is a list called some, with these
numbers in it.
It's got five numbers in it.
Is nine in some? True, yes it is.
Is 15 in some? False.
Is 20 not in some? That is legal
syntax.
Is 20 not in some? True, it's not there,
okay?
These operators don't modify
the list, they're just asking questions.
These are logical operations often used in
if statements or
while, some kind of a logic that you might
be building.
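A sketch of the in and not in questions. Apart from 9, the five numbers in the list are assumptions; they just need to leave out 15 and 20.

```python
some = [1, 9, 21, 10, 16]   # assumed values

has_nine = 9 in some
has_fifteen = 15 in some
lacks_twenty = 20 not in some

print(has_nine, has_fifteen, lacks_twenty)  # True False True
```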
Okay, so lists have order.
So when we were appending them, the first
thing went
in first, the second thing went in second,
et cetera, et cetera.
And we can also tell the list to sort
itself.
So one of the things that we can do with a
list,
now we're starting to see some power here,
is say, sort yourself.
This is a list of strings.
It can sort numbers, it can sort lots of
things.
friends.sort, that says hey there, dear
friends, sort yourself.
This makes a change.
It alters the list, and puts it, in
this case, in alphabetical order, Glenn,
Joseph, and Sally.
It is mutated, it's been
modified, and so
friends sub one is now Joseph because
that's the second one.
Okay?
So the sort method says sort yourself now,
sort yourself, and it sorts and then
it stays sorted.
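A sketch of sorting the friends list in place:

```python
friends = ['Joseph', 'Glenn', 'Sally']

friends.sort()          # changes the list itself; it does not build a new one
print(friends)          # ['Glenn', 'Joseph', 'Sally']
print(friends[1])       # Joseph
```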
So
you're going to be kind of ticked about
this particular slide.
Because there's a whole bunch of built-in
functions that help with lists.
And, there's max, there's min, there's
len, various things.
And so we could, all those loops that I
told you how to
do, I was just showing you that stuff
because I thought it was important.
This is the simplest way to
find the largest, smallest, and sum,
et cetera.
So here's a list of numbers.
We can say how many are there.
That's the count.
We can say what's the largest, it's 74.
What's the smallest, that'd be 3.
What is the sum, the running total of
them all? 154.
If you remember from a few lectures
ago, these are the same numbers.
And what is the average, which is, sum of
them over the length of them,
Okay?
So this makes a lot more sense and if you
had a list of numbers
like this, you would simply say what's the
max, you wouldn't write a max loop.
I just did that to kind of demonstrate how
loops work.
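A sketch of the built-in functions doing the work of those earlier loops. The order of the numbers is an assumption; the largest, smallest, and total match what was read out.

```python
nums = [3, 41, 12, 9, 74, 15]   # assumed order

count = len(nums)
largest = max(nums)
smallest = min(nums)
total = sum(nums)
average = total / count

print(count, largest, smallest, total)  # 6 74 3 154
print(average)
```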
So here is a way that you can sort
of change those kind of programs that we
wrote.
So there's two ways to write a summing
program.
Let's just say instead of the data being
in a list, we're going to write a while
loop that's going to read a
set of numbers until we say done, and then
compute the average of those numbers.
Okay, so let's say this is our problem.
Read a list of numbers, wait till the word
done comes in, and then average them.
So here's a little program that does that.
We create total equals zero, count equals
zero.
Make an infinite loop with while True.
And then we ask
to enter a number.
We get a string back from this, remember
raw_input always
gives us strings back, and then if it's
done, we're going to break.
This is the version of the if that does
not require an indent.
We just put the break up there.
And so that gets us out of the loop when
the time is right.
So when the time is right over here.
And then, we convert the value to float.
We use a float to convert the input to a
floating point number.
And then we do our accumulation pattern,
total equals total plus value, count equals
count plus one.
So this is going to run.
These numbers are going to go up and up
and up and up.
And then we're going to break out of it,
calculate the average, and then print the
average.
Because that's a floating point number, so now
the average is a floating point number.
So that's one way to do it.
Right?
That would be one way to write a program
that does an average: keep a running
total and count
as you're reading the numbers.
But there's another way to do it, that
would work exactly
the same way, and this is when you can
start using lists.
So you come in, you say I'm going to
make a list
of numbers, just a mnemonic name, numlist,
is an empty list.
Then I create another infinite loop
that's going to prompt, enter a number.
And if it's done, break.
That gets us out of it.
Convert
the input value to a float.
And then append it to the list.
So now the list is going to grow, each
time
we read a number the list is going to
grow.
However many times we append a number is
how many things are going to be in the
list.
So in this case, when we're at this point
and we
type done, there will be three numbers in
the list, because we
will have run append three times.
We'll have appended 3, 9, and 5.
We'll have them sitting in a list.
And we will have exited the loop.
So now you say, oh add up all the numbers
in
that list, and then divide it by the
length of the list.
And print the average.
So these two programs are basically
equivalent.
The only time that they might not be
equivalent is if there were ten million
numbers.
The list version would use up 40 megabytes or more of your
memory, which
is actually not a lot of memory on some
computers.
But if memory mattered, the list version does store
all those numbers.
The first version actually just runs the
calculation.
So if there's a really large number of
numbers, this would make a difference,
because the list is growing and keeping
them all, and summing them all at the end.
The first version is actually storing very little data.
But for reasonably sized sets of numbers,
like thousands or even hundreds of thousands
of numbers, these
two approaches are kind of equivalent.
And then sometimes you actually
want to accumulate something a little more
complex than this, you want to
sort them or look for the maximum and look
for something else.
Who knows what, but the notion of make a
list and then append something to the list
each time through the iteration, and then do
something with
the list at the end is a rather powerful
pattern.
So the first one is also a powerful pattern,
the accumulator
pattern, where we just have the variables
accumulating in the loop.
The list version is one where we accumulate the
data in
the loop and then do the computations all
at the end.
Certain situations will make use of
these different techniques.
Okay.
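Here is a sketch of the two averaging approaches side by side. To keep it runnable without typing, both functions read from a list of strings standing in for what the user would enter at the prompt; that stand-in, and the function names, are my assumptions. The lecture version uses while True with a prompt instead.

```python
def average_running(entries):
    """Accumulator pattern: keep only a running total and count."""
    total = 0
    count = 0
    for inp in entries:
        if inp == 'done':
            break
        value = float(inp)
        total = total + value
        count = count + 1
    return total / count


def average_with_list(entries):
    """List pattern: store every number, then compute at the end."""
    numlist = list()
    for inp in entries:
        if inp == 'done':
            break
        value = float(inp)
        numlist.append(value)
    return sum(numlist) / len(numlist)


typed = ['3', '9', '5', 'done']   # stand-in for the user's input
print(average_running(typed))
print(average_with_list(typed))
```

Neither sketch guards against 'done' being the very first entry, which would divide by zero.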
So, connecting strings and lists.
So there's a method, a capability
of strings that is really powerful when it
comes to tearing data apart.
It's called split.
So here is a string
with three words, and it has blanks in between.
And abc.split says parse this string,
look for the blanks, break the string into
pieces, and give me back a
list with one item for each of the words
in the string, as
defined by the spaces. Okay?
So, it takes, breaks it into three pieces
and gives us that back in a list.
This is very powerful. Okay?
So we're going to split it and we get back
a list.
There are three words, and the first word,
stuff sub zero, is With.
So there's a lot of parsing going on here.
We could do this with for loops and a lot
of other things.
There would be a lot of work to do this
split by hand.
Given that this is a really common task,
it's really
great that this has been put into Python
for us.
Okay?
So split breaks a string into parts and
produces a list of strings.
We think of these as words, we can access a
particular word or we can loop through all
the words.
So here we have stuff again, and now we
have a for loop
that's going to go
through each of the three words.
So it's going to run three times.
Now chances are good we're going to do
something different other than just print
them out.
But you see how you can quickly take
a split followed by a for, and then write
a loop that's going to go through each of
the
words, without working too hard to find
the spaces.
You let Python do all the hard work of
finding the spaces.
Okay?
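A sketch of split followed by a loop over the words. The string is assumed to be 'With three words', which fits the description (three words, the first being With).

```python
abc = 'With three words'   # assumed string

stuff = abc.split()        # break on the blanks, get back a list
print(stuff)               # ['With', 'three', 'words']
print(len(stuff))          # 3
print(stuff[0])            # With

for w in stuff:            # runs once per word
    print(w)
```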
So let's take a look at a couple of
samples.
Just a couple of things to teach you a
little more about split.
Split treats many spaces as equal to one
space.
So, if you split a line with a lot of spaces, blank, blank, blank,
it
still just throws away all the spaces and
gives us four words.
One, two, three, four,
because it assumes that's what we
want done.
So that's nice.
You can also have
split
split on some other character. Sometimes
you'll be getting data
and they'll have used a semicolon, or a
comma, or
a colon, or a tab character, who knows
what they've
used, and your job is to dig that data
out.
So you can split based on a different
character.
Here, this is a normal split.
It's not going to see the semicolons, it's
looking for a space.
And so all we get back is one
item in the list, with the semicolons still in it.
But, if we switch, and we pass semicolon
as a parameter to
split,
then it will know to split it based on
semicolons, and gives us first, second, and
third back,
three words.
Okay?
So you can split either on spaces, or you
can split on a character other than a
space.
Okay?
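Both behaviors can be sketched with two short strings. The exact strings are assumptions built from the words mentioned (a lot of spaces; first, second, third).

```python
line = 'A lot               of spaces'
etc = line.split()             # runs of spaces count as one separator
print(etc)                     # ['A', 'lot', 'of', 'spaces']

line = 'first;second;third'
thing = line.split()           # no spaces, so the whole string is one item
print(thing)                   # ['first;second;third']

parts = line.split(';')        # split on semicolons instead
print(parts)                   # ['first', 'second', 'third']
```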
So, let's take a look at how we might turn
this into some of our common assignments
that we have in this chapter, where we're
going to read some of the mailbox data. Okay?
So, here we go with a little program.
First three lines, we write these a lot.
Open the file.
Write a for loop to loop through each
line in the file.
Then we're going to strip off the white
space at the end of the line.
One, two, three.
Do those all the time.
And we're looking for lines, if you look
at the whole file,
we're looking for lines that start with
from, followed by a space.
So if the line does not start with from
followed by a space, that's a space right
there, continue.
So that's a way to skip all the lines that
don't look like this.
There're thousands of lines in this file
and just a few that look like this. Okay?
So we're going to look and we're
going to try
to find what day of the week this thing
happened on.
So, so we're throwing away all the lines
with this little bit of code.
Then what we do is we take the line, which
is all of this text, and then we split it.
And we know that the day of the week is
words sub two.
So this is words sub zero, this is words sub
one, and this is words sub two.
And so, all we have to do is print out
sub two.
We throw away all the lines
except the from lines.
We split them and take
the third word, words sub two, and we
can quickly create something
that's extracting
the day of the week out of these.
Okay?
So it's, it's, I mean, it's quick, because
split does the tricky work.
If you go back to the strings chapter, you
saw that
we did a lot of work to get this to
happen.
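A sketch of the day-of-the-week program. Instead of opening the mailbox file, it loops over a few made-up lines in the same shape; the addresses and dates are hypothetical, and collecting the days in a list is my addition so the result can be inspected. The lecture version prints words[2] directly.

```python
# Hypothetical sample lines standing in for the mailbox file.
sample_lines = [
    'From alice@example.com Sat Jan  5 09:14:16 2008\n',
    'Return-Path: <postmaster@example.org>\n',
    'From: alice@example.com\n',
    'From bob@example.net Fri Jan  4 18:10:48 2008\n',
]

days = []
for line in sample_lines:
    line = line.rstrip()                 # strip whitespace at the end of the line
    if not line.startswith('From '):     # note the space after From
        continue                         # skip everything else
    words = line.split()
    print(words[2])
    days.append(words[2])
```

The line that starts with From followed by a colon is skipped, because the test looks for From followed by a space.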
So here's even another tricky pattern.
So let's say we want to do what we did at
the end of Chapter Six,
the string chapter.
Let's say we wanted to get back this little
bit of data.
Okay?
So, we can look at this and say, okay, let's
split this.
And this will be zero, one, and two, and
three, and four, and five, and six.
We're splitting it based on spaces.
Then the email address is words sub one,
right?
So that email address is this little bit
of stuff
because it's in between spaces, right?
So that's what we pull out.
The email address is words sub one.
We've got that.
So that's sitting in this email address
variable.
Then, we don't
really want the whole thing,
we just want the part after the
at sign. We could do a lookup of the at sign.
But you can also do a second split.
Okay?
So we're taking this variable here, email,
which is merely this little part right
here.
And we are splitting it again, except this
time we're splitting it based on a at
sign.
Which means it's going to bust it right
here, and find
us two pieces.
So pieces now is a list where the sub zero
item is the
person's name and the sub one item is the host
that their mail address comes from.
Okay?
And so then all we need is pieces
sub one, and pieces sub one is this
guy right here.
So that's pieces sub one, and so we
pulled it out.
So if you go back to how we did it before,
we were
doing searching, we were searching some
more, and then we were taking slices.
This is a little more elegant, okay?
Because really, we split it and then we
split it,
and we knew what piece we were looking at.
So this is what I call the Double Split
Pattern, where you split a string
into a list, then you take a thing out,
and then you split it again.
Depending on what data you're looking for.
This is just a technique, it's not the
only technique.
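The double split pattern, sketched on one hypothetical From line (the address and date are made up; the variable names words, email, and pieces follow the lecture, and host is my addition):

```python
line = 'From alice@example.com Sat Jan  5 09:14:16 2008'  # hypothetical line

words = line.split()          # first split: on spaces, positions 0 through 6
email = words[1]              # the address sits between the first two spaces

pieces = email.split('@')     # second split: on the at sign
host = pieces[1]              # the part after the at sign

print(email)                  # alice@example.com
print(host)                   # example.com
```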
Okay, so that's lists.
We talked about the concept of a
collection, where a list has multiple
things in it.
Definite loops, again, we've seen these
things.
Lists look a lot like strings,
except the elements can be any type and
lists are mutable.
We still use the bracket operator and we
redid the max, min, and sum.
Except we did it in, like, one line rather
than a whole loop.
And something we're going to play with a
lot is using split to parse strings,
the single split, and then the double
split
is the natural extension of the single
split.
So, see you in the next lecture, looking
forward to talking about dictionaries.