Hello, and welcome to Chapter Six.
This chapter we're going to
talk about strings, and
stuff is going to start to get real now.
So, as always, this material, this video,
these
slides and book are copyright Creative
Commons Attribution.
I want you to use these materials.
I want somebody else to use them, I want to
make more teachers, so everyone can teach
this stuff.
Use it however you like.
Okay, so we've been playing with
strings from the beginning.
I mean, literally, if we didn't work
with strings, we could've never printed
Hello World.
And, and lord knows, we need to print
Hello World in a programming language.
And so, we've been using them, especially
constants.
Now, in this chapter, we're going to dig in.
So, oops, so a string is a sequence of
characters.
You can use either single quotes or
double quotes in Python
to delimit a string.
And so here are two string constants, Hello
and there,
stuck into the variables str1
and str2.
We can concatenate them together
with a plus sign.
Python is smart enough to look and say,
oh, those are strings, I know what to
do with those.
And you'll notice that the plus doesn't
add any space here, because when
we print bob out, Hello and there are right
next to one another.
If, for example, we've done some
conversions,
so when we were, like, reading pay,
and rate, and hours, and stuff,
we've done some conversions.
So this is an example of
a string 1 2 3.
Not the number 123, but the string, quote 1 2 3
quote.
And we can't add 1 to this, we get
a traceback, kind of, at this point, as we
expected.
And we would convert that to an integer
using the int function that's built in.
See how much Python you already know?
I mean, this is awesome, right?
I can just say,
oh, you call the int function,
and you know what that is.
That's, sorry, sorry, I'm just
awesomed out.
So you convert this to an integer, and
then you add 1 to it, and then we get 124.
So, there you go.
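Here is a minimal sketch of that conversion,
written in Python 3 syntax. The variable names
text and number are just for illustration, and
the try/except is only there so the example
keeps running after the error.

```python
text = '123'            # a string, not the number 123

try:
    text + 1            # adding an integer to a string is not allowed
except TypeError as err:
    print('Python complains:', err)

number = int(text)      # convert the string to an integer
print(number + 1)       # 124
```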
We've been doing strings all along, had to.
I mean, literally, strings and numeric data
are the two things that programs deal with.
So, we've been reading and converting.
Again, this is sort of a pattern from some
of the earlier programs
where we call raw_input, you know?
And raw_input just takes a string and
puts it in a variable.
So if I take Chuck, then the
variable contains the string C-h-u-c-k.
Even if we type numbers, that is a string.
We can't, just because I put 1 0 0 in,
I still can't subtract 10.
We get a happy little traceback, oh, happy
little, sad-faced traceback.
But of course, if we convert it
with int or float,
then we have a number, and we can
subtract 10 from it.
So, so we've been doing this for a while.
We've been doing strings and manipulating
strings and converting strings all along.
So the thing we're going to start doing
now is we're going to dive into strings.
We realize that strings are addressable at
a character-by-character basis,
and we can do all kind of cool
things with that.
And so, a string is a sequence of
characters, and we
can look inside them using what we call
the index operator,
the square brackets. And we've seen
square brackets in
lists, and you'll see that there's sort of
similarities between lists of numbers,
and, in effect, a
string is a special kind of list of
characters.
So if we take this string banana,
the string banana starts, the first
character starts at 0.
So, we call this operator sub, so
letter equals
fruit sub 1 and that is the second
character.
Now this may seem a little weird that the
first character
is a 0 and the second character is a 1.
It actually is kind of similar to the old
elevator thing, where in Europe
the ground floor is zero, the basement is
negative one,
and the floor above the ground floor is one, right?
It's kind of the same thing.
Actually, it turns out that
internally zero was a better way
to start than one.
It, you'll get used to it and then after
a while there's
some little cool advantages to it, but for
now, beginning is zero.
Just, beginning is zero, it is the rule,
just remember it.
Okay, so the 0 is b, the 1 is a, the 2 is
n, et cetera, et cetera.
And we call this syntax
fruit sub 1, okay?
So that is the sub 1 character of fruit,
and then that is an a.
So that fruit sub 1 says, look up in
banana, find the 1 position,
and give me what's in that 1
position, that's what's the sub.
And what's inside these brackets can be
an expression.
So if we set n to 3, n minus 1, well
that'll compute to 2.
And then fruit sub 2 is the letter n,
right? So that's fruit sub 2, okay?
It's the third character, fruit sub 2.
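A short sketch of the index operator, in
Python 3 syntax. The variable name w is an
assumption made for this example.

```python
fruit = 'banana'

letter = fruit[1]       # "fruit sub 1", the second character
print(letter)           # a

n = 3
w = fruit[n - 1]        # any integer expression works; n - 1 is 2
print(w)                # n
```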
So the index starts at 0, the, we read the
brackets as sub, fruit sub 1,
fruit sub 2. Now, Python will
complain to you if you use this sub
operator too far down the string.
Here is a string with three characters, which
are 0, 1, and 2.
And if we go to sub 5, it blows up.
Now, you know, by now I hope that you're
not freaking out about traceback errors.
Remember, traceback errors are just Python
trying to inform you.
And if we just stop looking at that as
mean Python face, and
instead look at that as, oh, index error,
string index out of range.
Oh yeah, I stuck a five in there and
there's only three, oh,
my bad, thank you, Python, appreciate it,
thanks for the help.
So, think of this as like, it's not a
smiley face
but it's kind of like a, a quizzical face,
right, it's like [SOUND].
I don't know.
Python's confused and it's trying to tell
you what it's confused about, okay?
So don't look at these as sad faces.
Python doesn't hate you, Python loves you.
And loves me too.
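To see that message without the program
stopping, here is a sketch in Python 3 syntax.
The name zot and the value abc are assumptions
standing in for any three-character string.

```python
zot = 'abc'             # three characters: positions 0, 1, and 2
print(zot[2])           # c, the last valid position

try:
    print(zot[5])       # there is no position 5
except IndexError as err:
    message = str(err)
    print('IndexError:', message)   # IndexError: string index out of range
```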
So, strings have individual
characters that we can address with the
index operator.
They also have length.
And there is a built-in function called
len, that we can call and pass in
as a parameter the variable or a
constant,
and it will tell us how many characters.
Now this banana has six characters in it
that are 0 through 5.
So don't get confused, the last
character is
sub 5, but it's also the
sixth character.
So the length is really the length, it's
not length minus 1, okay?
So len is like a built-in function.
It's not a function we have to write,
as we talked about in the functions
chapter.
There are things that are part of Python
that are just sitting there.
And so we are passing banana, the
variable
fruit, into the function. We're passing it
as a parameter
into the len function.
Then len [SOUND] does magic stuff.
And then out comes the answer.
And that 6 replaces this and then the 6 goes
into the variable x, and so x is 6.
I sure made that a messy looking slide.
And so, think of inside the len function,
there's a def.
len takes a parameter, does some loopy
things, and it does its thing.
So, it's a function that we might write
except we don't
have to because it's already written and
built in to Python.
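A small sketch of len, in Python 3 syntax.
The variable last is an assumption added to
show how the length relates to the last index.

```python
fruit = 'banana'

x = len(fruit)          # built in, nothing for us to define
print(x)                # 6

last = fruit[x - 1]     # the last character is at position length minus 1
print(last)             # a
```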
Okay. So that's the length of the
string, and that's getting at individual
characters.
We can also loop through strings.
Obviously, if we can use the index
operator, and we
can put a variable in there, we can
write a loop.
This is an indefinite loop.
So we have this variable fruit, has the
string banana in it.
We set the variable index to 0.
We make a little while loop.
And we ask,
as long as index is less than the length
of fruit.
Now remember, the length of fruit is
going to be 6.
But we don't want to make that less than
or equal to
because then we would crash, because
the last character is 5.
We can say letter is equal to fruit sub
index. Index is going to
start out being
0, so that's fruit sub 0.
Then we print index and letter, so that
means the
first time through the loop we're
going to see 0 b.
Then we increment our
iteration variable, and go up.
And then we come out with 1 a.
And index advances until index is 6, but
has printed out each of the letters.
Now, we're not doing this just to
print them out, we will do something
a little more valuable,
valuable inside that loop.
But this gives the sense that we can work
through a loop just like we, we,
we can work through a string just like
we work through a list of numbers, okay?
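The while loop just described can be sketched
like this, in Python 3 syntax. The list seen is
a hypothetical extra, kept only so the result
can be inspected afterwards.

```python
fruit = 'banana'
index = 0
seen = []                         # hypothetical: records what the loop visits

while index < len(fruit):         # strictly less than: the last valid index is 5
    letter = fruit[index]
    print(index, letter)          # first pass prints: 0 b
    seen.append((index, letter))
    index = index + 1             # advance the iteration variable
```

When the loop finishes, index is 6 and every
letter has been printed once.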
Now, that was how you do it with an
indefinite loop.
In a definite loop, it's just far more
awesome, okay?
Just like we did with the list of numbers,
Python understands strings and allows us
to write
for loops, using for and in, that go through
the strings.
So basically, for letter in fruit, now
remember, I'm using letter as a
mnemonic variable here, it's just a
choice, a wise choice of a variable name.
So that says, run this little block of
text once for
each letter in the variable fruit, which
means that letter's going to
take on the successive b-a-n-a-n-a.
When I look at that I always worry that I
misspelled it.
I think I got these right.
If I rewrite this book, I'm not going to
use banana as the example because I'm
terrified that I misspelled banana,
because I don't
know how many n's banana has in it.
But, be that as it may, we are
abstracting, we are letting Python say,
run this little block of text once, in
order, for each of the letters in
the variable fruit, which is b-a-n-a-n-a, and
so it prints out each of the letters.
So this is a much prettier version of the,
the looping,
so using the definite, the for keyword
instead of the while keyword.
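The same walk through the string with a
definite loop, in Python 3 syntax. The list
letters is a hypothetical extra for checking
the result.

```python
fruit = 'banana'
letters = []            # hypothetical: collects each letter in order

for letter in fruit:    # letter takes on b, a, n, a, n, a in turn
    print(letter)
    letters.append(letter)
```

There is no index variable to set up, test,
or advance here; the for statement handles it.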
And so, we can just kind of compare these
two things.
They kind of do the exact same thing.
And it also is kind of a, gives you a
sense of what the for is doing for us,
right?
The for is
setting up this index, the for is
looking up
inside of fruit, and the for is advancing
the index.
So the for's doing a bunch of work for us
and I've characterized that, sort of, in
the previous lecture.
The for is sort of doing a bunch of
things for us,
and that allows our code to
be more
expressive. In the while version, a lot of
this is just kind of bookkeeping crap that
we don't really need to write ourselves.
And so the for loop helps us by doing some
of the bookkeeping for us.
Okay, so we can do all those loops again.
We can find the largest letter, the
smallest letter, or how many times a letter appears.
So, like, how many n's are in
this, or how many a's are in this.
So this is a simple counting pattern and,
and a looking pattern.
And so, our word is banana, our count is 0.
For the letter in word, again, boop, boop,
boop, boop, boop, that comes out like that.
So it's going to run this little block.
If the letter is a, add 1 to the count.
So this is going to basically print out at
the end the number of a's in banana.
It would probably be more useful, for me,
to print out the number
of n's in banana, because I never know how
many n's are in banana.
But it looks like there's supposed to be two,
or otherwise I have a typo on this slide.
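Here is a sketch of that counting pattern in
Python 3 syntax. Wrapping it in a function
named count_letter is an assumption made for
this example, so the same loop can count
either a's or n's.

```python
def count_letter(word, target):
    """Count how many times target appears in word, one letter at a time."""
    count = 0
    for letter in word:
        if letter == target:
            count = count + 1
    return count

print(count_letter('banana', 'a'))   # 3
print(count_letter('banana', 'n'))   # 2
```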
So the in, again, I, I love the in.
I just absolutely
love this in.
I love this syntax.
This for each letter in the word banana.
Just, to me, it reads very smoothly.
Cognitively, it fits in my mind what's
going on.
For each letter in banana, run this little
indented block of text.
Again, very pretty, I love in, it's one of
my favorite little pieces of Python.
So, again, with the for, it takes care of
all the iteration variables for us, and it
goes through the sequence.
And so here's, here's an animation, right?
Remember that the for is going to do all
this work for us, right?
Letter is going to advance through the
successive values, the successive letters
in banana.
So letter is being moved for us by the for
statement, okay?
So that's looping through.
Now it turns out there's a lot of
common things that
we want to do that are already built into
Python for us.
Clear the screen there.
One of them we call slicing.
So the index operator looks up various
things in a string, but we
can also pull substrings out, using the
colon in addition to the square brackets.
Again, this is called slicing.
So the
colon operator, basically, takes a
starting position, and then an ending
position, but the ending position is up to
but not including the second one.
So this is, it's up to but not including,
up to but not including.
Just like the zero, you get used to it
pretty quick,
but the first time you see it, it's a
little bit
wonky.
So, if we're going 0 through 4, that's how
I read this print, s sub 0
through 4, or, or better, better said,
s 0, up to but not including 4.
That is, print me out the chunk that is up
to, but not including, 4.
So, it doesn't include 4, and so out comes
Mont, right?
So the next one is 6 up to but not
including 7, so it starts at 6,
up to but not including 7, so
out comes the P.
And, even though you might expect that it
would traceback on this, Python is a
little forgiving.
So here's a moment where Python is a
little
forgiving, saying, you know, I'll give you
a break here.
If you go 6, but up to, but not including 20,
I'll just stop at the end of the string.
So it's 6 to the end, so it, it, you can
over-reference here and
you can not, you won't get yourself in
trouble.
So that comes out, Python.
So, again, the second number is
up to but not including,
and that's kind of the
weird thing there.
Of course, once you remember that
the first character
is 0, it's 0 up to but not including.
Nice.
If we leave off the first number or the last
number, or both of them, the missing numbers
are assumed to be
the beginning and the end of the string,
respectively.
And so, up to but not including 2 is M-o.
8 colon means starting at 8 to the end of
the string.
So that's, thon.
And then a colon by itself means
the beginning to the end, and so it's
just the whole string, Monty Python.
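All of the slices just walked through, as a
sketch in Python 3 syntax. The variable name s
matches how the slices are read aloud.

```python
s = 'Monty Python'

print(s[0:4])     # Mont    (0 up to but not including 4)
print(s[6:7])     # P       (6 up to but not including 7)
print(s[6:20])    # Python  (20 is past the end, so Python stops at the end)
print(s[:2])      # Mo      (no first number means start at the beginning)
print(s[8:])      # thon    (no second number means go to the end)
print(s[:])       # Monty Python
```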
Now we've already played with string
concatenation, just a thing to
emphasize here is,
the concatenation operator does not
add a space, does not add a space.
If you want a space, you explicitly add it.
So here there's no space in between the o
and the t, but here
there is a space between the o and the t
because we explicitly added it.
So you can concatenate more than one
thing.
And you add your spaces as you want,
okay?
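A sketch of concatenation with and without an
explicit space, in Python 3 syntax. The names
a, b, and c and the word There are assumptions
chosen for this example.

```python
a = 'Hello'

b = a + 'There'           # plus adds nothing between the two pieces
print(b)                  # HelloThere

c = a + ' ' + 'There'     # the space has to be added explicitly
print(c)                  # Hello There
```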
Another thing you can do is you can ask
questions about a string.
Sort of like doing a string lookup, using
the in operator.
This is a little different than how we use
it inside of a for loop.
This is a logical operation asking a
question
like less than or greater than or
whatever.
So, here's an expression.
So fruit is banana, as always.
Is n in fruit?
And the answer is yes it is, True.
So this
is a logical operation.
It's a question.
It's a true or false.
Is m in fruit?
No, False.
And you can, this can be a string, not
just a single character.
Is n-a-n in fruit?
The answer is True.
And you can put this in, sort of, you know,
the test part of an if, et cetera.
So, this is a logical expression that can
be on an if,
or on a while, et
cetera, et cetera.
So it's a logical,
logical expression and it returns
True or False.
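A sketch of in used as a logical operator, in
Python 3 syntax. The message printed inside
the if is just an illustration.

```python
fruit = 'banana'

print('n' in fruit)       # True
print('m' in fruit)       # False
print('nan' in fruit)     # True, a longer substring works too

if 'a' in fruit:          # the same expression as the test of an if
    print('Found it!')
```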
You can also do comparisons.
Now, the comparison operations, equals
makes a lot of sense, less
than and greater than depend on the
character set that Python is using.
And so, if you're using, like, a Latin
character set, then alphabetical order matters,
you know, the way the Latin character
set would do.
One thing to watch for is that Python
compares strings character by character
using each character's numeric code, so
all the uppercase letters
come before all the
lowercase letters.
So you can do comparisons like equality,
less than, and greater than.
And we've seen some of these things in
previous lectures, actually.
We've had to use them.
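A sketch of string comparison in Python 3
syntax. The function name describe and its
messages are assumptions made for this
example.

```python
def describe(word):
    """Say where word falls relative to 'banana' using ==, < and >."""
    if word == 'banana':
        return 'All right, bananas.'
    elif word < 'banana':
        return 'Your word, ' + word + ', comes before banana.'
    else:
        return 'Your word, ' + word + ', comes after banana.'

print(describe('apple'))    # comes before banana
print(describe('cherry'))   # comes after banana
print(describe('Zebra'))    # comes before banana: uppercase Z sorts ahead of lowercase b
```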
So in addition to these sort of
fundamental operations that we
can do on strings, there's an extensive
library of built-in capabilities
in Python.
And so the, the way we see these built-in
capabilities
are they're, they're actually sort of
built in to strings.
So, let's go real slow here.
Here we have a variable called greet and
we're sticking the string Hello Bob
into it.
Now greet is of type string, as a result
of this, and it contains Hello Bob as its
value.
But we can actually access
capabilities inside of this value. So we
can say, greet.lower().
This is calling something that's part of
greet itself, it's a part of all strings.
The fact that greet contains a string,
means that we can ask for,
hey, give me greet, which just gives you
back what you're looking for.
Like here, print greet is Hello Bob.
Or you can say give me greet lower, so
this is giving me a lowercase copy.
It doesn't convert it to lowercase.
It gives me a lowercase copy of Hello Bob.
So zap is hello bob, all lowercase.
Now, it didn't change greet, right?
And, you can even put this .lower on the
end of constants so, why you'd do this, I don't
know, but Hi There, with H and T capitalized,
.lower comes out as hi there.
So this bit is part of
all strings.
Both variables and constants have these
string functions built into them.
And every instance of a string, whether it
be a variable or a constant, has these
capabilities.
They don't modify it, they just give you
back a copy.
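A sketch showing that lower hands back a copy
and leaves the original alone, in Python 3
syntax.

```python
greet = 'Hello Bob'

zap = greet.lower()          # a lowercase copy
print(zap)                   # hello bob
print(greet)                 # Hello Bob, unchanged

print('Hi There'.lower())    # hi there, it works on constants too
```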
Now it turns out there is a
function inside Python called dir, to ask
questions like this.
Here, the variable stuff
has got Hello World.
Stuff is a string.
We can ask with the type function, hey, what are you?
I am a string.
dir is another built-in Python function that asks
the question, hey, what are all
the things that are built into this that I
can make use of?
And here they are.
That's kind of a raw dump of them.
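A sketch of type and dir, in Python 3 syntax.
Filtering out the names that start with an
underscore is an assumption added here to keep
the dump readable; the variable names names
and public are hypothetical.

```python
stuff = 'Hello World'

print(type(stuff))      # <class 'str'> in Python 3

names = dir(stuff)      # every attribute and method a string carries
public = [name for name in names if not name.startswith('_')]
print(public)           # includes capitalize, find, lower, replace, strip, ...
```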
You can also go look at
the online documentation for Python
at the Python website,
and you can see a whole
bunch of these things.
And they have the calling sequence, what
the parameters are, et cetera.
So when you're looking these things up,
you can go, go read about them.
Here's just a few that I've pulled out,
capitalize, which uppercases the
first character,
center, endswith, find, there's stripping.
So I'll look through a couple of these,
just the kind of things to be looking for.
It'll be a good idea to take a look and read
through some of the things.
Here's a couple that, that we'll probably
be using early on.
The find function, it's similar to in but
it tells you where it finds the, the
particular thing that it's looking for.
And so we'll say fruit is banana.
And I'm going to say pos, which is
going to be an integer variable,
equals fruit.find("na").
So what it's saying is, go look inside
this variable fruit,
hunt until you find the first occurrence
of the string na.
Hunt, hunt, hunt, hunt, whoop, got it.
And then return it to me.
So that's going to give me back 2.
2 is where it found it, right?
So, where is it in the string, that's what
find does.
And if you don't find anything, like
you're looking for z,
no, no, no, I didn't find a z, then it
gives me back negative 1.
So just, again, this is just one of many
built-in functions in string.
The ability to find a substring, okay?
Or find, yeah, find a character or string
within another string.
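A sketch of find, in Python 3 syntax. The
variable name aa for the failed search is an
assumption made for this example.

```python
fruit = 'banana'

pos = fruit.find('na')    # position of the first occurrence
print(pos)                # 2

aa = fruit.find('z')      # not found
print(aa)                 # -1
```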
There's a lower case, there's also an
upper case, This might be better named
shouting.
Upper means give me an uppercase copy of
this variable.
So Hello Bob becomes HELLO BOB, and then
lower is hello bob, right?
So these are both ways to get copies of
uppercase and lowercase versions.
You might think these are kind of silly,
but one of the things
that you tend to use lower for is if
you're doing searching and
you want to ignore case, you convert the
whole thing
to lowercase, and then you search for a
lowercase string.
So you, depends on if you want to ignore
case or not.
So that's, that's one of the reasons that
you have things like this.
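A sketch of upper and lower, plus the
ignore-case search idea, in Python 3 syntax.
The names nnn, www, and found are assumptions
made for this example.

```python
greet = 'Hello Bob'

nnn = greet.upper()
print(nnn)                          # HELLO BOB

www = greet.lower()
print(www)                          # hello bob

# Searching for lowercase bob fails on the original string...
print(greet.find('bob'))            # -1

# ...but succeeds once everything is lowercased first.
found = greet.lower().find('bob')
print(found)                        # 6
```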
There is a replace function.
Again, it doesn't change the value.
Greet is going to have Hello Bob.
And I'm going to say, greet.replace all
occurrences of Bob with Jane.
That gives me back a copy, in nstr, says
Hello Jane.
So, so greet is unchanged.
This replace says, make a copy and then
make that following
edit that you, that, that we've requested.
[COUGH] Now we can also say, well, I
mean, the replace
is going to do all occurrences, so greet
is still Hello Bob.
This is kind of redundant here.
I'm just doing it so you remember what it is.
Greet is still Hello Bob.
I put Hello Bob back in it and replace
all the occurrences of lowercase o with
uppercase X.
And then that happens.
So this says,
go through the whole string [SOUND] doing
all those replaces, okay?
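A sketch of replace, in Python 3 syntax. The
second result is stored in a hypothetical
variable nstr2 so both copies can be compared
against the unchanged original.

```python
greet = 'Hello Bob'

nstr = greet.replace('Bob', 'Jane')
print(nstr)                        # Hello Jane

nstr2 = greet.replace('o', 'X')    # every lowercase o is replaced
print(nstr2)                       # HellX BXb

print(greet)                       # Hello Bob, still unchanged
```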
Now another common thing that we're
going to have to do
is we're going to have to throw away
whitespace.
Sometimes you have a string that
has characters, blank characters, or other
characters,
at the beginning and the end, nonprintable
characters, and we can strip them.
And there's three functions
that are built into
Python strings that do this for us.
There is lstrip, which strips from the left.
There is rstrip, which strips from the right.
So it throws away the whitespace on that side, so
the spaces next to Hello Bob are gone.
I mean, it gets rid of these
characters.
Oops, these are the ones that are gotten
rid of there.
I need an eraser.
And then
let's use white, and then strip
basically, gets rid of
all the whitespace, both on the left and
the right side.
And gets rid of that.
So we're going to, we're going to be using
these a lot.
It, one of the things you tend to do in
Python is cleaning up data.
Sometimes if you have spaces at the
beginning or
the end, you just want to kind of ignore
them.
So you strip them off, you throw them
away.
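A sketch of the three stripping functions, in
Python 3 syntax. The exact number of spaces
around Hello Bob is an assumption made for
this example.

```python
greet = '   Hello Bob  '      # three spaces in front, two behind

left = greet.lstrip()         # whitespace removed on the left only
right = greet.rstrip()        # whitespace removed on the right only
both = greet.strip()          # whitespace removed on both sides

print(repr(left))             # 'Hello Bob  '
print(repr(right))            # '   Hello Bob'
print(repr(both))             # 'Hello Bob'
```

Printing with repr shows the quotes, which
makes the leftover spaces visible.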
When we're looking for data, we sometimes
are looking for a prefix, and
there is a startswith function [COUGH]
that gives you a true or a false.
We're asking here, does this variable line
start with the string Please.
And the answer is True, because it does
start with the string Please.
And then next, we ask, does this start
with the lowercase letter p?
And the answer is False, it does not start
with a lowercase p.
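A sketch of startswith, in Python 3 syntax.
The rest of the sentence after Please is an
assumption made for this example, and so is
the last line, which combines startswith with
lower to ignore case.

```python
line = 'Please have a nice day'

print(line.startswith('Please'))      # True
print(line.startswith('p'))           # False, the P is uppercase

print(line.lower().startswith('p'))   # True once case is ignored
```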
Okay? So there's
lots more of these things.
And reading data and tearing it apart is
one of the things that we're going to
really focus on for the rest of these
first few chapters of the book, okay?
Because that's one thing that Python's
really good at is
tearing data into pieces and pulling the
pieces that you want.
So, so let's take a look at this line.
So this line that we've got here is a line
from an actual email box.
This is what, if you
looked at your email, sort of, on your hard
drive, email boxes would have this kind of
a format.
And there's actually many lines, and soon
we'll be reading whole files full of email.
But for now, let's just say we've got this
one line, somehow.
And we don't know
how long
these things are going to be. The
first thing is From, then there's an
email address,
then there's some detail about when the
mail was sent.
But what we actually want is
we want this part right here,
and that's the domain name of the mail
address, right?
We want to extract this out.
We're faced with this line, in a variable,
and we want to extract that out.
So this is kind of putting all these
things together.
So let's walk through how we do this.
So, here's this line, and it's a big long
string.
Mostly we would've read this from a file,
rather than just put it in a constant, but
for now we
put it in a constant, because files are
the next chapter.
And so what we're going to do is we're
going to say, you
know what, I'm going to look at this line
and I'm going to go
find the @ sign, and I want to know where
the @ sign is.
So I call data.find with the @ sign, and put
the result in atpos.
And that gives me 21.
It hunts until it finds the @ sign, and
then tells me where I found it.
Then what I want to look at is, starting
here, for the rest of the string, I want
to find the first space afterwards.
So what I say is, sppos, which is my
variable for the position of the space,
equals data.find of a blank, starting
at atpos.
So this is starting at 21.
So it says, I'll start
at 21 and I'll look for the next blank.
And I find that at 31.
So now I know where the @ sign is and I
know where the space is.
And so what I'm looking at is, I want the
stuff
one beyond the @ sign, up to but not
including the space.
So then I can use a slicing operation, I
can use a slicing operation.
Start at the @ position, add 1 to it,
so advance 1, that's going to be the
letter u.
And then a slicing operation, up to but
not including space.
Up to, this is going to work out nicely
all of a sudden, but not
including, okay?
And then
I'm going to take that slice, which is
really this little bit of data right here,
take that slice, and put in the variable
host.
Then we print that out and we get the
piece, okay?
And so, here we have some data we want to
tear apart.
We hunt for the @.
We find it at position 21.
We start at 21 and we look for the, the
space after that.
31, and then we pull from 22, up to but
not including, 31.
And it, it wouldn't matter where this
thing was, because these aren't all
the same length when we start looking at
them in files, but it
would have found the @ sign and the space
after the @ sign,
and it would have reliably
pulled out the host, okay?
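The whole parsing pattern can be sketched as
follows, in Python 3 syntax. The sample line is
an assumption, chosen so that the @ sign lands
at 21 and the following space at 31 as
described; the function name extract_host and
the second, shorter line are hypothetical.

```python
def extract_host(data):
    """Return the text between the @ sign and the next space."""
    atpos = data.find('@')            # where is the @ sign?
    sppos = data.find(' ', atpos)     # first space at or after the @ sign
    return data[atpos + 1:sppos]      # one beyond the @, up to but not including the space

data = 'From stephen.marquard@uct.ac.za Sat Jan  5 09:14:16 2008'
print(data.find('@'))                 # 21
print(data.find(' ', 21))             # 31
host = extract_host(data)
print(host)                           # uct.ac.za

# A line with a different length still works.
print(extract_host('From a@b.org Mon Jan  7'))   # b.org
```

This sketch assumes the line really contains
an @ sign followed by a space. If either is
missing, find gives back -1 and the slice will
not be meaningful.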
So this is a basic pattern we call
parsing.
Parsing text.
Find this, find that other thing, grab
this thing out,
then look inside that thing and [SOUND].
So it does all these things, right?
So, that's kind of like strings.
Up next, we have files.
Files are going to be lots of strings.
So we're going to start putting all these
things together.
And so the next chapter is a really,
really
important chapter, where it
really starts coming together.
So see you soon.