Hello, and welcome to Chapter Six. This chapter we're going to talk about strings, and stuff is going to start to get real now. So, as always, this material, this video, these slides and book are copyright Creative Commons Attribution. I want you to use these materials. I want you to, somebody else, I want to make more teachers, so everyone can teach this stuff. Use it however you like. Okay, so we've been playing with strings from the beginning. I mean, literally, if we didn't work with strings, we could've never printed Hello World. And, and lord knows, we need to print Hello World in a programming language. And so, we've been using them, especially constants. Now, in this chapter, we're going to dig in. So, oops, so a string is a sequence of characters. You can use either use single quotes or double quotes in Python to delimit a string. And so here's two string constants, Hello and there, and stuck into the variables str1 and str2. We can concatenate them together with a plus sign. Python is smart enough to look and say, oh, those are strings, I know what to do with those. And you'll notice that the plus doesn't add any space here, because when we print bob out, Hello and there are right next to one another. If, for example, we've done some conversions, so when we were, like, reading pay, and rate, and hours, and stuff, we've done some conversions. So this is an example of the, a string 1 2 3 Not 123, but the string, quote 1 2 3 quote. And we can't add 1 to this, we get a traceback, kind of, at this point, as we expected. And we would convert that to an integer using the int function that's built in. See how much Python you already know? I mean, this is awesome, right? I can just say, oh, you call the int function, and you know what that is. That's, sorry, sorry, I'm just awesomed out. So you convert this to an integer, and then you add 1 to it, and then we get 124. So, there you go. We've been doing strings all along, had to. I mean, literally, strings and numeric data are the two things that programs deal with. So, we've been reading and converting. Again, this is sort of a pattern from some of the earlier programs where we do a raw input, you know? And the raw input just takes a string and puts it in a variable. So if I take Chuck, then the variable contains the string C-h-u-c-k. Even if we type numbers, that is a string. We can't, just because I put 1 0 0 in, I still can't subtract 10. We get a happy little traceback, oh, happy little, sad-faced traceback. And, and, but of course, if we convert it into float or something like that. We convert int or float, we can do that and subtract 10, and we can do that. So, so we've been doing this for a while. We've been doing strings and manipulating strings and converting strings all along. So the thing we're going to start doing now is we're going to dive into strings. We realize that strings are addressable at a character-by-character basis, and we can do all kind of cool things with that. And so, a string is a sequence of characters, and we can look inside them using what we call the index operator, the square brackets. And we've seen square brackets in lists, and you'll see that there's sort of similarities between lists of numbers, and, in effect, a string is a special kind of list of characters. So if we take this string banana, the string banana starts, the first character starts at 0. So, we call this operator sub, so letter equals fruit sub 1 and that is the second character. Now this may seem a little weird that the first character is a 0 and the second character is a 1. It actually is kind of similar to the old elevator thing, where in Europe they're called, the first floor is zero, then negative one, and the second floor is one, right? It's kind of the same thing. Actually, it turns out that internally zero was a better way to start than one. It, you'll get used to it and then after a while there's some little cool advantages to it, but for now, beginning is zero. Just, beginning is zero, it is the rule, just remember it. Okay, so the 0 is b, the 1 is a, the 2 is n, et cetera, et cetera. And we call this syntax fruit sub 1, okay? So that is the sub 1 character of fruit, and then that is an a. So that fruit sub 1 says, look up in banana, find the 1 position, and give me what's in that 1 position, that's what's the sub. And what's inside these brackets can be an expression. So if we set n to 3, n minus 1, well that'll compute to 2. And then fruit sub 2 is the letter n, right? So that's fruit sub 2, okay? It's the third character, fruit sub 2. So the index starts at 0, the, we read the brackets as sub, fruit sub 1, fruit sub 2. Now, Python will complain to you if you use this sub operator too far down the string. Here is a character with 3, which is 0, 1, and 2. And if we go to sub 5, it blows up. Now, you know, by now I hope that you're not freaking out about traceback errors. Remember, traceback errors are just Python trying to inform you. And if we just stop looking at that as mean Python face, and instead look at that as, oh, index error, string index out of range. Oh yeah, I stuck a five in there and there's only three, oh, my bad, thank you, Python, appreciate it, thanks for the help. So, think of this as like, it's not a smiley face but it's kind of like a, a quizzical face, right, it's like [SOUND]. I don't know. Python's confused and it's trying to tell you what it's confused, okay? So don't look at these as sad faces. Python doesn't hate you, Python loves you. And loves me too. So, strings have individual characters that we can address with the index operator. They also have length. And there is a built-in function called len, that we can call and pass in as a parameter the variable or a constant, and it will tell us how many characters. Now this banana has six characters in it that are 0 through 5. So don't get a little confused, the last character is the fifth, is sub 5, but it's also the sixth character. So the length is really the length, it's not length minus 1, okay? So len is like a built-in function. It's not a function we have to write, as we talked in chapter the functions chapter. There are things that are part of Python that are just sitting there. And so we are passing banana, the variable fruit, into function, we're passing it into function. And then, into the len function. Then len [SOUND] does magic stuff. And then out comes the answer. And that 6 replaces this and then the 6 goes into the variable x, and so x is 6. I sure made that a messy looking slide. And so, think of inside the len function, there's a def. len takes a parameter, does some loopy things, and it does its thing. So, it's a function that we might write except we don't have to because it's already written and built in to Python. Okay. So that's the length of the string, that's getting it individual characters. We can also loop through strings. Obviously, if we can use the index operator, and we can put a variable in there, we can write a loop. This is an indefinite loop. So we have this variable fruit, has the string banana in it. We set the variable index to 0. We make a little while loop. And we ask, as long as index is less than the length of fruit. Now remember, the length of fruit is going to be 6. But we don't want to make that less than or equal to because then we would crash, because the last character is 5. We can say letter is equal to fruit sub index, so that's going to start out being index of, is going to be 0, so that's fruit sub 0. Then we print index and letter, so that means the first time through the loop we're going to see 0 b. Then we increment our iteration operator, and go up. And then we come out with 1 a. And index advances until index is 6, but has printed out each of the letters. Now, we're not doing this just to print them out, we will do something a little more valuable, valuable inside that loop. But this gives the sense that we can work through a loop just like we, we, we can work through a string just like we work through a list of numbers, okay? Now, that was how you do it with an indefinite loop. In a definite loop, it's just far more awesome, okay? Just like we did with the list of numbers, Python understands strings and allows us to write for loops, using for and in, that go through the strings. So basically, for letter in fruit, now remember, I'm using letter as a mnemonic variable here, it's just a choice, a wise choice of a variable name. So that says, run this little block of text once for each letter in the variable fruit, which means that letter's going to take on the successive b-a-n-a-n-a. When I look at that I always worry that I misspelled it. I think I got these right. If I rewrite this book, I'm not going to use banana as the example because I'm terrified that I misspelled banana, because I don't know how many n's banana has in it. But, be that as it may, we are abstracting, we are letting Python say, run this little block of text once, in order, for each of the letters in the variable fruit, which is b-a-n-a, and so it prints out each of the letters. So this is a much prettier version of the, the looping, so using the definite, the for keyword instead of the while keyword. And so, we can just kind of compare these two things. They kind of do the exact same thing. And it also is kind of a, gives you a sense of what the for is doing for us, right? The for is setting up this index, the for is looking up inside of fruit, and the for is advancing the index. So the for's doing a bunch of work for us and I've characterized that, sort of, in the previous lecture. How the for is sort of doing a bunch of things for us and that's, it allows our code to be more expressive and, and instead of, so this is, a lot of this is just kind of bookkeeping crap that we don't really need. And so the for loop helps us by doing some of the bookkeeping for us. Okay, so we can do all those loops again. We can find the largest letter, the smallest letter, the, how many times. So, I think, what, how many n's are in this, or how many a's are in this. So this is a simple counting pattern and, and a looking pattern. And so, our word is banana, our count is 0. For the letter in word, again, boop, boop, boop, boop, boop, that comes out like that. So it's going to run this little block. If the letter is a, add 1 to the count. So this is going to basically print out at the end the number of a's in banana. It would probably be more useful, for me, to print out the number of n's in banana, because I never know how many n's are in banana. But it looks like there's supposed to be two, or otherwise I have a typo on this slide. So the in, again, I, I love the in. I just absolutely love this in. I love this syntax. This for each letter in the word banana. Just, to me, it reads very smoothly. Cognitively, it fits in my mind what's going on. For each letter in banana, run this little indented block of text. Again, very pretty, I love in, it's one of my favorite little pieces of Python. So, again, with the for, it takes care of all the iteration variables for us, and it goes through the sequence. And so here's, here's an animation, right? Remember that the for is going to do all this work for us, right? Letter is going to advance through the successive values, the successive letters in banana. So letter is being moved for us by the for statement, okay? So that's looping through. Now it turns out there's a lot of common things that we want to do that are already built into Python for us. Clear the screen there. We call these slicing. So the index operator looks up various things in a string, but we can also pull substrings out, using the colon in addition to the square brackets. Again, this is called slicing. So the colon operator, basically, takes a starting position, and then an ending position, but the ending position is up to but not including the second one. So this is, it's up to but not including, up to but not including. Just like the zero, you get used to it pretty quick, but the first time you see it, it's a little bit wonky. So, if we're going 0 through 4, that's how I read this print, s sub 0 through 4, or, or better, better said, s 0, up to but not including 4. That is, print me out the chunk that is up to, but not including, 4. So, it doesn't include 4, and so out comes Mont, right? So the next one is 6 up to but not including 7, so it starts at 6, up to but not including 7, so out comes the P. And, even though you might expect that it would traceback on this, Python is a little forgiving. So here's a moment where Python is a little forgiving, saying, you know, I'll give you a break here. If you go 6, but up to, but not including 20, I'll just stop at the end of the string. So it's 6 to the end, so it, it, you can over-reference here and you can not, you won't get yourself in trouble. So that comes out, Python. So, again, the second character is up to but not including, and that's the, kind of the weird thing there. Of course once you remember that the first character is 0, 0 up through but not including. Nice. If we leave off the first or the last number, leaving off the first number, the last number and both of them, they mean the beginning and end of the string, respectively. And so, up to but not including 2 is M-o. 8 colon means starting at 8 to the end of the string. So that's, thon. And then, that means the beginning to the end, and so it's just the whole string, Monty Python. Now we've already played with string concatenation, just a thing to emphasize here is, the concatenation operator does not add a space, does not add a space. If you want a space, you explicitly add it. So here there's no space in between the o and the t, but here there is a space between the o and the t because we explicitly added it. So you can concatenate more than one thing. And you add your spaces as you want, okay? Another thing you can do is you can ask questions about a string. Sort of like doing a string lookup, using the in operator. This is a little different than how we use it inside of a for loop. This is a logical operation asking a question like less than or greater than or whatever. So, here's an expression. So fruit is banana, as always. Is n in fruit? And the answer is yes it is, True. So this is a logical operation. It's a question. It's a true or false. Is m in fruit? No, False. And you can, this can be a string, not just a single character. Is n-a-n in fruit? The answer is True. And you can put, sort of, you know, if, parts of ifs, et cetera, et cetera. So, this is a logical expression that can be on an if, you can have a while, et cetera, et cetera, et cetera. So it's a logical, logical expression and it returns True or False. You can also do comparisons. Now, the comparison operations, equals makes a lot of sense, less than and greater than depend on the language that you're using Python. And so, if you're using, like, a Latin character set, then alphabetical matters. You know, the, the way the Latin character set would do. But if you're in a different character set, Python is aware of multiple character sets and will sort strings based on the sorting algorithm of the particular character set. So you can do comparisons like equality, less than, and greater than. And we've seen some of these things in previous lectures, actually. We've had to use them. So in addition, to, sort of, these sort of fundamental operations that we can do on strings, there's a extensive library of built-in capabilities in Python. And so the, the way we see these built-in capabilities are they're, they're actually sort of built in to strings. So, let's go real slow here. Here we have a variable called greet and we're sticking the string Hello Bob into it. Now greet is of type string, as a result of this, and it contains Hello Bob as its value. But we can actually access capabilities inside of this value. So we can say, greet.lower(). This is calling something that's part of greet itself, it's a part of all strings. The fact that greet contains a string, means that we can ask for, hey, give me greet, which just gives you back what you're looking for. Like here, print greet is Hello Bob. Or you can say give me greet lower, so this is giving me a lowercase copy. It doesn't convert it to lowercase. It gives me a lowercase copy of Hello Bob. So zap is hello bob, all lowercase. Now, it didn't change greet, right? And, you can even put this .lower on the end of constants so, why you'd do this, I don't know, but Hi There, with H and T capitalized, .lower comes out as hi there. So this bit is part of all strings. Both variables and constants have these string functions built into them. And every instance of a string, whether it be a variable or a constant, has these capabilities. They don't modify it, they just give you back a copy. Now it turns out there is a, a command inside Python called dir, to ask questions like hey, well here's, you know, stuff has got Hello World. We can say. Redo this. Come here. Stuff is a string. We can ask, hey, what are you? I am a string. dir is another built-in Python that asks the question, hey, what are all the things that are built into this that I can make use of? And here they are. That's kind of a raw dump of them. You can also go look at the online documentation for Python and see at the Pyth, oop, at the Python website, you can see a whole bunch of these things. And they have the calling sequence, what the parameters are, et cetera. So when you're looking these things up, you can go, go read about them. Here's just a few that I've pulled out, capitalize, which uppercases the first characters, center, endswith, find, there's stripping. So I'll look through a couple of these, just the kind of things to be looking for. It'll be a good idea to take a look and read through some of the things. Here's a couple that, that we'll probably be using early on. The find function, it's similar to in but it tells you where it finds the, the particular thing that it's looking for. And and so we'll put fruit is banana. And I'm going to say pos, which is going to be an integer variable, equals fruit.find("na"). So what it's saying is, go look inside this variable fruit, hunt until you find the first occurrence of the string na. Hunt, hunt, hunt, hunt, whoop, got it. And then return it to me. So that's going to give me back 2. 2 is where it found it, right? So, where is it in the string, that's what find does. And if you don't find anything, like you're looking for z, no, no, no, I didn't find a z, then it gives me back negative 1. So just, again, this is just one of many built-in functions in string. The ability to find a substring, okay? Or find, yeah, find a character or string within another string. There's a lower case, there's also an upper case, This might be better named shouting. Upper means give me an uppercase copy of this variable. So Hello Bob becomes HELLO BOB, and then lower is hello bob, right? So these are both ways to get copies of uppercase and lowercase versions. You might think these are kind of silly, but one of the things that you tend to use lower for is if you're doing searching and you want to ignore case, you convert the whole thing to lowercase, and then you search for a lowercase string. So you, depends on if you want to ignore case or not. So that's, that's one of the reasons that you have things like this. There is a replace function. Again, it doesn't change the value. Greet is going to have Hello Bob. And I'm going to say, greet.replace all occurrences of Bob with Jane. That gives me back a copy, in nstr, says Hello Jane. So, so greet is unchanged. This replace says, make a copy and then make that following edit that you, that, that we've requested. [COUGH] Now we can also say, well, I mean, the replace is going to do all occurrences, so greet is still Hello Bob. This is kind of redundant here. I'm just doing it so you remember what it is. Greet is still Hello Bob. I put Hello Bob back in it and replace all the occurrences of lowercase o with uppercase X. And then that happens. So this says, go through the whole string [SOUND] doing all those replaces, okay? Now another common thing that we're going to have to do is we're going to have to throw away whitespace. Sometimes you have a string that has characters, blank characters, or other characters, at the beginning and the end, nonprintable characters, and we can strip them. And there's three charact, three functions that are built into to Python strings that do this for us. There is lstrip, which strips from the left. There is rstrip, which strips from the right. So it throws away these whitespaces, so, Hello Bob is gone. I mean, the, so it gets rid of these characters. Oops, these are the ones that are gotten rid of there. I need an eraser. And then let's use white, and then strip basically, gets rid of all the whitespace, both on the left and the right side. And gets rid of that. So we're going to, we're going to be using these a lot. It, one of the things you tend to do in Python is cleaning up data. Sometimes if you have spaces at the beginning or the end, you just want to kind of ignore them. So you strip them off, you throw them away. When we're looking for data, we sometimes are looking for a prefix, and there is a startswith function [COUGH] that gives you a true or a false. We're asking here, does this variable line start with the string Please. And the answer is True, because it does start with the string Please. Or, and then next, we ask, does this start with the letter p? And the answer is False, it does not start with the letter p. Okay? So there's lots more of these things. And reading data and tearing it apart is one of the things that we're going to really focus on for the rest of these first few chapters of the book, okay? Because that's one thing that Python's really good at is tearing data into pieces and pulling the pieces that you want. So, so let's take a look at this line. So this line that we've got here is a line from an actual email box. This is what, if you looked at your email, sort of, on your hard drive, email boxes would have this kind of a format. And there's actually many lines, and soon we'll reading whole files full of email. But for now, let's just say we've got this one line, somehow. And we're looking for, we don't know how long these things are going to be, the first charac, the first thing is from, then there's an email address, then there's some detail about when the mail was sent. But what we actually want is we want this part right here, and that's the domain name of the mail address, right? We want to extract this out. We're faced with this line, in a variable, and we want to extract that out. So this is kind of putting all these things together. So let's walk through how we do this. So, here's this line, and it's a big long string. Mostly we would've read this from a file, rather than just put it in a constant, but for now we put it in a constant, because we, files is the next chapter. And so what we're going to do is we're going to say, you know what, I'm going to look at this line and I'm going to go find the @ sign, and I want to know where the @ sign is. So I call data.find @ sign, and put the result in atpos. And that gives me 21. It hunts until it finds the @ sign, and then tells me where I found it. Then what I want to look at is, starting here, for the rest of the string, I want to find the first space afterwards. So what I say is, this, sppos is my variable for the position of the space, data.find, a blank, starting at the @. So this is starting at 21. So it says, I'll start at 21 and I'll look for the next blank. And I find that at 31. So now I know where the @ sign is and I know where the space is. And so what I'm looking at is, I want the stuff one beyond the @ sign, up to but not including the space. So then I can use a slicing operation, I can use a slicing operation. Start at the @ position, add 1 to it, so advance 1, that's going to be the letter u. And then a slicing operation, up to but not including space. Up to, this is going to work out nicely all of a sudden, but not including, okay? And then I'm going to take that slice, which is really this little bit of data right here, take that slice, and put in the variable host. Then we print that out and we get the piece, okay? And so, here we have some data we want to tear apart. We hunt for the @. We find it at position 21. We start at 21 and we look for the, the space after that. 31, and then we pull from 22, up to but not including, 31. And it, it wouldn't matter where this thing was, because these aren't all the same length when we start looking at them in files, but it would have found the @ sign and the space after the @ sign, and it would have reliably pulled out the host, okay? So this is a basic pattern we call parsing. Parsing text. Find this, find that other thing, grab this thing out, then look inside that thing and [SOUND]. So it does all these things, right? So, that's kind of like strings. Up next, we have files. Files are going to be lots of strings. So we're going to start putting all these things together. And and so the next chapter is a really, really important chapter, where it starts to really start coming together. So see you soon.