WEBVTT 00:00:00.200 --> 00:00:02.820 Hello, and welcome to Chapter Six. 00:00:02.820 --> 00:00:05.240 This chapter we're going to talk about strings, and 00:00:05.240 --> 00:00:08.610 stuff is going to start to get real now. 00:00:08.610 --> 00:00:12.610 So, as always, this material, this video, these 00:00:12.610 --> 00:00:15.550 slides and book are copyright Creative Commons Attribution. 00:00:15.550 --> 00:00:16.870 I want you to use these materials. 00:00:16.870 --> 00:00:18.800 I want you to, somebody else, I want to 00:00:18.800 --> 00:00:21.720 make more teachers, so everyone can teach this stuff. 00:00:21.720 --> 00:00:22.790 Use it however you like. 00:00:24.010 --> 00:00:25.280 Okay, so we've been playing with 00:00:25.280 --> 00:00:26.730 strings from the beginning. 00:00:26.730 --> 00:00:28.320 I mean, literally, if we didn't work 00:00:28.320 --> 00:00:31.040 with strings, we could've never printed Hello World. 00:00:31.040 --> 00:00:35.813 And, and lord knows, we need to print Hello World in a programming language. 00:00:35.813 --> 00:00:39.610 And so, we've been using them, especially constants. 00:00:39.610 --> 00:00:41.650 Now, in this chapter, we're going to dig in. 00:00:41.650 --> 00:00:46.986 So, oops, so a string is a sequence of characters. 00:00:46.986 --> 00:00:50.408 You can use either single quotes or double quotes in Python 00:00:50.408 --> 00:00:51.360 to delimit a string. 00:00:51.360 --> 00:00:54.528 And so here's two string constants, Hello and there, 00:00:54.528 --> 00:00:58.140 and stuck into the variables str1 and str2. 00:00:58.140 --> 00:01:00.520 We can concatenate them together with a plus sign. 00:01:00.520 --> 00:01:03.100 Python is smart enough to look and say, 00:01:03.100 --> 00:01:05.694 oh, those are strings, I know what to do with those.
00:01:05.694 --> 00:01:09.588 And you'll notice that the plus doesn't add any space here, because when 00:01:09.588 --> 00:01:13.566 we print bob out, Hello and there are right next to one another. 00:01:13.566 --> 00:01:17.014 If, for example, we've done some conversions, 00:01:17.014 --> 00:01:18.799 so when we were, like, reading pay, 00:01:18.799 --> 00:01:20.640 and rate, and hours, and stuff, we've done some conversions. 00:01:20.640 --> 00:01:23.378 So this is an example of the, a string 1 2 3. 00:01:23.378 --> 00:01:27.211 Not 123, but the string, quote 1 2 3 quote. 00:01:27.211 --> 00:01:29.270 And we can't add 1 to this, we get 00:01:29.270 --> 00:01:32.910 a traceback, kind of, at this point, as we expected. 00:01:32.910 --> 00:01:37.020 And we would convert that to an integer using the int function that's built in. 00:01:37.020 --> 00:01:39.900 See how much Python you already know? I mean, this is awesome, right? 00:01:39.900 --> 00:01:40.970 I can just say, 00:01:40.970 --> 00:01:42.800 oh, you call the int function, and you know what that is. 00:01:42.800 --> 00:01:46.220 That's, sorry, sorry, I'm just awesomed out. 00:01:46.220 --> 00:01:50.800 So you convert this to an integer, and then you add 1 to it, and then we get 124. 00:01:50.800 --> 00:01:52.270 So, there you go. 00:01:52.270 --> 00:01:54.740 We've been doing strings all along, had to. 00:01:54.740 --> 00:01:57.000 I mean, literally, strings and numeric data 00:01:57.000 --> 00:01:59.930 are the two things that programs deal with. 00:01:59.930 --> 00:02:03.120 So, we've been reading and converting. 00:02:03.120 --> 00:02:05.175 Again, this is sort of a pattern from some of the earlier programs 00:02:05.175 --> 00:02:08.661 where we do a raw input, you know? 00:02:08.661 --> 00:02:10.887 And the raw input just takes a string and puts it in a variable. 00:02:10.887 --> 00:02:14.560 So if I type Chuck, then the variable contains the string C-h-u-c-k.
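NOTE A minimal sketch of the concatenation and conversion steps just described. The names str1, str2 and bob come from the lecture; str3 and x are placeholder names, and the Python 3 print() form is an assumption (the lecture's raw input is Python 2).
```python
str1 = "Hello"
str2 = "there"
bob = str1 + str2  # plus joins the two strings and adds no space
print(bob)
str3 = "123"  # the string "123", not the number 123
try:
    str3 = str3 + 1  # string plus integer is not allowed
except TypeError as err:
    print("Python complains:", err)
x = int(str3) + 1  # convert with the built-in int, then add
print(x)
```
The try/except is only there so the sketch keeps running past the traceback the lecture shows.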
00:02:15.990 --> 00:02:18.970 Even if we type numbers, that is a string. 00:02:18.970 --> 00:02:23.660 We can't, just because I put 1 0 0 in, I still can't subtract 10. 00:02:23.660 --> 00:02:28.270 We get a happy little traceback, oh, happy little, sad-faced traceback. 00:02:28.270 --> 00:02:31.294 And, and, but of course, if we convert it 00:02:31.294 --> 00:02:34.050 into float or something like that. 00:02:35.190 --> 00:02:38.670 We convert int or float, we can do that and subtract 10, and we can do that. 00:02:38.670 --> 00:02:41.680 So, so we've been doing this for a while. 00:02:41.680 --> 00:02:45.130 We've been doing strings and manipulating strings and converting strings all along. 00:02:45.130 --> 00:02:49.051 So the thing we're going to start doing now is we're going to dive into strings. 00:02:49.051 --> 00:02:53.098 We realize that strings are addressable at a character-by-character basis, 00:02:53.098 --> 00:02:56.350 and we can do all kind of cool things with that. 00:02:56.350 --> 00:02:59.998 And so, a string is a sequence of characters, and we 00:02:59.998 --> 00:03:04.450 can look inside them using what we call the index operator, 00:03:04.450 --> 00:03:06.720 the square brackets. And we've seen square brackets in 00:03:06.720 --> 00:03:08.230 lists, and you'll see that there's sort of 00:03:08.230 --> 00:03:11.610 similarities between lists of numbers, and, in effect, a 00:03:11.610 --> 00:03:14.350 string is a special kind of list of characters. 00:03:14.350 --> 00:03:17.197 So if we take this string banana, 00:03:17.197 --> 00:03:21.242 the string banana starts, the first character starts at 0. 00:03:21.242 --> 00:03:24.891 So, we call this operator sub, so letter equals 00:03:24.891 --> 00:03:28.383 fruit sub 1 and that is the second character. 00:03:28.383 --> 00:03:30.603 Now this may seem a little weird that the first character 00:03:30.603 --> 00:03:33.956 is a 0 and the second character is a 1. 
00:03:33.956 --> 00:03:38.500 It actually is kind of similar to the old elevator thing, where in Europe they're 00:03:38.500 --> 00:03:41.124 called, the first floor is zero, then negative one, 00:03:41.124 --> 00:03:43.558 and the second floor is one, right? 00:03:43.558 --> 00:03:46.093 It's kind of the same thing. Actually, it turns out that 00:03:46.093 --> 00:03:50.456 internally zero was a better way to start than one. 00:03:50.456 --> 00:03:54.156 It, you'll get used to it and then after a while there's 00:03:54.156 --> 00:03:58.540 some little cool advantages to it, but for now, beginning is zero. 00:03:58.540 --> 00:04:01.939 Just, beginning is zero, it is the rule, just remember it. 00:04:02.970 --> 00:04:08.790 Okay, so the 0 is b, the 1 is a, the 2 is n, et cetera, et cetera. 00:04:08.790 --> 00:04:11.160 And we call this syntax 00:04:11.160 --> 00:04:12.540 fruit sub 1, okay? 00:04:12.540 --> 00:04:17.123 So that is the sub 1 character of fruit, and then that is an a. 00:04:17.123 --> 00:04:21.250 So that fruit sub 1 says, look up in banana, find the 1 position, 00:04:21.250 --> 00:04:25.870 and give me what's in that 1 position, that's what the sub is. 00:04:25.870 --> 00:04:29.570 And what's inside these brackets can be an expression. 00:04:29.570 --> 00:04:33.690 So if we set n to 3, n minus 1, well that'll compute to 2. 00:04:33.690 --> 00:04:36.660 And then fruit sub 2 is the letter n, 00:04:36.660 --> 00:04:39.979 right? So that's fruit sub 2, okay? 00:04:39.979 --> 00:04:42.320 It's the third character, fruit sub 2. 00:04:42.320 --> 00:04:47.336 So the index starts at 0, the, we read the brackets as sub, fruit sub 1, 00:04:47.336 --> 00:04:52.750 fruit sub 2. Now, Python will 00:04:52.750 --> 00:04:57.860 complain to you if you use this sub operator too far down the string. 00:04:57.860 --> 00:05:01.316 Here is a string with 3 characters, which are 0, 1, and 2. 00:05:01.316 --> 00:05:05.420 And if we go to sub 5, it blows up.
00:05:05.420 --> 00:05:10.260 Now, you know, by now I hope that you're not freaking out about traceback errors. 00:05:10.260 --> 00:05:14.070 Remember, traceback errors are just Python trying to inform you. 00:05:14.070 --> 00:05:18.930 And if we just stop looking at that as a mean Python face, and 00:05:18.930 --> 00:05:24.190 instead look at that as, oh, index error, string index out of range. 00:05:24.190 --> 00:05:27.360 Oh yeah, I stuck a five in there and there's only three, oh, 00:05:27.360 --> 00:05:31.330 my bad, thank you, Python, appreciate it, thanks for the help. 00:05:31.330 --> 00:05:34.870 So, think of this as like, it's not a smiley face 00:05:34.870 --> 00:05:38.690 but it's kind of like a, a quizzical face, right, it's like [SOUND]. 00:05:38.690 --> 00:05:39.660 I don't know. 00:05:39.660 --> 00:05:42.950 Python's confused and it's trying to tell you what it's confused about, okay? 00:05:42.950 --> 00:05:46.780 So don't look at these as sad faces. Python doesn't hate you, Python loves you. 00:05:48.170 --> 00:05:52.420 And loves me too. So, strings have individual 00:05:52.420 --> 00:05:54.420 characters that we can address with the index operator. 00:05:54.420 --> 00:05:56.160 They also have length. 00:05:56.160 --> 00:06:00.400 And there is a built-in function called len, that we can call and pass in 00:06:00.400 --> 00:06:03.980 as a parameter the variable or a constant, 00:06:03.980 --> 00:06:05.940 and it will tell us how many characters. 00:06:05.940 --> 00:06:10.040 Now this banana has six characters in it that are 0 through 5. 00:06:10.040 --> 00:06:12.524 So don't get a little confused, the last character is 00:06:12.524 --> 00:06:15.750 sub 5, but it's also the sixth character. 00:06:15.750 --> 00:06:17.450 So the length is really the length, it's 00:06:17.450 --> 00:06:22.150 not length minus 1, okay? So len is like a built-in function.
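NOTE A short sketch of the index operator, the out-of-range traceback, and len as described above. fruit, letter, n and x follow the lecture; w, zot and message are placeholder names, and Python 3 syntax is an assumption.
```python
fruit = "banana"
letter = fruit[1]  # "fruit sub 1": the second character, because counting starts at 0
print(letter)
n = 3
w = fruit[n - 1]  # an expression inside the brackets: fruit sub 2
print(w)
zot = "abc"  # three characters, at positions 0, 1 and 2
try:
    zot[5]  # too far down the string
except IndexError as err:
    message = str(err)
    print("IndexError:", message)
x = len(fruit)  # six characters, at positions 0 through 5
print(x)
```
The last valid position is always len(fruit) - 1, which is why sub 5 works for banana but not for a three-character string.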
00:06:22.150 --> 00:06:23.840 It's not a function we have to write, 00:06:23.840 --> 00:06:26.570 as we talked about in the functions chapter. 00:06:26.570 --> 00:06:28.626 There are things that are part of Python that are just sitting there. 00:06:28.626 --> 00:06:31.172 And so we are passing banana, the variable 00:06:31.172 --> 00:06:35.010 fruit, into the function, we're passing it into the function. 00:06:35.010 --> 00:06:36.590 And then, into the len function. 00:06:36.590 --> 00:06:42.250 Then len [SOUND] does magic stuff. And then out comes the answer. 00:06:42.250 --> 00:06:48.320 And that 6 replaces this and then the 6 goes into the variable x, and so x is 6. 00:06:48.320 --> 00:06:51.070 I sure made that a messy looking slide. 00:06:51.070 --> 00:06:55.080 And so, think of inside the len function, there's a def. 00:06:55.080 --> 00:06:59.890 len takes a parameter, does some loopy things, and it does its thing. 00:06:59.890 --> 00:07:02.350 So, it's a function that we might write except we don't 00:07:02.350 --> 00:07:07.160 have to because it's already written and built in to Python. 00:07:07.160 --> 00:07:10.380 Okay. So that's the length of the 00:07:10.380 --> 00:07:12.460 string, that's getting at individual characters. 00:07:12.460 --> 00:07:15.550 We can also loop through strings. 00:07:15.550 --> 00:07:18.710 Obviously, if we can use the index operator, and we 00:07:18.710 --> 00:07:21.970 can put a variable in there, we can write a loop. 00:07:21.970 --> 00:07:23.520 This is an indefinite loop. 00:07:23.520 --> 00:07:27.140 So we have this variable fruit, has the string banana in it. 00:07:27.140 --> 00:07:29.580 We set the variable index to 0. 00:07:29.580 --> 00:07:32.920 We make a little while loop. And we ask, 00:07:32.920 --> 00:07:35.460 as long as index is less than the length of fruit. 00:07:35.460 --> 00:07:37.510 Now remember, the length of fruit is going to be 6.
00:07:37.510 --> 00:07:39.520 But we don't want to make that less than or equal to 00:07:39.520 --> 00:07:43.630 because then we would crash, because the last character is at 5. 00:07:43.630 --> 00:07:46.438 We can say letter is equal to fruit sub index, so that's going to 00:07:46.438 --> 00:07:50.040 start out being index of, is going to be 0, so that's fruit sub 0. 00:07:50.040 --> 00:07:53.300 Then we print index and letter, so that means the 00:07:53.300 --> 00:07:56.220 first time through the loop we're going to see 0 b. 00:07:56.220 --> 00:07:58.056 Then we increment our 00:07:58.056 --> 00:08:04.450 iteration variable, and go up. And then we come out with 1 a. 00:08:04.450 --> 00:08:13.560 And index advances until index is 6, but has printed out each of the letters. 00:08:13.560 --> 00:08:15.790 Now, we're not doing this just to 00:08:15.790 --> 00:08:18.620 print them out, we will do something a little more valuable, 00:08:21.540 --> 00:08:23.150 valuable inside that loop. 00:08:23.150 --> 00:08:28.740 But this gives the sense that we can work through a loop just like we, we, 00:08:28.740 --> 00:08:35.779 we can work through a string just like we work through a list of numbers, okay? 00:08:35.779 --> 00:08:38.630 Now, that was how you do it with an indefinite loop. 00:08:38.630 --> 00:08:42.870 In a definite loop, it's just far more awesome, okay? 00:08:42.870 --> 00:08:44.880 Just like we did with the list of numbers, 00:08:46.110 --> 00:08:49.320 Python understands strings and allows us to write 00:08:49.320 --> 00:08:53.410 for loops, using for and in, that go through the strings. 00:08:53.410 --> 00:08:56.910 So basically, for letter in fruit, now remember, I'm using letter as a 00:08:56.910 --> 00:09:01.220 mnemonic variable here, it's just a choice, a wise choice of a variable name.
00:09:01.220 --> 00:09:05.685 So that says, run this little block of text once for 00:09:05.685 --> 00:09:08.195 each letter in the variable fruit, which means that letter's going to 00:09:08.195 --> 00:09:13.959 take on the successive b-a-n-a-n-a. 00:09:13.959 --> 00:09:16.084 When I look at that I always worry that I misspelled it. 00:09:16.084 --> 00:09:18.925 I think I got these right. 00:09:18.925 --> 00:09:22.423 If I rewrite this book, I'm not going to use banana as the example because I'm 00:09:22.423 --> 00:09:24.649 terrified that I misspelled banana, because I don't 00:09:24.649 --> 00:09:27.190 know how many n's banana has in it. 00:09:27.190 --> 00:09:32.280 But, be that as it may, we are abstracting, we are letting Python say, 00:09:32.280 --> 00:09:36.300 run this little block of text once, in order, for each of the letters in 00:09:36.300 --> 00:09:40.990 the variable fruit, which is b-a-n-a, and so it prints out each of the letters. 00:09:40.990 --> 00:09:46.110 So this is a much prettier version of the, the looping, 00:09:46.110 --> 00:09:50.690 so using the definite, the for keyword instead of the while keyword. 00:09:50.690 --> 00:09:54.060 And so, we can just kind of compare these two things. 00:09:54.060 --> 00:09:55.570 They kind of do the exact same thing. 00:09:55.570 --> 00:09:57.680 And it also is kind of a, gives you a 00:09:57.680 --> 00:10:01.120 sense of what the for is doing for us, right? 00:10:01.120 --> 00:10:01.530 The for is 00:10:01.530 --> 00:10:05.100 setting up this index, the for is looking up 00:10:05.100 --> 00:10:07.890 inside of fruit, and the for is advancing the index. 00:10:07.890 --> 00:10:10.220 So the for's doing a bunch of work for us 00:10:10.220 --> 00:10:12.390 and I've characterized that, sort of, in the previous lecture. 
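NOTE The two loops being compared can be sketched side by side. fruit, index and letter follow the lecture; the while_seen and for_seen lists are hypothetical additions, there only to show that both loops visit the same letters. Python 3 syntax is an assumption.
```python
fruit = "banana"
while_seen = []
index = 0
while index < len(fruit):  # strictly less than: the last valid position is 5
    letter = fruit[index]
    print(index, letter)
    while_seen.append(letter)
    index = index + 1  # the bookkeeping we have to do ourselves
for_seen = []
for letter in fruit:  # for sets up, looks up and advances for us
    print(letter)
    for_seen.append(letter)
```
Changing the while condition to less than or equal would ask for fruit sub 6 on the last pass and raise the index error from earlier.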
00:10:12.390 --> 00:10:14.890 How the for is sort of doing a bunch of things for us 00:10:14.890 --> 00:10:19.508 and that's, it allows our code to be more 00:10:19.508 --> 00:10:22.500 expressive and, and instead of, so this is, a lot of 00:10:22.500 --> 00:10:26.500 this is just kind of bookkeeping crap that we don't really need. 00:10:26.500 --> 00:10:29.580 And so the for loop helps us by doing some of the bookkeeping for us. 00:10:31.920 --> 00:10:34.960 Okay, so we can do all those loops again. 00:10:34.960 --> 00:10:38.761 We can find the largest letter, the smallest letter, the, how many times. 00:10:38.761 --> 00:10:45.390 So, I think, what, how many n's are in this, or how many a's are in this. 00:10:45.390 --> 00:10:49.690 So this is a simple counting pattern and, and a looking pattern. 00:10:49.690 --> 00:10:52.720 And so, our word is banana, our count is 0. 00:10:52.720 --> 00:10:54.976 For the letter in word, again, boop, boop, 00:10:54.976 --> 00:10:56.940 boop, boop, boop, that comes out like that. 00:10:56.940 --> 00:11:01.320 So it's going to run this little block. If the letter is a, add 1 to the count. 00:11:02.330 --> 00:11:07.580 So this is going to basically print out at the end the number of a's in banana. 00:11:07.580 --> 00:11:10.360 It would probably be more useful, for me, to print out the number 00:11:10.360 --> 00:11:13.910 of n's in banana, because I never know how many n's are in banana. 00:11:13.910 --> 00:11:15.480 But it looks like there's supposed to be two, 00:11:15.480 --> 00:11:17.440 or otherwise I have a typo on this slide. 00:11:18.790 --> 00:11:21.230 So the in, again, I, I love the in. 00:11:21.230 --> 00:11:22.120 I just absolutely 00:11:22.120 --> 00:11:24.700 love this in. I love this syntax. 00:11:24.700 --> 00:11:30.760 This for each letter in the word banana. Just, to me, it reads very smoothly. 00:11:30.760 --> 00:11:33.250 Cognitively, it fits in my mind what's going on. 
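NOTE A minimal sketch of the counting pattern just described. word, count and letter follow the lecture; n_count is a hypothetical extra that settles the how-many-n's question, and Python 3 syntax is an assumption.
```python
word = "banana"
count = 0
for letter in word:
    if letter == "a":
        count = count + 1  # add 1 each time we see an a
print(count)
n_count = 0
for letter in word:
    if letter == "n":
        n_count = n_count + 1
print(n_count)
```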
00:11:33.250 --> 00:11:37.110 For each letter in banana, run this little indented block of text. 00:11:37.110 --> 00:11:42.990 Again, very pretty, I love in, it's one of my favorite little pieces of Python. 00:11:46.490 --> 00:11:49.430 So, again, with the for, it takes care of 00:11:49.430 --> 00:11:52.420 all the iteration variables for us, and it goes through the sequence. 00:11:52.420 --> 00:11:54.850 And so here's, here's an animation, right? 00:11:54.850 --> 00:11:57.910 Remember that the for is going to do all this work for us, right? 00:11:57.910 --> 00:12:00.710 Letter is going to advance through the 00:12:00.710 --> 00:12:04.720 successive values, the successive letters in banana. 00:12:04.720 --> 00:12:12.090 So letter is being moved for us by the for statement, okay? 00:12:12.090 --> 00:12:14.640 So that's looping through. 00:12:14.640 --> 00:12:16.661 Now it turns out there's a lot of common things that 00:12:16.661 --> 00:12:18.730 we want to do that are already built into Python for us. 00:12:20.100 --> 00:12:24.490 Clear the screen there. We call these slicing. 00:12:24.490 --> 00:12:28.870 So the index operator looks up various things in a string, but we 00:12:28.870 --> 00:12:33.470 can also pull substrings out, using the colon in addition to the square brackets. 00:12:33.470 --> 00:12:35.020 Again, this is called slicing. 00:12:36.350 --> 00:12:37.200 So the 00:12:37.200 --> 00:12:43.010 colon operator, basically, takes a starting position, and then an ending 00:12:43.010 --> 00:12:47.798 position, but the ending position is up to but not including the second one. 00:12:47.798 --> 00:12:51.660 So this is, it's up to but not including, up to but not including. 00:12:51.660 --> 00:12:54.410 Just like the zero, you get used to it pretty quick, 00:12:54.410 --> 00:12:56.020 but the first time you see it, it's a little bit 00:12:58.240 --> 00:12:59.220 wonky. 
00:12:59.220 --> 00:13:03.480 So, if we're going 0 through 4, that's how I read this print, s sub 0 00:13:03.480 --> 00:13:08.960 through 4, or, or better, better said, s 0, up to but not including 4. 00:13:08.960 --> 00:13:14.160 That is, print me out the chunk that is up to, but not including, 4. 00:13:14.160 --> 00:13:18.510 So, it doesn't include 4, and so out comes Mont, right? 00:13:19.630 --> 00:13:23.325 So the next one is 6 up to but not including 7, so it starts at 6, 00:13:23.325 --> 00:13:30.010 up to but not including 7, so out comes the P. 00:13:30.010 --> 00:13:32.080 And, even though you might expect that it 00:13:32.080 --> 00:13:35.770 would traceback on this, Python is a little forgiving. 00:13:35.770 --> 00:13:37.310 So here's a moment where Python is a little 00:13:37.310 --> 00:13:40.170 forgiving, saying, you know, I'll give you a break here. 00:13:40.170 --> 00:13:42.630 If you go 6, but up to, but not including 20, 00:13:42.630 --> 00:13:45.510 I'll just stop at the end of the string. 00:13:45.510 --> 00:13:48.702 So it's 6 to the end, so it, it, you can over-reference here and 00:13:48.702 --> 00:13:51.530 you can not, you won't get yourself in trouble. 00:13:51.530 --> 00:13:53.280 So that comes out, Python. 00:13:53.280 --> 00:13:57.680 So, again, the second character is up to but not including, 00:13:57.680 --> 00:13:59.810 and that's the, kind of the weird thing there. 00:13:59.810 --> 00:14:01.540 Of course once you remember that the first character 00:14:01.540 --> 00:14:04.590 is 0, 0 up through but not including. Nice. 00:14:08.570 --> 00:14:12.380 If we leave off the first or the last number, leaving off the first number, the 00:14:12.380 --> 00:14:17.100 last number and both of them, they mean the beginning and end of the string, 00:14:17.100 --> 00:14:23.860 respectively. And so, up to but not including 2 is M-o. 00:14:23.860 --> 00:14:30.660 8 colon means starting at 8 to the end of the string. 
00:14:30.660 --> 00:14:33.730 So that's, thon. And then, that means 00:14:33.730 --> 00:14:36.970 the beginning to the end, and so it's just the whole string, Monty Python. 00:14:38.110 --> 00:14:39.833 Now we've already played with string 00:14:39.833 --> 00:14:43.010 concatenation, just a thing to emphasize here is, 00:14:43.010 --> 00:14:48.740 the concatenation operator does not add a space, does not add a space. 00:14:48.740 --> 00:14:51.950 If you want a space, you explicitly add it. 00:14:51.950 --> 00:14:55.740 So here there's no space in between the o and the t, but here 00:14:55.740 --> 00:14:59.690 there is a space between the o and the t because we explicitly added it. 00:14:59.690 --> 00:15:02.280 So you can concatenate more than one thing. 00:15:02.280 --> 00:15:05.360 And you add your spaces as you want, okay? 00:15:08.000 --> 00:15:10.490 Another thing you can do is you can ask questions about a string. 00:15:10.490 --> 00:15:14.520 Sort of like doing a string lookup, using the in operator. 00:15:14.520 --> 00:15:17.790 This is a little different than how we use it inside of a for loop. 00:15:17.790 --> 00:15:20.690 This is a logical operation asking a question 00:15:20.690 --> 00:15:23.220 like less than or greater than or whatever. 00:15:23.220 --> 00:15:25.100 So, here's an expression. 00:15:25.100 --> 00:15:28.670 So fruit is banana, as always. Is n in fruit? 00:15:30.250 --> 00:15:33.020 And the answer is yes it is, True. So this 00:15:33.020 --> 00:15:35.050 is a logical operation. It's a question. 00:15:35.050 --> 00:15:36.620 It's a true or false. 00:15:36.620 --> 00:15:39.830 Is m in fruit? No, False. 00:15:39.830 --> 00:15:42.500 And you can, this can be a string, not just a single character. 00:15:42.500 --> 00:15:45.260 Is n-a-n in fruit? The answer is True. 00:15:45.260 --> 00:15:50.250 And you can put, sort of, you know, if, parts of ifs, et cetera, et cetera. 
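NOTE A sketch of the slices and concatenations walked through above, assuming s holds the string Monty Python (which matches the pieces the lecture reads out). The names a, b and c are placeholders, and Python 3 syntax is an assumption.
```python
s = "Monty Python"
print(s[0:4])  # 0 up to but not including 4
print(s[6:7])  # 6 up to but not including 7
print(s[6:20])  # running past the end is forgiven; the slice stops at the end
print(s[:2])  # no first number: start at the beginning
print(s[8:])  # no last number: go to the end
print(s[:])  # neither: the whole string
a = "Hello"
b = a + "There"  # no space is added for us
c = a + " " + "There"  # we add the space explicitly
print(b)
print(c)
```
Note the contrast with the plain index operator: s[20] would raise an index error, while s[6:20] quietly stops at the end.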
00:15:50.250 --> 00:15:53.500 So, this is a logical expression that can be on an if, 00:15:53.500 --> 00:15:57.100 you can have a while, et cetera, et cetera, et cetera. 00:15:57.100 --> 00:15:58.410 So it's a logical, 00:15:58.410 --> 00:16:00.670 logical expression and it returns True or False. 00:16:03.540 --> 00:16:05.560 You can also do comparisons. 00:16:05.560 --> 00:16:11.190 Now, the comparison operations, equals makes a lot of sense, less 00:16:11.190 --> 00:16:15.450 than and greater than depend on the language that you're using Python in. 00:16:15.450 --> 00:16:20.204 And so, if you're using, like, a Latin character set, then alphabetical matters. 00:16:20.204 --> 00:16:22.480 You know, the, the way the Latin character set would do. 00:16:22.480 --> 00:16:24.380 But if you're in a different character set, Python is 00:16:24.380 --> 00:16:28.890 aware of multiple character sets and will sort strings based on 00:16:28.890 --> 00:16:32.050 the sorting algorithm of the particular character set. 00:16:33.160 --> 00:16:37.610 So you can do comparisons like equality, less than, and greater than. 00:16:37.610 --> 00:16:39.830 And we've seen some of these things in previous lectures, actually. 00:16:39.830 --> 00:16:40.650 We've had to use them. 00:16:42.080 --> 00:16:47.125 So in addition to, sort of, these sort of fundamental operations that we 00:16:47.125 --> 00:16:54.263 can do on strings, there's an extensive library of built-in capabilities 00:16:54.263 --> 00:16:55.308 in Python. 00:16:55.308 --> 00:16:59.283 And so the, the way we see these built-in capabilities 00:16:59.283 --> 00:17:03.320 are they're, they're actually sort of built in to strings. 00:17:03.320 --> 00:17:05.760 So, let's go real slow here. 00:17:05.760 --> 00:17:07.310 Here we have a variable called greet and 00:17:07.310 --> 00:17:10.050 we're sticking the string Hello Bob into it.
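NOTE A sketch of in as a logical operator and of string comparison. fruit follows the lecture; has_n, has_m, has_nan, word and found are placeholder names, and Python 3 is an assumption. One caveat worth knowing: in Python 3 the less-than and greater-than operators on strings compare Unicode code points, so every uppercase ASCII letter sorts before any lowercase one.
```python
fruit = "banana"
has_n = "n" in fruit  # True
has_m = "m" in fruit  # False
has_nan = "nan" in fruit  # True: a substring, not just one character
found = False
if "a" in fruit:  # the same expression used as the test on an if
    found = True
    print("Found it!")
word = "apple"  # hypothetical value to compare against
if word < fruit:
    print(word, "comes before", fruit)
print("Zebra" < "apple")  # True, because uppercase Z has a smaller code point than lowercase a
```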
00:17:10.050 --> 00:17:12.619 Now greet is of type string, as a result 00:17:12.619 --> 00:17:16.589 of this, and it contains Hello Bob as its value. 00:17:16.589 --> 00:17:18.296 But we can actually access 00:17:18.296 --> 00:17:26.559 capabilities inside of this value. So we can say, greet.lower(). 00:17:26.559 --> 00:17:30.650 This is calling something that's part of greet itself, it's a part of all strings. 00:17:30.650 --> 00:17:34.660 The fact that greet contains a string, means that we can ask for, 00:17:34.660 --> 00:17:38.120 hey, give me greet, which just gives you back what you're looking for. 00:17:38.120 --> 00:17:40.980 Like here, print greet is Hello Bob. 00:17:40.980 --> 00:17:45.500 Or you can say give me greet lower, so this is giving me a lowercase copy. 00:17:45.500 --> 00:17:51.030 It doesn't convert it to lowercase. It gives me a lowercase copy of Hello Bob. 00:17:51.030 --> 00:17:53.580 So zap is hello bob, all lowercase. 00:17:54.660 --> 00:17:59.950 Now, it didn't change greet, right? And, you can even put this .lower on the 00:17:59.950 --> 00:18:05.280 end of constants so, why you'd do this, I don't know, but Hi There, with H and T capitalized, 00:18:05.280 --> 00:18:10.640 .lower comes out as hi there. So this bit is part of 00:18:10.640 --> 00:18:11.560 all strings. 00:18:11.560 --> 00:18:17.900 Both variables and constants have these string functions built into them. 00:18:17.900 --> 00:18:21.120 And every instance of a string, whether it 00:18:21.120 --> 00:18:23.720 be a variable or a constant, has these capabilities. 00:18:23.720 --> 00:18:28.150 They don't modify it, they just give you back a copy. 00:18:28.150 --> 00:18:31.500 Now it turns out there is a, a 00:18:31.500 --> 00:18:36.170 command inside Python called dir, to ask questions like 00:18:36.170 --> 00:18:39.730 hey, well here's, you know, stuff has got Hello World. 00:18:39.730 --> 00:18:42.964 We can say. Redo this. 00:18:42.964 --> 00:18:45.560 Come here. 
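NOTE A minimal sketch of calling lower on a string, as described above. greet and zap follow the lecture; hi is a placeholder name and Python 3 syntax is an assumption.
```python
greet = "Hello Bob"
zap = greet.lower()  # a lowercase copy; greet itself is not changed
print(zap)
print(greet)
hi = "Hi There".lower()  # the same call works on a constant
print(hi)
```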
00:18:45.560 --> 00:18:48.240 Stuff is a string. We can ask, hey, what are you? 00:18:48.240 --> 00:18:49.660 I am a string. 00:18:49.660 --> 00:18:53.820 dir is another Python built-in that asks the question, hey, what are all 00:18:53.820 --> 00:18:56.640 the things that are built into this that I can make use of? 00:18:56.640 --> 00:18:57.780 And here they are. 00:18:57.780 --> 00:19:01.250 That's kind of a raw dump of them. You can also go look at 00:19:01.250 --> 00:19:05.910 the online documentation for Python and see at the Pyth, oop, at 00:19:05.910 --> 00:19:09.670 the Python website, you can see a whole bunch of these things. 00:19:09.670 --> 00:19:13.690 And they have the calling sequence, what the parameters are, et cetera. 00:19:13.690 --> 00:19:17.800 So when you're looking these things up, you can go, go read about them. 00:19:17.800 --> 00:19:19.140 Here's just a few that I've pulled out, 00:19:19.140 --> 00:19:23.200 capitalize, which uppercases the first character, 00:19:23.200 --> 00:19:27.220 center, endswith, find, there's stripping. 00:19:27.220 --> 00:19:28.300 So I'll look through a couple of these, 00:19:28.300 --> 00:19:30.740 just the kind of things to be looking for. 00:19:30.740 --> 00:19:33.780 It'll be a good idea to take a look and read through some of the things. 00:19:33.780 --> 00:19:37.540 Here's a couple that, that we'll probably be using early on. 00:19:37.540 --> 00:19:43.700 The find function, it's similar to in but it tells you where it finds the, the 00:19:43.700 --> 00:19:49.517 particular thing that it's looking for. And so we'll put fruit is banana. 00:19:49.517 --> 00:19:52.379 And I'm going to say pos, which is going to be an integer variable, 00:19:52.379 --> 00:19:54.002 equals fruit.find("na"). 00:19:54.002 --> 00:19:57.836 So what it's saying is, go look inside this variable fruit, 00:19:57.836 --> 00:20:01.551 hunt until you find the first occurrence of the string na.
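NOTE A sketch of asking a string what it is and what it can do, using the built-ins type and dir mentioned above. stuff follows the lecture; kind, methods and public are placeholder names, and the filter that hides the underscore names is an added convenience, not something the lecture does. Python 3 is an assumption.
```python
stuff = "Hello world"
kind = type(stuff)  # "hey, what are you?"
print(kind)
methods = dir(stuff)  # the raw dump of everything built into a string
public = [name for name in methods if not name.startswith("_")]
print(public)
```
The exact list printed depends on the Python version, so it is best read as a menu to look up in the online documentation.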
00:20:01.551 --> 00:20:05.590 Hunt, hunt, hunt, hunt, whoop, got it. And then return it to me. 00:20:05.590 --> 00:20:10.580 So that's going to give me back 2. 2 is where it found it, right? 00:20:10.580 --> 00:20:14.120 So, where is it in the string, that's what find does. 00:20:14.120 --> 00:20:16.920 And if you don't find anything, like you're looking for z, 00:20:16.920 --> 00:20:21.440 no, no, no, I didn't find a z, then it gives me back negative 1. 00:20:21.440 --> 00:20:27.270 So just, again, this is just one of many built-in functions in string. 00:20:27.270 --> 00:20:30.130 The ability to find a substring, okay? 00:20:30.130 --> 00:20:33.090 Or find, yeah, find a character or string within another string. 00:20:35.330 --> 00:20:37.110 There's a lower case, there's also an 00:20:37.110 --> 00:20:40.710 upper case. This might be better named shouting. 00:20:40.710 --> 00:20:44.070 Upper means give me an uppercase copy of this variable. 00:20:44.070 --> 00:20:49.730 So Hello Bob becomes HELLO BOB, and then lower is hello bob, right? 00:20:49.730 --> 00:20:55.920 So these are both ways to get uppercase and lowercase copies. 00:20:55.920 --> 00:20:58.438 You might think these are kind of silly, but one of the things 00:20:58.438 --> 00:21:01.450 that you tend to use lower for is if you're doing searching and 00:21:01.450 --> 00:21:03.700 you want to ignore case, you convert the whole thing 00:21:03.700 --> 00:21:06.382 to lowercase, and then you search for a lowercase string. 00:21:06.382 --> 00:21:08.712 So you, depends on if you want to ignore case or not. 00:21:08.712 --> 00:21:11.720 So that's, that's one of the reasons that you have things like this. 00:21:14.280 --> 00:21:19.224 There is a replace function. Again, it doesn't change the value. 00:21:19.224 --> 00:21:21.640 Greet is going to have Hello Bob. 00:21:21.640 --> 00:21:28.350 And I'm going to say, greet.replace all occurrences of Bob with Jane.
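NOTE A sketch of find, upper and lower, plus the case-insensitive search idea mentioned above. fruit, pos and greet follow the lecture; aa, nnn, www, line and found are placeholder names, the sample line is hypothetical, and Python 3 is an assumption.
```python
fruit = "banana"
pos = fruit.find("na")  # position of the first occurrence
print(pos)
aa = fruit.find("z")  # nothing found: find gives back -1
print(aa)
greet = "Hello Bob"
nnn = greet.upper()  # an uppercase copy
www = greet.lower()  # a lowercase copy
print(nnn)
print(www)
line = "Please Call Me"  # hypothetical text to search
found = line.lower().find("call")  # lowercase the text, then look for a lowercase string
print(found)
```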
00:21:28.350 --> 00:21:32.660 That gives me back a copy, in nstr, says Hello Jane. 00:21:32.660 --> 00:21:35.690 So, so greet is unchanged. 00:21:35.690 --> 00:21:39.890 This replace says, make a copy and then make that following 00:21:39.890 --> 00:21:43.251 edit that you, that, that we've requested. 00:21:43.251 --> 00:21:46.447 [COUGH] Now we can also say, well, I mean, the replace 00:21:46.447 --> 00:21:50.490 is going to do all occurrences, so greet is still Hello Bob. 00:21:50.490 --> 00:21:51.660 This is kind of redundant here. 00:21:51.660 --> 00:21:53.980 I'm just doing it so you remember what it is. 00:21:53.980 --> 00:21:55.310 Greet is still Hello Bob. 00:21:55.310 --> 00:21:57.500 I put Hello Bob back in it and replace 00:21:57.500 --> 00:22:00.850 all the occurrences of lowercase o with uppercase X. 00:22:01.920 --> 00:22:05.096 And then that happens. So this says, 00:22:05.096 --> 00:22:11.927 go through the whole string [SOUND] doing all those replaces, okay? 00:22:11.927 --> 00:22:14.237 Now another common thing that we're going to have to do 00:22:14.237 --> 00:22:16.901 is we're going to have to throw away whitespace. 00:22:16.901 --> 00:22:18.628 Sometimes you have a string that 00:22:18.629 --> 00:22:21.893 has characters, blank characters, or other characters, 00:22:21.893 --> 00:22:26.328 at the beginning and the end, nonprintable characters, and we can strip them. 00:22:26.328 --> 00:22:30.458 And there's three charact, three functions that are built into 00:22:30.458 --> 00:22:32.840 Python strings that do this for us. 00:22:33.920 --> 00:22:38.202 There is lstrip, which strips from the left. 00:22:38.202 --> 00:22:43.675 There is rstrip, which strips from the right. 00:22:43.675 --> 00:22:47.440 So it throws away these whitespaces, so, Hello Bob is gone. 00:22:48.470 --> 00:22:50.940 I mean, the, so it gets rid of these characters. 00:22:50.940 --> 00:22:53.373 Oops, these are the ones that are gotten rid of there.
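NOTE A minimal sketch of the two replace calls described above. greet and nstr follow the lecture; xstr is a placeholder name and Python 3 syntax is an assumption.
```python
greet = "Hello Bob"
nstr = greet.replace("Bob", "Jane")  # a copy with the edit applied
print(nstr)
xstr = greet.replace("o", "X")  # every lowercase o is replaced, not just the first
print(xstr)
print(greet)  # the original is unchanged
```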
00:22:53.373 --> 00:22:55.913 I need an eraser. And then 00:22:55.913 --> 00:22:59.313 let's use white, and then strip basically, gets rid of 00:22:59.313 --> 00:23:03.250 all the whitespace, both on the left and the right side. 00:23:03.250 --> 00:23:04.140 And gets rid of that. 00:23:04.140 --> 00:23:07.010 So we're going to, we're going to be using these a lot. 00:23:07.010 --> 00:23:09.860 It, one of the things you tend to do in Python is cleaning up data. 00:23:09.860 --> 00:23:11.790 Sometimes if you have spaces at the beginning or 00:23:11.790 --> 00:23:13.960 the end, you just want to kind of ignore them. 00:23:13.960 --> 00:23:15.790 So you strip them off, you throw them away. 00:23:18.020 --> 00:23:22.130 When we're looking for data, we sometimes are looking for a prefix, and 00:23:22.130 --> 00:23:27.400 there is a startswith function [COUGH] that gives you a true or a false. 00:23:27.400 --> 00:23:31.370 We're asking here, does this variable line start with the string Please. 00:23:31.370 --> 00:23:34.820 And the answer is True, because it does start with the string Please. 00:23:34.820 --> 00:23:38.290 Or, and then next, we ask, does this start with the letter p? 00:23:38.290 --> 00:23:41.060 And the answer is False, it does not start with the letter p. 00:23:42.070 --> 00:23:43.290 Okay? So there's 00:23:43.290 --> 00:23:44.880 lots more of these things. 00:23:48.480 --> 00:23:52.704 And reading data and tearing it apart is one of the things that we're going to 00:23:52.704 --> 00:23:57.296 really focus on for the rest of these first few chapters of the book, okay? 00:23:57.296 --> 00:24:00.041 Because that's one thing that Python's really good at is 00:24:00.041 --> 00:24:03.860 tearing data into pieces and pulling the pieces that you want. 00:24:03.860 --> 00:24:06.840 So, so let's take a look at this line. 00:24:06.840 --> 00:24:11.455 So this line that we've got here is a line from an actual email box. 
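The stripping and prefix checks described above might look like the following sketch. The exact amount of padding around Hello Bob and the text after Please are assumptions, as are the variable names `left`, `right`, `both`, `starts_upper`, and `starts_lower`.

```python
# Assumed padding: three blanks on the left, two on the right.
greet = "   Hello Bob  "

left = greet.lstrip()    # whitespace removed from the left side only
right = greet.rstrip()   # whitespace removed from the right side only
both = greet.strip()     # whitespace removed from both sides

# repr makes the remaining blanks visible when printed.
print(repr(left))
print(repr(right))
print(repr(both))

# Assumed sample line; the lecture only says it starts with "Please".
line = "Please have a nice day"

starts_upper = line.startswith("Please")
starts_lower = line.startswith("p")      # comparison is case-sensitive
print(starts_upper)
print(starts_lower)
```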
00:24:11.455 --> 00:24:13.550 This is what, if you 00:24:13.550 --> 00:24:15.580 looked at your email, sort of, on your hard 00:24:15.580 --> 00:24:18.710 drive, email boxes would have this kind of a format. 00:24:18.710 --> 00:24:23.870 And there's actually many lines, and soon we'll be reading whole files full of email. 00:24:23.870 --> 00:24:26.940 But for now, let's just say we've got this one line, somehow. 00:24:26.940 --> 00:24:29.400 And we're looking for, we don't know how long 00:24:29.400 --> 00:24:31.910 these things are going to be, the first charac, the 00:24:31.910 --> 00:24:34.520 first thing is from, then there's an email address, 00:24:34.520 --> 00:24:38.000 then there's some detail about when the mail was sent. 00:24:38.000 --> 00:24:40.550 But what we actually want is 00:24:40.550 --> 00:24:42.450 we want this part right here, 00:24:42.450 --> 00:24:45.910 and that's the domain name of the mail address, right? 00:24:45.910 --> 00:24:48.110 We want to extract this out. 00:24:48.110 --> 00:24:52.780 We're faced with this line, in a variable, and we want to extract that out. 00:24:52.780 --> 00:24:55.680 So this is kind of putting all these things together. 00:24:55.680 --> 00:24:59.330 So let's walk through how we do this. 00:24:59.330 --> 00:25:02.028 So, here's this line, and it's a big long string. 00:25:02.028 --> 00:25:03.950 Mostly we would've read this from a file, 00:25:03.950 --> 00:25:05.870 rather than just put it in a constant, but for now we 00:25:05.870 --> 00:25:08.480 put it in a constant, because we, files is the next chapter. 00:25:09.950 --> 00:25:12.500 And so what we're going to do is we're going to say, you 00:25:12.500 --> 00:25:15.380 know what, I'm going to look at this line and I'm going to go 00:25:15.380 --> 00:25:18.048 find the @ sign, and I want to know where the @ sign is. 00:25:18.048 --> 00:25:24.330 So I call data.find @ sign, and put the result in atpos. 00:25:24.330 --> 00:25:26.514 And that gives me 21.
00:25:26.514 --> 00:25:29.166 It hunts until it finds the @ sign, and 00:25:29.166 --> 00:25:34.310 then tells me where I found it. Then what I want to look at is, starting 00:25:34.310 --> 00:25:39.200 here, for the rest of the string, I want to find the first space afterwards. 00:25:40.250 --> 00:25:45.868 So what I say is, this, sppos is my variable for the position of the space, 00:25:45.868 --> 00:25:51.132 data.find, a blank, starting at the @. 00:25:51.132 --> 00:25:54.216 So this is starting at 21. So it says, I'll start 00:25:54.216 --> 00:25:59.523 at 21 and I'll look for the next blank. And I find that at 31. 00:25:59.523 --> 00:26:05.350 So now I know where the @ sign is and I know where the space is. 00:26:05.350 --> 00:26:08.172 And so what I'm looking at is, I want the stuff 00:26:08.172 --> 00:26:14.186 one beyond the @ sign, up to but not including the space. 00:26:14.186 --> 00:26:20.142 So then I can use a slicing operation, I can use a slicing operation. 00:26:20.142 --> 00:26:22.650 Start at the @ position, add 1 to it, 00:26:22.650 --> 00:26:26.480 so advance 1, that's going to be the letter u. 00:26:26.480 --> 00:26:30.730 And then a slicing operation, up to but not including space. 00:26:30.730 --> 00:26:36.190 Up to, this is going to work out nicely all of a sudden, but not 00:26:36.190 --> 00:26:41.770 including, okay? And then 00:26:41.770 --> 00:26:45.796 I'm going to take that slice, which is really this little bit of data right here, 00:26:45.796 --> 00:26:49.500 take that slice, and put in the variable host. 00:26:49.500 --> 00:26:53.844 Then we print that out and we get the piece, okay? 00:26:53.844 --> 00:26:56.980 And so, here we have some data we want to tear apart. 00:26:56.980 --> 00:26:58.230 We hunt for the @. 00:26:58.230 --> 00:27:00.281 We find it at position 21. 00:27:00.281 --> 00:27:04.598 We start at 21 and we look for the, the space after that. 
00:27:04.598 --> 00:27:10.659 31, and then we pull from 22, up to but not including, 31. 00:27:10.659 --> 00:27:13.380 And it, it wouldn't matter where this thing was, because these aren't all 00:27:13.380 --> 00:27:17.491 the same length when we start looking at them in files, but it 00:27:17.491 --> 00:27:20.541 would have found the @ sign and the space after the @ sign, 00:27:20.541 --> 00:27:24.258 and it would have reliably pulled out the host, okay? 00:27:24.258 --> 00:27:29.646 So this is a basic pattern we call parsing. 00:27:29.646 --> 00:27:32.068 Parsing text. 00:27:32.068 --> 00:27:35.620 Find this, find that other thing, grab this thing out, 00:27:35.620 --> 00:27:40.040 then look inside that thing and [SOUND]. So it does all these things, right? 00:27:40.040 --> 00:27:45.430 So, that's kind of like strings. Up next, we have files. 00:27:45.430 --> 00:27:46.770 Files are going to be lots of strings. 00:27:46.770 --> 00:27:49.320 So we're going to start putting all these things together. 00:27:49.320 --> 00:27:52.490 And so the next chapter is a really, really 00:27:52.490 --> 00:27:55.600 important chapter, where it starts to really start coming together. 00:27:55.600 --> 00:27:57.110 So see you soon.
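The parsing walkthrough above can be sketched as follows. The names `data`, `atpos`, `sppos`, and `host` are the ones used in the lecture. The sample line itself is an assumption, chosen so that the @ sign lands at position 21 and the following space at 31 as described. The `extract_host` helper is hypothetical, added only to show that the same steps work on lines of other lengths.

```python
# Assumed sample line, picked to match the positions mentioned above.
data = "From stephen.marquard@uct.ac.za Sat Jan  5 09:14:16 2008"

atpos = data.find("@")          # where is the @ sign?
print(atpos)

sppos = data.find(" ", atpos)   # first space at or after the @ sign
print(sppos)

# One beyond the @ sign, up to but not including the space.
host = data[atpos + 1:sppos]
print(host)


def extract_host(line):
    """Hypothetical helper: the same three steps, with guards for
    lines that have no @ sign or no space after it."""
    at = line.find("@")
    if at == -1:
        return None              # nothing to pull out
    sp = line.find(" ", at)
    if sp == -1:
        sp = len(line)           # address runs to the end of the line
    return line[at + 1:sp]


print(extract_host("From a@b.org Mon"))
```

Passing `atpos` as the second argument to find is what keeps the search from stopping at the earlier space right after From.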