1 00:00:00,200 --> 00:00:02,820 Hello, and welcome to Chapter Six. 2 00:00:02,820 --> 00:00:05,240 In this chapter we're going to talk about strings, and 3 00:00:05,240 --> 00:00:08,610 stuff is going to start to get real now. 4 00:00:08,610 --> 00:00:12,610 So, as always, this material, this video, these 5 00:00:12,610 --> 00:00:15,550 slides and book are copyright Creative Commons Attribution. 6 00:00:15,550 --> 00:00:16,870 I want you to use these materials. 7 00:00:16,870 --> 00:00:18,800 I want you to, somebody else, I want to 8 00:00:18,800 --> 00:00:21,720 make more teachers, so everyone can teach this stuff. 9 00:00:21,720 --> 00:00:22,790 Use it however you like. 10 00:00:24,010 --> 00:00:25,280 Okay, so we've been playing with 11 00:00:25,280 --> 00:00:26,730 strings from the beginning. 12 00:00:26,730 --> 00:00:28,320 I mean, literally, if we didn't work 13 00:00:28,320 --> 00:00:31,040 with strings, we could've never printed Hello World. 14 00:00:31,040 --> 00:00:35,813 And lord knows, we need to print Hello World in a programming language. 15 00:00:35,813 --> 00:00:39,610 And so, we've been using them, especially constants. 16 00:00:39,610 --> 00:00:41,650 Now, in this chapter, we're going to dig in. 17 00:00:41,650 --> 00:00:46,986 So, oops, so a string is a sequence of characters. 18 00:00:46,986 --> 00:00:50,408 You can use either single quotes or double quotes in Python 19 00:00:50,408 --> 00:00:51,360 to delimit a string. 20 00:00:51,360 --> 00:00:54,528 And so here's two string constants, Hello and there, 21 00:00:54,528 --> 00:00:58,140 stuck into the variables str1 and str2. 22 00:00:58,140 --> 00:01:00,520 We can concatenate them together with a plus sign. 23 00:01:00,520 --> 00:01:03,100 Python is smart enough to look and say, 24 00:01:03,100 --> 00:01:05,694 oh, those are strings, I know what to do with those. 
25 00:01:05,694 --> 00:01:09,588 And you'll notice that the plus doesn't add any space here, because when 26 00:01:09,588 --> 00:01:13,566 we print bob out, Hello and there are right next to one another. 27 00:01:13,566 --> 00:01:17,014 If, for example, we've done some conversions, 28 00:01:17,014 --> 00:01:18,799 so when we were, like, reading pay, 29 00:01:18,799 --> 00:01:20,640 and rate, and hours, and stuff, we've done some conversions. 30 00:01:20,640 --> 00:01:23,378 So this is an example of the, a string 1 2 3 31 00:01:23,378 --> 00:01:27,211 Not 123, but the string, quote 1 2 3 quote. 32 00:01:27,211 --> 00:01:29,270 And we can't add 1 to this, we get 33 00:01:29,270 --> 00:01:32,910 a traceback, kind of, at this point, as we expected. 34 00:01:32,910 --> 00:01:37,020 And we would convert that to an integer using the int function that's built in. 35 00:01:37,020 --> 00:01:39,900 See how much Python you already know? I mean, this is awesome, right? 36 00:01:39,900 --> 00:01:40,970 I can just say, 37 00:01:40,970 --> 00:01:42,800 oh, you call the int function, and you know what that is. 38 00:01:42,800 --> 00:01:46,220 That's, sorry, sorry, I'm just awesomed out. 39 00:01:46,220 --> 00:01:50,800 So you convert this to an integer, and then you add 1 to it, and then we get 124. 40 00:01:50,800 --> 00:01:52,270 So, there you go. 41 00:01:52,270 --> 00:01:54,740 We've been doing strings all along, had to. 42 00:01:54,740 --> 00:01:57,000 I mean, literally, strings and numeric data 43 00:01:57,000 --> 00:01:59,930 are the two things that programs deal with. 44 00:01:59,930 --> 00:02:03,120 So, we've been reading and converting. 45 00:02:03,120 --> 00:02:05,175 Again, this is sort of a pattern from some of the earlier programs 46 00:02:05,175 --> 00:02:08,661 where we do a raw input, you know? 47 00:02:08,661 --> 00:02:10,887 And the raw input just takes a string and puts it in a variable. 
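A small sketch of the concatenation and conversion steps described so far, written for Python 3 (the lecture's slides use Python 2). The names str1, str2, and bob come from the lecture; str3 and x are assumed names for the 1 2 3 example.

```python
str1 = "Hello"
str2 = 'there'

# The plus sign glues the two strings together with nothing in between.
bob = str1 + str2
print(bob)  # Hellothere

# '123' is a string, so adding an integer to it produces a traceback.
str3 = '123'
try:
    str3 + 1
except TypeError as err:
    print("Traceback reason:", err)

# Convert with the built-in int function first, then add.
x = int(str3) + 1
print(x)  # 124
```

The try and except is only there so the sketch keeps running after the error; typing the same thing at the interactive prompt would show the traceback directly.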
48 00:02:10,887 --> 00:02:14,560 So if I type Chuck, then the variable contains the string C-h-u-c-k. 49 00:02:15,990 --> 00:02:18,970 Even if we type numbers, that is a string. 50 00:02:18,970 --> 00:02:23,660 Just because I put 1 0 0 in, I still can't subtract 10. 51 00:02:23,660 --> 00:02:28,270 We get a happy little traceback, oh, happy little, sad-faced traceback. 52 00:02:28,270 --> 00:02:31,294 But of course, if we convert it 53 00:02:31,294 --> 00:02:34,050 to an int or a float or something like that, 54 00:02:35,190 --> 00:02:38,670 then we can subtract 10. 55 00:02:38,670 --> 00:02:41,680 So we've been doing this for a while. 56 00:02:41,680 --> 00:02:45,130 We've been doing strings and manipulating strings and converting strings all along. 57 00:02:45,130 --> 00:02:49,051 So the thing we're going to start doing now is we're going to dive into strings. 58 00:02:49,051 --> 00:02:53,098 We realize that strings are addressable on a character-by-character basis, 59 00:02:53,098 --> 00:02:56,350 and we can do all kinds of cool things with that. 60 00:02:56,350 --> 00:02:59,998 And so, a string is a sequence of characters, and we 61 00:02:59,998 --> 00:03:04,450 can look inside it using what we call the index operator, 62 00:03:04,450 --> 00:03:06,720 the square brackets. And we've seen square brackets in 63 00:03:06,720 --> 00:03:08,230 lists, and you'll see that there's sort of 64 00:03:08,230 --> 00:03:11,610 similarities between lists of numbers, and, in effect, a 65 00:03:11,610 --> 00:03:14,350 string is a special kind of list of characters. 66 00:03:14,350 --> 00:03:17,197 So if we take this string banana, 67 00:03:17,197 --> 00:03:21,242 the first character is at position 0. 68 00:03:21,242 --> 00:03:24,891 So, we call this operator sub, so letter equals 69 00:03:24,891 --> 00:03:28,383 fruit sub 1 and that is the second character. 
70 00:03:28,383 --> 00:03:30,603 Now this may seem a little weird that the first character 71 00:03:30,603 --> 00:03:33,956 is a 0 and the second character is a 1. 72 00:03:33,956 --> 00:03:38,500 It actually is kind of similar to the old elevator thing, where in Europe they're 73 00:03:38,500 --> 00:03:41,124 called, the first floor is zero, then negative one, 74 00:03:41,124 --> 00:03:43,558 and the second floor is one, right? 75 00:03:43,558 --> 00:03:46,093 It's kind of the same thing. Actually, it turns out that 76 00:03:46,093 --> 00:03:50,456 internally zero was a better way to start than one. 77 00:03:50,456 --> 00:03:54,156 It, you'll get used to it and then after a while there's 78 00:03:54,156 --> 00:03:58,540 some little cool advantages to it, but for now, beginning is zero. 79 00:03:58,540 --> 00:04:01,939 Just, beginning is zero, it is the rule, just remember it. 80 00:04:02,970 --> 00:04:08,790 Okay, so the 0 is b, the 1 is a, the 2 is n, et cetera, et cetera. 81 00:04:08,790 --> 00:04:11,160 And we call this syntax 82 00:04:11,160 --> 00:04:12,540 fruit sub 1, okay? 83 00:04:12,540 --> 00:04:17,123 So that is the sub 1 character of fruit, and then that is an a. 84 00:04:17,123 --> 00:04:21,250 So that fruit sub 1 says, look up in banana, find the 1 position, 85 00:04:21,250 --> 00:04:25,870 and give me what's in that 1 position, that's what's the sub. 86 00:04:25,870 --> 00:04:29,570 And what's inside these brackets can be an expression. 87 00:04:29,570 --> 00:04:33,690 So if we set n to 3, n minus 1, well that'll compute to 2. 88 00:04:33,690 --> 00:04:36,660 And then fruit sub 2 is the letter n, 89 00:04:36,660 --> 00:04:39,979 right? So that's fruit sub 2, okay? 90 00:04:39,979 --> 00:04:42,320 It's the third character, fruit sub 2. 91 00:04:42,320 --> 00:04:47,336 So the index starts at 0, the, we read the brackets as sub, fruit sub 1, 92 00:04:47,336 --> 00:04:52,750 fruit sub 2. 
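The fruit sub 1 and fruit sub 2 lookups above could be sketched like this in Python 3. The names fruit, letter, and n follow the lecture; w is an assumed name for the second lookup.

```python
fruit = 'banana'

# Positions start at 0, so sub 1 is the second character.
letter = fruit[1]
print(letter)  # a

# What goes inside the brackets can be an expression.
n = 3
w = fruit[n - 1]   # n - 1 computes to 2, the third character
print(w)  # n
```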
Now, Python will 93 00:04:52,750 --> 00:04:57,860 complain to you if you use this sub operator too far down the string. 94 00:04:57,860 --> 00:05:01,316 Here is a string with 3 characters, which are 0, 1, and 2. 95 00:05:01,316 --> 00:05:05,420 And if we go to sub 5, it blows up. 96 00:05:05,420 --> 00:05:10,260 Now, you know, by now I hope that you're not freaking out about traceback errors. 97 00:05:10,260 --> 00:05:14,070 Remember, traceback errors are just Python trying to inform you. 98 00:05:14,070 --> 00:05:18,930 And if we just stop looking at that as a mean Python face, and 99 00:05:18,930 --> 00:05:24,190 instead look at that as, oh, index error, string index out of range. 100 00:05:24,190 --> 00:05:27,360 Oh yeah, I stuck a five in there and there's only three, oh, 101 00:05:27,360 --> 00:05:31,330 my bad, thank you, Python, appreciate it, thanks for the help. 102 00:05:31,330 --> 00:05:34,870 So, think of this as like, it's not a smiley face 103 00:05:34,870 --> 00:05:38,690 but it's kind of like a quizzical face, right, it's like [SOUND]. 104 00:05:38,690 --> 00:05:39,660 I don't know. 105 00:05:39,660 --> 00:05:42,950 Python's confused and it's trying to tell you what it's confused about, okay? 106 00:05:42,950 --> 00:05:46,780 So don't look at these as sad faces. Python doesn't hate you, Python loves you. 107 00:05:48,170 --> 00:05:52,420 And loves me too. So, strings have individual 108 00:05:52,420 --> 00:05:54,420 characters that we can address with the index operator. 109 00:05:54,420 --> 00:05:56,160 They also have length. 110 00:05:56,160 --> 00:06:00,400 And there is a built-in function called len, that we can call, passing in 111 00:06:00,400 --> 00:06:03,980 as a parameter a variable or a constant, 112 00:06:03,980 --> 00:06:05,940 and it will tell us how many characters it has. 113 00:06:05,940 --> 00:06:10,040 Now this banana has six characters in it, at positions 0 through 5. 
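Going back to the sub 5 example a moment ago, here is a minimal Python 3 sketch of indexing too far. The variable name zot and the three-character value 'abc' are assumptions for illustration; the lecture only says the string has three characters.

```python
zot = 'abc'   # three characters, at positions 0, 1, and 2

try:
    print(zot[5])            # there is no position 5
except IndexError as err:
    message = str(err)
    print("IndexError:", message)  # IndexError: string index out of range
```

Catching the error is just a way to show the message without stopping the program; the message is the same one the traceback ends with.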
114 00:06:10,040 --> 00:06:12,524 So don't get confused: the last character is 115 00:06:12,524 --> 00:06:15,750 sub 5, but it's also the sixth character. 116 00:06:15,750 --> 00:06:17,450 So the length is really the length, it's 117 00:06:17,450 --> 00:06:22,150 not length minus 1, okay? So len is a built-in function. 118 00:06:22,150 --> 00:06:23,840 It's not a function we have to write, 119 00:06:23,840 --> 00:06:26,570 as we talked about in the functions chapter. 120 00:06:26,570 --> 00:06:28,626 There are things that are part of Python that are just sitting there. 121 00:06:28,626 --> 00:06:31,172 And so we are passing banana, the variable 122 00:06:31,172 --> 00:06:35,010 fruit, into a function, we're passing it 123 00:06:35,010 --> 00:06:36,590 into the len function. 124 00:06:36,590 --> 00:06:42,250 Then len [SOUND] does magic stuff. And then out comes the answer. 125 00:06:42,250 --> 00:06:48,320 And that 6 replaces this and then the 6 goes into the variable x, and so x is 6. 126 00:06:48,320 --> 00:06:51,070 I sure made that a messy looking slide. 127 00:06:51,070 --> 00:06:55,080 And so, think of it as, inside the len function, there's a def. 128 00:06:55,080 --> 00:06:59,890 len takes a parameter, does some loopy things, and it does its thing. 129 00:06:59,890 --> 00:07:02,350 So, it's a function that we might write except we don't 130 00:07:02,350 --> 00:07:07,160 have to because it's already written and built into Python. 131 00:07:07,160 --> 00:07:10,380 Okay. So that's the length of the 132 00:07:10,380 --> 00:07:12,460 string, that's getting at individual characters. 133 00:07:12,460 --> 00:07:15,550 We can also loop through strings. 134 00:07:15,550 --> 00:07:18,710 Obviously, if we can use the index operator, and we 135 00:07:18,710 --> 00:07:21,970 can put a variable in there, we can write a loop. 136 00:07:21,970 --> 00:07:23,520 This is an indefinite loop. 
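The len discussion above could look like this in Python 3. The function my_len is a hypothetical stand-in for the "def with some loopy things" picture; it is only a mental model, not how Python's built-in len is actually implemented.

```python
fruit = 'banana'

x = len(fruit)
print(x)             # 6
print(fruit[x - 1])  # a  (the last character is at length minus 1)

# Hypothetical hand-written version, to picture what a length
# function could do if we had to write it ourselves.
def my_len(text):
    count = 0
    for _ in text:
        count = count + 1
    return count

print(my_len(fruit))  # 6
```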
137 00:07:23,520 --> 00:07:27,140 So we have this variable fruit, has the string banana in it. 138 00:07:27,140 --> 00:07:29,580 We set the variable index to 0. 139 00:07:29,580 --> 00:07:32,920 We make a little while loop. And we ask, 140 00:07:32,920 --> 00:07:35,460 as long as index is less than the length of fruit. 141 00:07:35,460 --> 00:07:37,510 Now remember, the length of fruit is going to be 6. 142 00:07:37,510 --> 00:07:39,520 But we don't want to make that less than or equal to 143 00:07:39,520 --> 00:07:43,630 because then we would crash, because the last character is 5. 144 00:07:43,630 --> 00:07:46,438 We can say letter is equal to fruit sub index, so that's going to 145 00:07:46,438 --> 00:07:50,040 start out being index of, is going to be 0, so that's fruit sub 0. 146 00:07:50,040 --> 00:07:53,300 Then we print index and letter, so that means the 147 00:07:53,300 --> 00:07:56,220 first time through the loop we're going to see 0 b. 148 00:07:56,220 --> 00:07:58,056 Then we increment our 149 00:07:58,056 --> 00:08:04,450 iteration operator, and go up. And then we come out with 1 a. 150 00:08:04,450 --> 00:08:13,560 And index advances until index is 6, but has printed out each of the letters. 151 00:08:13,560 --> 00:08:15,790 Now, we're not doing this just to 152 00:08:15,790 --> 00:08:18,620 print them out, we will do something a little more valuable, 153 00:08:21,540 --> 00:08:23,150 valuable inside that loop. 154 00:08:23,150 --> 00:08:28,740 But this gives the sense that we can work through a loop just like we, we, 155 00:08:28,740 --> 00:08:35,779 we can work through a string just like we work through a list of numbers, okay? 156 00:08:35,779 --> 00:08:38,630 Now, that was how you do it with an indefinite loop. 157 00:08:38,630 --> 00:08:42,870 In a definite loop, it's just far more awesome, okay? 
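A Python 3 sketch of the indefinite loop just walked through, using the lecture's names fruit, index, and letter:

```python
fruit = 'banana'
index = 0

# Less than, not less than or equal: the last valid position is 5.
while index < len(fruit):
    letter = fruit[index]
    print(index, letter)   # first line printed is: 0 b
    index = index + 1
```

When the loop finishes, index has advanced to 6, which is exactly the length of the string.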
158 00:08:42,870 --> 00:08:44,880 Just like we did with the list of numbers, 159 00:08:46,110 --> 00:08:49,320 Python understands strings and allows us to write 160 00:08:49,320 --> 00:08:53,410 for loops, using for and in, that go through the strings. 161 00:08:53,410 --> 00:08:56,910 So basically, for letter in fruit, now remember, I'm using letter as a 162 00:08:56,910 --> 00:09:01,220 mnemonic variable here, it's just a choice, a wise choice of a variable name. 163 00:09:01,220 --> 00:09:05,685 So that says, run this little block of text once for 164 00:09:05,685 --> 00:09:08,195 each letter in the variable fruit, which means that letter's going to 165 00:09:08,195 --> 00:09:13,959 take on the successive b-a-n-a-n-a. 166 00:09:13,959 --> 00:09:16,084 When I look at that I always worry that I misspelled it. 167 00:09:16,084 --> 00:09:18,925 I think I got these right. 168 00:09:18,925 --> 00:09:22,423 If I rewrite this book, I'm not going to use banana as the example because I'm 169 00:09:22,423 --> 00:09:24,649 terrified that I misspelled banana, because I don't 170 00:09:24,649 --> 00:09:27,190 know how many n's banana has in it. 171 00:09:27,190 --> 00:09:32,280 But, be that as it may, we are abstracting, we are letting Python say, 172 00:09:32,280 --> 00:09:36,300 run this little block of text once, in order, for each of the letters in 173 00:09:36,300 --> 00:09:40,990 the variable fruit, which is b-a-n-a, and so it prints out each of the letters. 174 00:09:40,990 --> 00:09:46,110 So this is a much prettier version of the, the looping, 175 00:09:46,110 --> 00:09:50,690 so using the definite, the for keyword instead of the while keyword. 176 00:09:50,690 --> 00:09:54,060 And so, we can just kind of compare these two things. 177 00:09:54,060 --> 00:09:55,570 They kind of do the exact same thing. 178 00:09:55,570 --> 00:09:57,680 And it also is kind of a, gives you a 179 00:09:57,680 --> 00:10:01,120 sense of what the for is doing for us, right? 
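The definite-loop version described above, as a short Python 3 sketch with the same variable names:

```python
fruit = 'banana'

# The for statement sets up, looks up, and advances for us.
for letter in fruit:
    print(letter)
```

Compared with the while version, the index variable, the length check, and the increment have all disappeared into the for statement.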
180 00:10:01,120 --> 00:10:01,530 The for is 181 00:10:01,530 --> 00:10:05,100 setting up this index, the for is looking up 182 00:10:05,100 --> 00:10:07,890 inside of fruit, and the for is advancing the index. 183 00:10:07,890 --> 00:10:10,220 So the for's doing a bunch of work for us 184 00:10:10,220 --> 00:10:12,390 and I've characterized that, sort of, in the previous lecture. 185 00:10:12,390 --> 00:10:14,890 How the for is sort of doing a bunch of things for us 186 00:10:14,890 --> 00:10:19,508 and that's, it allows our code to be more 187 00:10:19,508 --> 00:10:22,500 expressive and, and instead of, so this is, a lot of 188 00:10:22,500 --> 00:10:26,500 this is just kind of bookkeeping crap that we don't really need. 189 00:10:26,500 --> 00:10:29,580 And so the for loop helps us by doing some of the bookkeeping for us. 190 00:10:31,920 --> 00:10:34,960 Okay, so we can do all those loops again. 191 00:10:34,960 --> 00:10:38,761 We can find the largest letter, the smallest letter, the, how many times. 192 00:10:38,761 --> 00:10:45,390 So, I think, what, how many n's are in this, or how many a's are in this. 193 00:10:45,390 --> 00:10:49,690 So this is a simple counting pattern and, and a looking pattern. 194 00:10:49,690 --> 00:10:52,720 And so, our word is banana, our count is 0. 195 00:10:52,720 --> 00:10:54,976 For the letter in word, again, boop, boop, 196 00:10:54,976 --> 00:10:56,940 boop, boop, boop, that comes out like that. 197 00:10:56,940 --> 00:11:01,320 So it's going to run this little block. If the letter is a, add 1 to the count. 198 00:11:02,330 --> 00:11:07,580 So this is going to basically print out at the end the number of a's in banana. 199 00:11:07,580 --> 00:11:10,360 It would probably be more useful, for me, to print out the number 200 00:11:10,360 --> 00:11:13,910 of n's in banana, because I never know how many n's are in banana. 
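The counting pattern above, sketched in Python 3. The names word, count, and letter follow the lecture; n_count is an assumed name for the extra count of n's.

```python
word = 'banana'

count = 0
for letter in word:
    if letter == 'a':
        count = count + 1
print(count)  # 3

# Same pattern, counting the n's instead.
n_count = 0
for letter in word:
    if letter == 'n':
        n_count = n_count + 1
print(n_count)  # 2
```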
201 00:11:13,910 --> 00:11:15,480 But it looks like there's supposed to be two, 202 00:11:15,480 --> 00:11:17,440 or otherwise I have a typo on this slide. 203 00:11:18,790 --> 00:11:21,230 So the in, again, I, I love the in. 204 00:11:21,230 --> 00:11:22,120 I just absolutely 205 00:11:22,120 --> 00:11:24,700 love this in. I love this syntax. 206 00:11:24,700 --> 00:11:30,760 This for each letter in the word banana. Just, to me, it reads very smoothly. 207 00:11:30,760 --> 00:11:33,250 Cognitively, it fits in my mind what's going on. 208 00:11:33,250 --> 00:11:37,110 For each letter in banana, run this little indented block of text. 209 00:11:37,110 --> 00:11:42,990 Again, very pretty, I love in, it's one of my favorite little pieces of Python. 210 00:11:46,490 --> 00:11:49,430 So, again, with the for, it takes care of 211 00:11:49,430 --> 00:11:52,420 all the iteration variables for us, and it goes through the sequence. 212 00:11:52,420 --> 00:11:54,850 And so here's, here's an animation, right? 213 00:11:54,850 --> 00:11:57,910 Remember that the for is going to do all this work for us, right? 214 00:11:57,910 --> 00:12:00,710 Letter is going to advance through the 215 00:12:00,710 --> 00:12:04,720 successive values, the successive letters in banana. 216 00:12:04,720 --> 00:12:12,090 So letter is being moved for us by the for statement, okay? 217 00:12:12,090 --> 00:12:14,640 So that's looping through. 218 00:12:14,640 --> 00:12:16,661 Now it turns out there's a lot of common things that 219 00:12:16,661 --> 00:12:18,730 we want to do that are already built into Python for us. 220 00:12:20,100 --> 00:12:24,490 Clear the screen there. We call these slicing. 221 00:12:24,490 --> 00:12:28,870 So the index operator looks up various things in a string, but we 222 00:12:28,870 --> 00:12:33,470 can also pull substrings out, using the colon in addition to the square brackets. 223 00:12:33,470 --> 00:12:35,020 Again, this is called slicing. 
224 00:12:36,350 --> 00:12:37,200 So the 225 00:12:37,200 --> 00:12:43,010 colon operator, basically, takes a starting position, and then an ending 226 00:12:43,010 --> 00:12:47,798 position, but the ending position is up to but not including the second one. 227 00:12:47,798 --> 00:12:51,660 So this is, it's up to but not including, up to but not including. 228 00:12:51,660 --> 00:12:54,410 Just like the zero, you get used to it pretty quick, 229 00:12:54,410 --> 00:12:56,020 but the first time you see it, it's a little bit 230 00:12:58,240 --> 00:12:59,220 wonky. 231 00:12:59,220 --> 00:13:03,480 So, if we're going 0 through 4, that's how I read this print, s sub 0 232 00:13:03,480 --> 00:13:08,960 through 4, or, or better, better said, s 0, up to but not including 4. 233 00:13:08,960 --> 00:13:14,160 That is, print me out the chunk that is up to, but not including, 4. 234 00:13:14,160 --> 00:13:18,510 So, it doesn't include 4, and so out comes Mont, right? 235 00:13:19,630 --> 00:13:23,325 So the next one is 6 up to but not including 7, so it starts at 6, 236 00:13:23,325 --> 00:13:30,010 up to but not including 7, so out comes the P. 237 00:13:30,010 --> 00:13:32,080 And, even though you might expect that it 238 00:13:32,080 --> 00:13:35,770 would traceback on this, Python is a little forgiving. 239 00:13:35,770 --> 00:13:37,310 So here's a moment where Python is a little 240 00:13:37,310 --> 00:13:40,170 forgiving, saying, you know, I'll give you a break here. 241 00:13:40,170 --> 00:13:42,630 If you go 6, but up to, but not including 20, 242 00:13:42,630 --> 00:13:45,510 I'll just stop at the end of the string. 243 00:13:45,510 --> 00:13:48,702 So it's 6 to the end, so it, it, you can over-reference here and 244 00:13:48,702 --> 00:13:51,530 you can not, you won't get yourself in trouble. 245 00:13:51,530 --> 00:13:53,280 So that comes out, Python. 
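The three slices just described, as a Python 3 sketch. The variable s holds Monty Python, as in the lecture.

```python
s = 'Monty Python'

print(s[0:4])   # Mont    (0 up to but not including 4)
print(s[6:7])   # P       (6 up to but not including 7)
print(s[6:20])  # Python  (slicing past the end just stops at the end)
```

Note the contrast with the index operator: s[20] on its own would raise an index error, while the slice quietly stops at the end of the string.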
246 00:13:53,280 --> 00:13:57,680 So, again, the second number is up to but not including, 247 00:13:57,680 --> 00:13:59,810 and that's kind of the weird thing there. 248 00:13:59,810 --> 00:14:01,540 Of course once you remember that the first character 249 00:14:01,540 --> 00:14:04,590 is 0, 0 up through but not including. Nice. 250 00:14:08,570 --> 00:14:12,380 If we leave off the first number, the last number, or both of them, the 251 00:14:12,380 --> 00:14:17,100 missing ones mean the beginning and the end of the string, 252 00:14:17,100 --> 00:14:23,860 respectively. And so, colon 2, up to but not including 2, is M-o. 253 00:14:23,860 --> 00:14:30,660 8 colon means starting at 8 to the end of the string. 254 00:14:30,660 --> 00:14:33,730 So that's, thon. And then, a colon by itself means 255 00:14:33,730 --> 00:14:36,970 the beginning to the end, and so it's just the whole string, Monty Python. 256 00:14:38,110 --> 00:14:39,833 Now we've already played with string 257 00:14:39,833 --> 00:14:43,010 concatenation, just a thing to emphasize here is, 258 00:14:43,010 --> 00:14:48,740 the concatenation operator does not add a space, does not add a space. 259 00:14:48,740 --> 00:14:51,950 If you want a space, you explicitly add it. 260 00:14:51,950 --> 00:14:55,740 So here there's no space in between the o and the t, but here 261 00:14:55,740 --> 00:14:59,690 there is a space between the o and the t because we explicitly added it. 262 00:14:59,690 --> 00:15:02,280 So you can concatenate more than one thing. 263 00:15:02,280 --> 00:15:05,360 And you add your spaces as you want, okay? 264 00:15:08,000 --> 00:15:10,490 Another thing you can do is you can ask questions about a string. 265 00:15:10,490 --> 00:15:14,520 Sort of like doing a string lookup, using the in operator. 266 00:15:14,520 --> 00:15:17,790 This is a little different than how we use it inside of a for loop. 
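Looking back at the slices with a number left off, and at concatenation with and without a space, a Python 3 sketch might be the following. The names a, b, and c and the strings Hello and There are assumptions; the lecture only mentions an o next to a t.

```python
s = 'Monty Python'

print(s[:2])   # Mo            (beginning up to but not including 2)
print(s[8:])   # thon          (8 to the end)
print(s[:])    # Monty Python  (beginning to end)

a = 'Hello'
b = a + 'There'
print(b)       # HelloThere    (no space is added for us)

c = a + ' ' + 'There'
print(c)       # Hello There   (we added the space explicitly)
```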
267 00:15:17,790 --> 00:15:20,690 This is a logical operation asking a question 268 00:15:20,690 --> 00:15:23,220 like less than or greater than or whatever. 269 00:15:23,220 --> 00:15:25,100 So, here's an expression. 270 00:15:25,100 --> 00:15:28,670 So fruit is banana, as always. Is n in fruit? 271 00:15:30,250 --> 00:15:33,020 And the answer is yes it is, True. So this 272 00:15:33,020 --> 00:15:35,050 is a logical operation. It's a question. 273 00:15:35,050 --> 00:15:36,620 It's a true or false. 274 00:15:36,620 --> 00:15:39,830 Is m in fruit? No, False. 275 00:15:39,830 --> 00:15:42,500 And this can be a string, not just a single character. 276 00:15:42,500 --> 00:15:45,260 Is n-a-n in fruit? The answer is True. 277 00:15:45,260 --> 00:15:50,250 And you can put this in, you know, ifs, et cetera, et cetera. 278 00:15:50,250 --> 00:15:53,500 So, this is a logical expression that can be on an if, 279 00:15:53,500 --> 00:15:57,100 or on a while, et cetera, et cetera, et cetera. 280 00:15:57,100 --> 00:15:58,410 So it's a 281 00:15:58,410 --> 00:16:00,670 logical expression and it returns True or False. 282 00:16:03,540 --> 00:16:05,560 You can also do comparisons. 283 00:16:05,560 --> 00:16:11,190 Now, the comparison operations, equals makes a lot of sense, less 284 00:16:11,190 --> 00:16:15,450 than and greater than depend on the language that you're using Python in. 285 00:16:15,450 --> 00:16:20,204 And so, if you're using, like, a Latin character set, then alphabetical order matters. 286 00:16:20,204 --> 00:16:22,480 You know, the way the Latin character set would do it. 287 00:16:22,480 --> 00:16:24,380 But if you're in a different character set, Python is 288 00:16:24,380 --> 00:16:28,890 aware of multiple character sets and will sort strings based on 289 00:16:28,890 --> 00:16:32,050 the sorting algorithm of the particular character set. 
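A Python 3 sketch of the in operator as a logical expression, followed by a few comparisons. The comparison word 'apple', the strings 'Zebra' and 'Found it!', and the has_ variable names are assumptions for illustration. One hedge on the comparison part: Python 3's built-in less than and greater than compare strings character by character using Unicode code points, so all uppercase letters come before lowercase ones; ordering that follows a particular language's rules takes extra tools beyond these operators.

```python
fruit = 'banana'

has_n = 'n' in fruit
has_m = 'm' in fruit
has_nan = 'nan' in fruit
print(has_n, has_m, has_nan)  # True False True

# Because it is a logical expression, it can sit on an if.
if 'a' in fruit:
    print('Found it!')

# Comparisons (the word 'apple' is just an example value).
word = 'apple'
print(word == 'banana')   # False
print(word < 'banana')    # True, a comes before b
print('Zebra' < 'apple')  # True, uppercase Z sorts before lowercase a
```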
290 00:16:33,160 --> 00:16:37,610 So you can do comparisons like equality, less than, and greater than. 291 00:16:37,610 --> 00:16:39,830 And we've seen some of these things in previous lectures, actually. 292 00:16:39,830 --> 00:16:40,650 We've had to use them. 293 00:16:42,080 --> 00:16:47,125 So in addition, to, sort of, these sort of fundamental operations that we 294 00:16:47,125 --> 00:16:54,263 can do on strings, there's a extensive library of built-in capabilities 295 00:16:54,263 --> 00:16:55,308 in Python. 296 00:16:55,308 --> 00:16:59,283 And so the, the way we see these built-in capabilities 297 00:16:59,283 --> 00:17:03,320 are they're, they're actually sort of built in to strings. 298 00:17:03,320 --> 00:17:05,760 So, let's go real slow here. 299 00:17:05,760 --> 00:17:07,310 Here we have a variable called greet and 300 00:17:07,310 --> 00:17:10,050 we're sticking the string Hello Bob into it. 301 00:17:10,050 --> 00:17:12,619 Now greet is of type string, as a result 302 00:17:12,619 --> 00:17:16,589 of this, and it contains Hello Bob as its value. 303 00:17:16,589 --> 00:17:18,296 But we can actually access 304 00:17:18,296 --> 00:17:26,559 capabilities inside of this value. So we can say, greet.lower(). 305 00:17:26,559 --> 00:17:30,650 This is calling something that's part of greet itself, it's a part of all strings. 306 00:17:30,650 --> 00:17:34,660 The fact that greet contains a string, means that we can ask for, 307 00:17:34,660 --> 00:17:38,120 hey, give me greet, which just gives you back what you're looking for. 308 00:17:38,120 --> 00:17:40,980 Like here, print greet is Hello Bob. 309 00:17:40,980 --> 00:17:45,500 Or you can say give me greet lower, so this is giving me a lowercase copy. 310 00:17:45,500 --> 00:17:51,030 It doesn't convert it to lowercase. It gives me a lowercase copy of Hello Bob. 311 00:17:51,030 --> 00:17:53,580 So zap is hello bob, all lowercase. 312 00:17:54,660 --> 00:17:59,950 Now, it didn't change greet, right? 
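The greet.lower() call described above, as a Python 3 sketch with the lecture's names greet and zap:

```python
greet = 'Hello Bob'

# lower() hands back a lowercase copy; greet itself is untouched.
zap = greet.lower()
print(zap)    # hello bob
print(greet)  # Hello Bob
```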
And, you can even put this .lower on the 313 00:17:59,950 --> 00:18:05,280 end of constants. Why you'd do this, I don't know, but Hi There, with H and T capitalized, 314 00:18:05,280 --> 00:18:10,640 .lower comes out as hi there. So this bit is part of 315 00:18:10,640 --> 00:18:11,560 all strings. 316 00:18:11,560 --> 00:18:17,900 Both variables and constants have these string functions built into them. 317 00:18:17,900 --> 00:18:21,120 And every instance of a string, whether it 318 00:18:21,120 --> 00:18:23,720 be a variable or a constant, has these capabilities. 319 00:18:23,720 --> 00:18:28,150 They don't modify it, they just give you back a copy. 320 00:18:28,150 --> 00:18:31,500 Now it turns out there is a 321 00:18:31,500 --> 00:18:36,170 built-in function in Python called dir, to ask questions like, 322 00:18:36,170 --> 00:18:39,730 hey, well here's, you know, stuff has got Hello World. 323 00:18:39,730 --> 00:18:42,964 We can say. Redo this. 324 00:18:42,964 --> 00:18:45,560 Come here. 325 00:18:45,560 --> 00:18:48,240 Stuff is a string. We can ask, hey, what are you? 326 00:18:48,240 --> 00:18:49,660 I am a string. 327 00:18:49,660 --> 00:18:53,820 dir is another built-in Python function that asks the question, hey, what are all 328 00:18:53,820 --> 00:18:56,640 the things that are built into this that I can make use of? 329 00:18:56,640 --> 00:18:57,780 And here they are. 330 00:18:57,780 --> 00:19:01,250 That's kind of a raw dump of them. You can also go look at 331 00:19:01,250 --> 00:19:05,910 the online documentation for Python, at 332 00:19:05,910 --> 00:19:09,670 the Python website, where you can see a whole bunch of these things. 333 00:19:09,670 --> 00:19:13,690 And they have the calling sequence, what the parameters are, et cetera. 334 00:19:13,690 --> 00:19:17,800 So when you're looking these things up, you can go read about them. 
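A Python 3 sketch of the "what are you?" and dir questions, plus .lower on a constant. Using the built-in type function for the "what are you?" step is an assumption, since the lecture does not name the call; kind and names are assumed variable names.

```python
stuff = 'Hello World'

# Ask the value what it is (assumed to be done with type).
kind = type(stuff)
print(kind)   # <class 'str'>

# Ask what is built into it; the result is a raw list of names.
names = dir(stuff)
print(names)

# The same capabilities hang off a constant, too.
print('Hi There'.lower())  # hi there
```

The list from dir includes names such as lower and find, along with many names wrapped in double underscores that can be ignored for now.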
335 00:19:17,800 --> 00:19:19,140 Here's just a few that I've pulled out, 336 00:19:19,140 --> 00:19:23,200 capitalize, which uppercases the first character, 337 00:19:23,200 --> 00:19:27,220 center, endswith, find, there's stripping. 338 00:19:27,220 --> 00:19:28,300 So I'll look through a couple of these, 339 00:19:28,300 --> 00:19:30,740 just the kind of things to be looking for. 340 00:19:30,740 --> 00:19:33,780 It'll be a good idea to take a look and read through some of the things. 341 00:19:33,780 --> 00:19:37,540 Here's a couple that we'll probably be using early on. 342 00:19:37,540 --> 00:19:43,700 The find function, it's similar to in but it tells you where it finds the 343 00:19:43,700 --> 00:19:49,517 particular thing that it's looking for. And so we'll say fruit is banana. 344 00:19:49,517 --> 00:19:52,379 And I'm going to say pos, which is going to be an integer variable, 345 00:19:52,379 --> 00:19:54,002 equals fruit.find("na"). 346 00:19:54,002 --> 00:19:57,836 So what it's saying is, go look inside this variable fruit, 347 00:19:57,836 --> 00:20:01,551 hunt until you find the first occurrence of the string na. 348 00:20:01,551 --> 00:20:05,590 Hunt, hunt, hunt, hunt, whoop, got it. And then return its position to me. 349 00:20:05,590 --> 00:20:10,580 So that's going to give me back 2. 2 is where it found it, right? 350 00:20:10,580 --> 00:20:14,120 So, where is it in the string, that's what find does. 351 00:20:14,120 --> 00:20:16,920 And if you don't find anything, like you're looking for z, 352 00:20:16,920 --> 00:20:21,440 no, no, no, I didn't find a z, then it gives me back negative 1. 353 00:20:21,440 --> 00:20:27,270 So again, this is just one of many built-in functions in strings. 354 00:20:27,270 --> 00:20:30,130 The ability to find a substring, okay? 355 00:20:30,130 --> 00:20:33,090 Or find, yeah, find a character or string within another string. 
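The find calls above, sketched in Python 3. The names fruit and pos come from the lecture; aa is an assumed name for the not-found case.

```python
fruit = 'banana'

# Position of the first occurrence of 'na'.
pos = fruit.find('na')
print(pos)  # 2

# Nothing found comes back as -1 rather than a traceback.
aa = fruit.find('z')
print(aa)   # -1
```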
356 00:20:35,330 --> 00:20:37,110 There's a lower case, there's also an 357 00:20:37,110 --> 00:20:40,710 upper case, This might be better named shouting. 358 00:20:40,710 --> 00:20:44,070 Upper means give me an uppercase copy of this variable. 359 00:20:44,070 --> 00:20:49,730 So Hello Bob becomes HELLO BOB, and then lower is hello bob, right? 360 00:20:49,730 --> 00:20:55,920 So these are both ways to get copies of uppercase and lowercase versions. 361 00:20:55,920 --> 00:20:58,438 You might think these are kind of silly, but one of the things 362 00:20:58,438 --> 00:21:01,450 that you tend to use lower for is if you're doing searching and 363 00:21:01,450 --> 00:21:03,700 you want to ignore case, you convert the whole thing 364 00:21:03,700 --> 00:21:06,382 to lowercase, and then you search for a lowercase string. 365 00:21:06,382 --> 00:21:08,712 So you, depends on if you want to ignore case or not. 366 00:21:08,712 --> 00:21:11,720 So that's, that's one of the reasons that you have things like this. 367 00:21:14,280 --> 00:21:19,224 There is a replace function. Again, it doesn't change the value. 368 00:21:19,224 --> 00:21:21,640 Greet is going to have Hello Bob. 369 00:21:21,640 --> 00:21:28,350 And I'm going to say, greet.replace all occurrences of Bob with Jane. 370 00:21:28,350 --> 00:21:32,660 That gives me back a copy, in nstr, says Hello Jane. 371 00:21:32,660 --> 00:21:35,690 So, so greet is unchanged. 372 00:21:35,690 --> 00:21:39,890 This replace says, make a copy and then make that following 373 00:21:39,890 --> 00:21:43,251 edit that you, that, that we've requested. 374 00:21:43,251 --> 00:21:46,447 [COUGH] Now we can also say, well, I mean, the replace 375 00:21:46,447 --> 00:21:50,490 is going to do all occurrences, so greet is still Hello Bob. 376 00:21:50,490 --> 00:21:51,660 This is kind of redundant here. 377 00:21:51,660 --> 00:21:53,980 I'm just doing it so you remember what it is. 378 00:21:53,980 --> 00:21:55,310 Greet is still Hello Bob. 
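Looking back at upper, lower, and the ignore-case search idea described above, a Python 3 sketch might be the following. The names nnn and www are assumptions, and the search for bob is an illustrative example rather than something shown in the lecture.

```python
greet = 'Hello Bob'

nnn = greet.upper()
print(nnn)  # HELLO BOB

www = greet.lower()
print(www)  # hello bob

# Ignoring case when searching: lowercase a copy, then look for
# a lowercase string in it.
print('bob' in greet)          # False, the case does not match
print('bob' in greet.lower())  # True
```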
379 00:21:55,310 --> 00:21:57,500 I put Hello Bob back in it and replace 380 00:21:57,500 --> 00:22:00,850 all the occurrences of lowercase o with uppercase X. 381 00:22:01,920 --> 00:22:05,096 And then that happens. So this says, 382 00:22:05,096 --> 00:22:11,927 go through the whole string [SOUND] doing all those replaces, okay? 383 00:22:11,927 --> 00:22:14,237 Now another common thing that we're going to have to do 384 00:22:14,237 --> 00:22:16,901 is we're going to have to throw away whitespace. 385 00:22:16,901 --> 00:22:18,628 Sometimes you have a string that 386 00:22:18,629 --> 00:22:21,893 has characters, blank characters, or other characters, 387 00:22:21,893 --> 00:22:26,328 at the beginning and the end, nonprintable characters, and we can strip them. 388 00:22:26,328 --> 00:22:30,458 And there's three charact, three functions that are built into 389 00:22:30,458 --> 00:22:32,840 Python strings that do this for us. 390 00:22:33,920 --> 00:22:38,202 There is lstrip, which strips from the left. 391 00:22:38,202 --> 00:22:43,675 There is rstrip, which strips from the right. 392 00:22:43,675 --> 00:22:47,440 So it throws away these whitespaces, so, Hello Bob is gone. 393 00:22:48,470 --> 00:22:50,940 I mean, the, so it gets rid of these characters. 394 00:22:50,940 --> 00:22:53,373 Oops, these are the ones that are gotten rid of there. 395 00:22:53,373 --> 00:22:55,913 I need an eraser. And then 396 00:22:55,913 --> 00:22:59,313 let's use white, and then strip basically, gets rid of 397 00:22:59,313 --> 00:23:03,250 all the whitespace, both on the left and the right side. 398 00:23:03,250 --> 00:23:04,140 And gets rid of that. 399 00:23:04,140 --> 00:23:07,010 So we're going to, we're going to be using these a lot. 400 00:23:07,010 --> 00:23:09,860 It, one of the things you tend to do in Python is cleaning up data.
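Here is a small sketch of the o-to-X replace and the three strip functions described above. The exact number of blanks around Hello Bob is an assumption (the talk only says there is whitespace on both sides), and the names padded, left, right, and both are made up for illustration.

```python
greet = 'Hello Bob'

# Every lowercase o is replaced, not just the first one
nstr = greet.replace('o', 'X')
print(nstr)          # HellX BXb

# Assumed padding: three blanks in front, two behind
padded = '   Hello Bob  '

left = padded.lstrip()    # whitespace removed from the left only
right = padded.rstrip()   # whitespace removed from the right only
both = padded.strip()     # whitespace removed from both ends

# repr() puts quotes around the value so the remaining blanks are visible
print(repr(left))    # 'Hello Bob  '
print(repr(right))   # '   Hello Bob'
print(repr(both))    # 'Hello Bob'
```

Only whitespace at the ends is removed. The blank between Hello and Bob stays where it is.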
401 00:23:09,860 --> 00:23:11,790 Sometimes if you have spaces at the beginning or 402 00:23:11,790 --> 00:23:13,960 the end, you just want to kind of ignore them. 403 00:23:13,960 --> 00:23:15,790 So you strip them off, you throw them away. 404 00:23:18,020 --> 00:23:22,130 When we're looking for data, we sometimes are looking for a prefix, and 405 00:23:22,130 --> 00:23:27,400 there is a startswith function [COUGH] that gives you a true or a false. 406 00:23:27,400 --> 00:23:31,370 We're asking here, does this variable line start with the string Please. 407 00:23:31,370 --> 00:23:34,820 And the answer is True, because it does start with the string Please. 408 00:23:34,820 --> 00:23:38,290 Or, and then next, we ask, does this start with the letter p? 409 00:23:38,290 --> 00:23:41,060 And the answer is False, it does not start with the letter p. 410 00:23:42,070 --> 00:23:43,290 Okay? So there's 411 00:23:43,290 --> 00:23:44,880 lots more of these things. 412 00:23:48,480 --> 00:23:52,704 And reading data and tearing it apart is one of the things that we're going to 413 00:23:52,704 --> 00:23:57,296 really focus on for the rest of these first few chapters of the book, okay? 414 00:23:57,296 --> 00:24:00,041 Because that's one thing that Python's really good at is 415 00:24:00,041 --> 00:24:03,860 tearing data into pieces and pulling the pieces that you want. 416 00:24:03,860 --> 00:24:06,840 So, so let's take a look at this line. 417 00:24:06,840 --> 00:24:11,455 So this line that we've got here is a line from an actual email box. 418 00:24:11,455 --> 00:24:13,550 This is what, if you 419 00:24:13,550 --> 00:24:15,580 looked at your email, sort of, on your hard 420 00:24:15,580 --> 00:24:18,710 drive, email boxes would have this kind of a format. 421 00:24:18,710 --> 00:24:23,870 And there's actually many lines, and soon we'll be reading whole files full of email. 422 00:24:23,870 --> 00:24:26,940 But for now, let's just say we've got this one line, somehow.
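Going back to the prefix check for a moment, the startswith calls described above might look like this. The variable name line is from the talk; the rest of the sentence after Please is an assumption, as are the names starts_upper and starts_lower.

```python
# Assumed sample text; the talk only tells us it begins with "Please"
line = 'Please have a nice day'

# startswith() answers True or False
starts_upper = line.startswith('Please')
print(starts_upper)   # True

# The comparison is case-sensitive, so a lowercase p does not match
starts_lower = line.startswith('p')
print(starts_lower)   # False

# Combining with lower() gives a check that ignores case
print(line.lower().startswith('p'))   # True
```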
423 00:24:26,940 --> 00:24:29,400 And we're looking for, we don't know how long 424 00:24:29,400 --> 00:24:31,910 these things are going to be, the first charac, the 425 00:24:31,910 --> 00:24:34,520 first thing is from, then there's an email address, 426 00:24:34,520 --> 00:24:38,000 then there's some detail about when the mail was sent. 427 00:24:38,000 --> 00:24:40,550 But what we actually want is 428 00:24:40,550 --> 00:24:42,450 we want this part right here, 429 00:24:42,450 --> 00:24:45,910 and that's the domain name of the mail address, right? 430 00:24:45,910 --> 00:24:48,110 We want to extract this out. 431 00:24:48,110 --> 00:24:52,780 We're faced with this line, in a variable, and we want to extract that out. 432 00:24:52,780 --> 00:24:55,680 So this is kind of putting all these things together. 433 00:24:55,680 --> 00:24:59,330 So let's walk through how we do this. 434 00:24:59,330 --> 00:25:02,028 So, here's this line, and it's a big long string. 435 00:25:02,028 --> 00:25:03,950 Mostly we would've read this from a file, 436 00:25:03,950 --> 00:25:05,870 rather than just put it in a constant, but for now we 437 00:25:05,870 --> 00:25:08,480 put it in a constant, because we, files is the next chapter. 438 00:25:09,950 --> 00:25:12,500 And so what we're going to do is we're going to say, you 439 00:25:12,500 --> 00:25:15,380 know what, I'm going to look at this line and I'm going to go 440 00:25:15,380 --> 00:25:18,048 find the @ sign, and I want to know where the @ sign is. 441 00:25:18,048 --> 00:25:24,330 So I call data.find @ sign, and put the result in atpos. 442 00:25:24,330 --> 00:25:26,514 And that gives me 21. 443 00:25:26,514 --> 00:25:29,166 It hunts until it finds the @ sign, and 444 00:25:29,166 --> 00:25:34,310 then tells me where I found it. Then what I want to look at is, starting 445 00:25:34,310 --> 00:25:39,200 here, for the rest of the string, I want to find the first space afterwards. 
446 00:25:40,250 --> 00:25:45,868 So what I say is, this, sppos is my variable for the position of the space, 447 00:25:45,868 --> 00:25:51,132 data.find, a blank, starting at the @. 448 00:25:51,132 --> 00:25:54,216 So this is starting at 21. So it says, I'll start 449 00:25:54,216 --> 00:25:59,523 at 21 and I'll look for the next blank. And I find that at 31. 450 00:25:59,523 --> 00:26:05,350 So now I know where the @ sign is and I know where the space is. 451 00:26:05,350 --> 00:26:08,172 And so what I'm looking at is, I want the stuff 452 00:26:08,172 --> 00:26:14,186 one beyond the @ sign, up to but not including the space. 453 00:26:14,186 --> 00:26:20,142 So then I can use a slicing operation, I can use a slicing operation. 454 00:26:20,142 --> 00:26:22,650 Start at the @ position, add 1 to it, 455 00:26:22,650 --> 00:26:26,480 so advance 1, that's going to be the letter u. 456 00:26:26,480 --> 00:26:30,730 And then a slicing operation, up to but not including space. 457 00:26:30,730 --> 00:26:36,190 Up to, this is going to work out nicely all of a sudden, but not 458 00:26:36,190 --> 00:26:41,770 including, okay? And then 459 00:26:41,770 --> 00:26:45,796 I'm going to take that slice, which is really this little bit of data right here, 460 00:26:45,796 --> 00:26:49,500 take that slice, and put in the variable host. 461 00:26:49,500 --> 00:26:53,844 Then we print that out and we get the piece, okay? 462 00:26:53,844 --> 00:26:56,980 And so, here we have some data we want to tear apart. 463 00:26:56,980 --> 00:26:58,230 We hunt for the @. 464 00:26:58,230 --> 00:27:00,281 We find it at position 21. 465 00:27:00,281 --> 00:27:04,598 We start at 21 and we look for the, the space after that. 466 00:27:04,598 --> 00:27:10,659 31, and then we pull from 22, up to but not including, 31. 
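The find-then-slice steps just walked through can be sketched as follows. The transcript never spells out the mail line itself, so the string below is an assumed sample, chosen so that the positions 21 and 31 mentioned in the talk come out. The names data, atpos, sppos, and host follow the talk.

```python
# Assumed sample line in the mailbox "From" format described in the talk
data = 'From stephen.marquard@uct.ac.za Sat Jan  5 09:14:16 2008'

# Step 1: find the position of the @ sign
atpos = data.find('@')
print(atpos)   # 21

# Step 2: find the first blank at or after that position
sppos = data.find(' ', atpos)
print(sppos)   # 31

# Step 3: slice from one past the @, up to but not including the blank
host = data[atpos + 1:sppos]
print(host)    # uct.ac.za
```

One thing the sketch leaves out: find returns -1 when it does not find what it is looking for, so real code would want to check atpos and sppos before slicing.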
467 00:27:10,659 --> 00:27:13,380 And it, it wouldn't matter where this thing was, because these aren't all 468 00:27:13,380 --> 00:27:17,491 the same length when we start looking at them in files, but it 469 00:27:17,491 --> 00:27:20,541 would have found the @ sign and the space after the @ sign, 470 00:27:20,541 --> 00:27:24,258 and it would have reliably pulled out the host, okay? 471 00:27:24,258 --> 00:27:29,646 So this is a basic pattern we call parsing. 472 00:27:29,646 --> 00:27:32,068 Parsing text. 473 00:27:32,068 --> 00:27:35,620 Find this, find that other thing, grab this thing out, 474 00:27:35,620 --> 00:27:40,040 then look inside that thing and [SOUND]. So it does all these things, right? 475 00:27:40,040 --> 00:27:45,430 So, that's kind of like strings. Up next, we have files. 476 00:27:45,430 --> 00:27:46,770 Files are going to be lots of strings. 477 00:27:46,770 --> 00:27:49,320 So we're going to start putting all these things together. 478 00:27:49,320 --> 00:27:52,490 And so the next chapter is a really, really 479 00:27:52,490 --> 00:27:55,600 important chapter, where it starts to really start coming together. 480 00:27:55,600 --> 00:27:57,110 So see you soon.