0:00:00.200,0:00:02.820 Hello, and welcome to Chapter Six. 0:00:02.820,0:00:05.240 This chapter we're going to[br]talk about strings, and 0:00:05.240,0:00:08.610 stuff is going to start to get real now. 0:00:08.610,0:00:12.610 So, as always, this material, this video,[br]these 0:00:12.610,0:00:15.550 slides and book are copyright Creative[br]Commons Attribution. 0:00:15.550,0:00:16.870 I want you to use these materials. 0:00:16.870,0:00:18.800 I want you to, somebody else, I want to 0:00:18.800,0:00:21.720 make more teachers, so everyone can teach[br]this stuff. 0:00:21.720,0:00:22.790 Use it however you like. 0:00:24.010,0:00:25.280 Okay, so we've been playing with 0:00:25.280,0:00:26.730 strings from the beginning. 0:00:26.730,0:00:28.320 I mean, literally, if we didn't work 0:00:28.320,0:00:31.040 with strings, we could've never printed[br]Hello World. 0:00:31.040,0:00:35.813 And, and lord knows, we need to print[br]Hello World in a programming language. 0:00:35.813,0:00:39.610 And so, we've been using them, especially[br]constants. 0:00:39.610,0:00:41.650 Now, in this chapter, we're going to dig in. 0:00:41.650,0:00:46.986 So, oops, so a string is a sequence of[br]characters. 0:00:46.986,0:00:50.408 You can use either single quotes or[br]double quotes in Python 0:00:50.408,0:00:51.360 to delimit a string. 0:00:51.360,0:00:54.528 And so here's two string constants, Hello[br]and there, 0:00:54.528,0:00:58.140 and stuck into the variables str1[br]and str2. 0:00:58.140,0:01:00.520 We can concatenate them together[br]with a plus sign. 0:01:00.520,0:01:03.100 Python is smart enough to look and say, 0:01:03.100,0:01:05.694 oh, those are strings, I know what to[br]do with those. 0:01:05.694,0:01:09.588 And you'll notice that the plus doesn't[br]add any space here, because when 0:01:09.588,0:01:13.566 we print bob out, Hello and there are right[br]next to one another. 
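The concatenation just described can be sketched roughly like this. It is written in Python 3 syntax (the lecture itself uses Python 2), and it reuses the names str1, str2, and bob from the narration.

```python
# Two string constants; single and double quotes both delimit a string.
str1 = "Hello"
str2 = 'there'

# The + operator joins the two strings and does not insert a space.
bob = str1 + str2
print(bob)  # Hellothere
```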
0:01:13.566,0:01:17.014 If, for example, we've done some[br]conversions, 0:01:17.014,0:01:18.799 so when we were, like, reading pay, 0:01:18.799,0:01:20.640 and rate, and hours, and stuff,[br]we've done some conversions. 0:01:20.640,0:01:23.378 So this is an example of the,[br]a string 1 2 3 0:01:23.378,0:01:27.211 Not 123, but the string, quote 1 2 3[br]quote. 0:01:27.211,0:01:29.270 And we can't add 1 to this, we get 0:01:29.270,0:01:32.910 a traceback, kind of, at this point, as we[br]expected. 0:01:32.910,0:01:37.020 And we would convert that to an integer[br]using the int function that's built in. 0:01:37.020,0:01:39.900 See how much Python you already know?[br]I mean, this is awesome, right? 0:01:39.900,0:01:40.970 I can just say, 0:01:40.970,0:01:42.800 oh, you call the int function,[br]and you know what that is. 0:01:42.800,0:01:46.220 That's, sorry, sorry, I'm just[br]awesomed out. 0:01:46.220,0:01:50.800 So you convert this to an integer, and[br]then you add 1 to it, and then we get 124. 0:01:50.800,0:01:52.270 So, there you go. 0:01:52.270,0:01:54.740 We've been doing strings all along, had to. 0:01:54.740,0:01:57.000 I mean, literally, strings and numeric data 0:01:57.000,0:01:59.930 are the two things that programs deal with. 0:01:59.930,0:02:03.120 So, we've been reading and converting. 0:02:03.120,0:02:05.175 Again, this is sort of a pattern from some[br]of the earlier programs 0:02:05.175,0:02:08.661 where we do a raw input, you know? 0:02:08.661,0:02:10.887 And the raw input just takes a string and[br]puts it in a variable. 0:02:10.887,0:02:14.560 So if I take Chuck, then the[br]variable contains the string C-h-u-c-k. 0:02:15.990,0:02:18.970 Even if we type numbers, that is a string. 0:02:18.970,0:02:23.660 We can't, just because I put 1 0 0 in,[br]I still can't subtract 10. 0:02:23.660,0:02:28.270 We get a happy little traceback, oh, happy[br]little, sad-faced traceback. 
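Here is a small sketch of the two situations above, in Python 3 syntax. The variable names text and typed are illustrative assumptions, and typed is a hard-coded stand-in for whatever raw_input (input in Python 3) would hand back, so the example runs without a keyboard.

```python
text = "123"  # the string "123", not the number 123

# Adding an integer to a string fails with a TypeError.
try:
    text + 1
    failed = False
except TypeError:
    failed = True
print(failed)  # True

# Converting first with the built-in int() makes the addition work.
number = int(text) + 1
print(number)  # 124

# Text read from the user is a string even when it looks like a number.
typed = "100"  # assumed stand-in for the value returned by input()
remaining = int(typed) - 10
print(remaining)  # 90
```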
0:02:28.270,0:02:31.294 And, and, but of course, if we convert it 0:02:31.294,0:02:34.050 into float or something like that. 0:02:35.190,0:02:38.670 We convert int or float, we can do that[br]and subtract 10, and we can do that. 0:02:38.670,0:02:41.680 So, so we've been doing this for a while. 0:02:41.680,0:02:45.130 We've been doing strings and manipulating[br]strings and converting strings all along. 0:02:45.130,0:02:49.051 So the thing we're going to start doing[br]now is we're going to dive into strings. 0:02:49.051,0:02:53.098 We realize that strings are addressable at[br]a character-by-character basis, 0:02:53.098,0:02:56.350 and we can do all kind of cool[br]things with that. 0:02:56.350,0:02:59.998 And so, a string is a sequence of[br]characters, and we 0:02:59.998,0:03:04.450 can look inside them using what we call[br]the index operator, 0:03:04.450,0:03:06.720 the square brackets. And we've seen[br]square brackets in 0:03:06.720,0:03:08.230 lists, and you'll see that there's sort of 0:03:08.230,0:03:11.610 similarities between lists of numbers,[br]and, in effect, a 0:03:11.610,0:03:14.350 string is a special kind of list of[br]characters. 0:03:14.350,0:03:17.197 So if we take this string banana, 0:03:17.197,0:03:21.242 the string banana starts, the first[br]character starts at 0. 0:03:21.242,0:03:24.891 So, we call this operator sub, so [br]letter equals 0:03:24.891,0:03:28.383 fruit sub 1 and that is the second[br]character. 0:03:28.383,0:03:30.603 Now this may seem a little weird that the[br]first character 0:03:30.603,0:03:33.956 is a 0 and the second character is a 1. 0:03:33.956,0:03:38.500 It actually is kind of similar to the old[br]elevator thing, where in Europe they're 0:03:38.500,0:03:41.124 called, the first floor is zero, then[br]negative one, 0:03:41.124,0:03:43.558 and the second floor is one, right? 
0:03:43.558,0:03:46.093 It's kind of the same thing.[br]Actually, it turns out that 0:03:46.093,0:03:50.456 internally zero was a better way[br]to start than one. 0:03:50.456,0:03:54.156 It, you'll get used to it and then after[br]a while there's 0:03:54.156,0:03:58.540 some little cool advantages to it, but for[br]now, beginning is zero. 0:03:58.540,0:04:01.939 Just, beginning is zero, it is the rule,[br]just remember it. 0:04:02.970,0:04:08.790 Okay, so the 0 is b, the 1 is a, the 2 is[br]n, et cetera, et cetera. 0:04:08.790,0:04:11.160 And we call this syntax 0:04:11.160,0:04:12.540 fruit sub 1, okay? 0:04:12.540,0:04:17.123 So that is the sub 1 character of fruit,[br]and then that is an a. 0:04:17.123,0:04:21.250 So that fruit sub 1 says, look up in[br]banana, find the 1 position, 0:04:21.250,0:04:25.870 and give me what's in that 1[br]position, that's what the sub means. 0:04:25.870,0:04:29.570 And what's inside these brackets can be[br]an expression. 0:04:29.570,0:04:33.690 So if we set n to 3, n minus 1, well[br]that'll compute to 2. 0:04:33.690,0:04:36.660 And then fruit sub 2 is the letter n, 0:04:36.660,0:04:39.979 right? So that's fruit sub 2, okay? 0:04:39.979,0:04:42.320 It's the third character, fruit sub 2. 0:04:42.320,0:04:47.336 So the index starts at 0, and we read the[br]brackets as sub, fruit sub 1, 0:04:47.336,0:04:52.750 fruit sub 2. Now, Python will 0:04:52.750,0:04:57.860 complain to you if you use this sub[br]operator too far down the string. 0:04:57.860,0:05:01.316 Here is a string with 3 characters, which[br]are 0, 1, and 2. 0:05:01.316,0:05:05.420 And if we go to sub 5, it blows up. 0:05:05.420,0:05:10.260 Now, you know, by now I hope that you're[br]not freaking out about traceback errors. 0:05:10.260,0:05:14.070 Remember, traceback errors are just Python[br]trying to inform you. 
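The index operator described above might look like this in practice (Python 3 syntax). The names fruit, letter, and n come from the narration. The names third, short, and message are assumptions made for this sketch, and "abc" is just an arbitrary three-character string.

```python
fruit = "banana"

# Positions start at 0, so fruit[1] is the second character.
letter = fruit[1]
print(letter)  # a

# The value inside the brackets can be any integer expression.
n = 3
third = fruit[n - 1]
print(third)  # n

# Indexing past the end raises IndexError.
short = "abc"  # valid positions are 0, 1, and 2
try:
    short[5]
    message = None
except IndexError as err:
    message = str(err)
print(message)  # string index out of range
```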
0:05:14.070,0:05:18.930 And if we just stop looking at that as[br]mean Python face, and 0:05:18.930,0:05:24.190 instead look at that as, oh, index error,[br]string index out of range. 0:05:24.190,0:05:27.360 Oh yeah, I stuck a five in there and[br]there's only three, oh, 0:05:27.360,0:05:31.330 my bad, thank you, Python, appreciate it,[br]thanks for the help. 0:05:31.330,0:05:34.870 So, think of this as like, it's not a[br]smiley face 0:05:34.870,0:05:38.690 but it's kind of like a, a quizzical face,[br]right, it's like [SOUND]. 0:05:38.690,0:05:39.660 I don't know. 0:05:39.660,0:05:42.950 Python's confused and it's trying to tell[br]you what it's confused about, okay? 0:05:42.950,0:05:46.780 So don't look at these as sad faces.[br]Python doesn't hate you, Python loves you. 0:05:48.170,0:05:52.420 And loves me too.[br]So, strings have individual 0:05:52.420,0:05:54.420 characters that we can address with the[br]index operator. 0:05:54.420,0:05:56.160 They also have length. 0:05:56.160,0:06:00.400 And there is a built-in function called[br]len, that we can call and pass in 0:06:00.400,0:06:03.980 as a parameter the variable or a[br]constant, 0:06:03.980,0:06:05.940 and it will tell us how many characters. 0:06:05.940,0:06:10.040 Now this banana has six characters in it[br]that are 0 through 5. 0:06:10.040,0:06:12.524 So don't get confused, the last[br]character is 0:06:12.524,0:06:15.750 the fifth, is sub 5, but it's also the[br]sixth character. 0:06:15.750,0:06:17.450 So the length is really the length, it's 0:06:17.450,0:06:22.150 not length minus 1, okay?[br]So len is like a built-in function. 0:06:22.150,0:06:23.840 It's not a function we have to write, 0:06:23.840,0:06:26.570 as we talked about in the functions[br]chapter. 0:06:26.570,0:06:28.626 There are things that are part of Python[br]that are just sitting there. 
0:06:28.626,0:06:31.172 And so we are passing banana, the[br]variable 0:06:31.172,0:06:35.010 fruit, into function, we're passing it[br]into function. 0:06:35.010,0:06:36.590 And then, into the len function. 0:06:36.590,0:06:42.250 Then len [SOUND] does magic stuff.[br]And then out comes the answer. 0:06:42.250,0:06:48.320 And that 6 replaces this and then the 6 goes[br]into the variable x, and so x is 6. 0:06:48.320,0:06:51.070 I sure made that a messy looking slide. 0:06:51.070,0:06:55.080 And so, think of inside the len function,[br]there's a def. 0:06:55.080,0:06:59.890 len takes a parameter, does some loopy[br]things, and it does its thing. 0:06:59.890,0:07:02.350 So, it's a function that we might write[br]except we don't 0:07:02.350,0:07:07.160 have to because it's already written and[br]built in to Python. 0:07:07.160,0:07:10.380 Okay. So that's the length of the 0:07:10.380,0:07:12.460 string, that's getting it individual[br]characters. 0:07:12.460,0:07:15.550 We can also loop through strings. 0:07:15.550,0:07:18.710 Obviously, if we can use the index[br]operator, and we 0:07:18.710,0:07:21.970 can put a variable in there, we can[br]write a loop. 0:07:21.970,0:07:23.520 This is an indefinite loop. 0:07:23.520,0:07:27.140 So we have this variable fruit, has the[br]string banana in it. 0:07:27.140,0:07:29.580 We set the variable index to 0. 0:07:29.580,0:07:32.920 We make a little while loop.[br]And we ask, 0:07:32.920,0:07:35.460 as long as index is less than the length[br]of fruit. 0:07:35.460,0:07:37.510 Now remember, the length of fruit is[br]going to be 6. 0:07:37.510,0:07:39.520 But we don't want to make that less than[br]or equal to 0:07:39.520,0:07:43.630 because then we would crash, because[br]the last character is 5. 0:07:43.630,0:07:46.438 We can say letter is equal to fruit sub[br]index, so that's going to 0:07:46.438,0:07:50.040 start out being index of, is going to be[br]0, so that's fruit sub 0. 
0:07:50.040,0:07:53.300 Then we print index and letter, so that[br]means the 0:07:53.300,0:07:56.220 first time through the loop we're[br]going to see 0 b. 0:07:56.220,0:07:58.056 Then we increment our 0:07:58.056,0:08:04.450 iteration variable, and go up.[br]And then we come out with 1 a. 0:08:04.450,0:08:13.560 And index advances until index is 6, but[br]has printed out each of the letters. 0:08:13.560,0:08:15.790 Now, we're not doing this just to 0:08:15.790,0:08:18.620 print them out, we will do something[br]a little more valuable, 0:08:21.540,0:08:23.150 valuable inside that loop. 0:08:23.150,0:08:28.740 But this gives the sense that we can work[br]through a loop just like we, we, 0:08:28.740,0:08:35.779 we can work through a string just like[br]we work through a list of numbers, okay? 0:08:35.779,0:08:38.630 Now, that was how you do it with an[br]indefinite loop. 0:08:38.630,0:08:42.870 In a definite loop, it's just far more[br]awesome, okay? 0:08:42.870,0:08:44.880 Just like we did with the list of numbers, 0:08:46.110,0:08:49.320 Python understands strings and allows us[br]to write 0:08:49.320,0:08:53.410 for loops, using for and in, that go through[br]the strings. 0:08:53.410,0:08:56.910 So basically, for letter in fruit, now[br]remember, I'm using letter as a 0:08:56.910,0:09:01.220 mnemonic variable here, it's just a[br]choice, a wise choice of a variable name. 0:09:01.220,0:09:05.685 So that says, run this little block of[br]text once for 0:09:05.685,0:09:08.195 each letter in the variable fruit, which[br]means that letter's going to 0:09:08.195,0:09:13.959 take on the successive values b-a-n-a-n-a. 0:09:13.959,0:09:16.084 When I look at that I always worry that I[br]misspelled it. 0:09:16.084,0:09:18.925 I think I got these right. 
0:09:18.925,0:09:22.423 If I rewrite this book, I'm not going to[br]use banana as the example because I'm 0:09:22.423,0:09:24.649 terrified that I misspelled banana,[br]because I don't 0:09:24.649,0:09:27.190 know how many n's banana has in it. 0:09:27.190,0:09:32.280 But, be that as it may, we are[br]abstracting, we are letting Python say, 0:09:32.280,0:09:36.300 run this little block of text once, in[br]order, for each of the letters in 0:09:36.300,0:09:40.990 the variable fruit, which is b-a-n-a, and[br]so it prints out each of the letters. 0:09:40.990,0:09:46.110 So this is a much prettier version of the,[br]the looping, 0:09:46.110,0:09:50.690 so using the definite, the for keyword[br]instead of the while keyword. 0:09:50.690,0:09:54.060 And so, we can just kind of compare these[br]two things. 0:09:54.060,0:09:55.570 They kind of do the exact same thing. 0:09:55.570,0:09:57.680 And it also is kind of a, gives you a 0:09:57.680,0:10:01.120 sense of what the for is doing for us,[br]right? 0:10:01.120,0:10:01.530 The for is 0:10:01.530,0:10:05.100 setting up this index, the for is[br]looking up 0:10:05.100,0:10:07.890 inside of fruit, and the for is advancing[br]the index. 0:10:07.890,0:10:10.220 So the for's doing a bunch of work for us 0:10:10.220,0:10:12.390 and I've characterized that, sort of, in[br]the previous lecture. 0:10:12.390,0:10:14.890 How the for is sort of doing a bunch of[br]things for us 0:10:14.890,0:10:19.508 and that's, it allows our code to[br]be more 0:10:19.508,0:10:22.500 expressive and, and instead of, so this[br]is, a lot of 0:10:22.500,0:10:26.500 this is just kind of bookkeeping crap that[br]we don't really need. 0:10:26.500,0:10:29.580 And so the for loop helps us by doing some[br]of the bookkeeping for us. 0:10:31.920,0:10:34.960 Okay, so we can do all those loops again. 0:10:34.960,0:10:38.761 We can find the largest letter, the[br]smallest letter, the, how many times. 
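To put the two loops side by side, here is a rough sketch in Python 3 syntax. The names fruit, index, and letter follow the narration. The lists while_letters and for_letters are additions made here only so the two loops can be compared afterwards.

```python
fruit = "banana"
print(len(fruit))  # 6

# Indefinite loop: we set up, test, and advance the index ourselves.
while_letters = []
index = 0
while index < len(fruit):  # strictly less than: the last valid index is 5
    letter = fruit[index]
    print(index, letter)
    while_letters.append(letter)
    index = index + 1

# Definite loop: for handles the bookkeeping and hands us each letter.
for_letters = []
for letter in fruit:
    print(letter)
    for_letters.append(letter)
```

Both loops visit the same six letters in the same order; the for version simply leaves the index handling to Python.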
0:10:38.761,0:10:45.390 So, I think, what, how many n's are in[br]this, or how many a's are in this. 0:10:45.390,0:10:49.690 So this is a simple counting pattern and,[br]and a looking pattern. 0:10:49.690,0:10:52.720 And so, our word is banana, our count is 0. 0:10:52.720,0:10:54.976 For the letter in word, again, boop, boop, 0:10:54.976,0:10:56.940 boop, boop, boop, that comes out like that. 0:10:56.940,0:11:01.320 So it's going to run this little block.[br]If the letter is a, add 1 to the count. 0:11:02.330,0:11:07.580 So this is going to basically print out at[br]the end the number of a's in banana. 0:11:07.580,0:11:10.360 It would probably be more useful, for me,[br]to print out the number 0:11:10.360,0:11:13.910 of n's in banana, because I never know how[br]many n's are in banana. 0:11:13.910,0:11:15.480 But it looks like there's supposed to be two, 0:11:15.480,0:11:17.440 or otherwise I have a typo on this slide. 0:11:18.790,0:11:21.230 So the in, again, I, I love the in. 0:11:21.230,0:11:22.120 I just absolutely 0:11:22.120,0:11:24.700 love this in.[br]I love this syntax. 0:11:24.700,0:11:30.760 This for each letter in the word banana.[br]Just, to me, it reads very smoothly. 0:11:30.760,0:11:33.250 Cognitively, it fits in my mind what's[br]going on. 0:11:33.250,0:11:37.110 For each letter in banana, run this little[br]indented block of text. 0:11:37.110,0:11:42.990 Again, very pretty, I love in, it's one of[br]my favorite little pieces of Python. 0:11:46.490,0:11:49.430 So, again, with the for, it takes care of 0:11:49.430,0:11:52.420 all the iteration variables for us, and it[br]goes through the sequence. 0:11:52.420,0:11:54.850 And so here's, here's an animation, right? 0:11:54.850,0:11:57.910 Remember that the for is going to do all[br]this work for us, right? 0:11:57.910,0:12:00.710 Letter is going to advance through the 0:12:00.710,0:12:04.720 successive values, the successive letters[br]in banana. 
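The counting pattern above can be sketched as follows, in Python 3 syntax. The helper count_letter and the names a_count and n_count are hypothetical, introduced here so the same loop can answer both the a question and the n question; the loop body is the one from the narration.

```python
def count_letter(word, target):
    """Count how many times target appears in word using a for loop."""
    count = 0
    for letter in word:
        if letter == target:
            count = count + 1
    return count

a_count = count_letter("banana", "a")
n_count = count_letter("banana", "n")
print(a_count)  # 3
print(n_count)  # 2
```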
0:12:04.720,0:12:12.090 So letter is being moved for us by the for[br]statement, okay? 0:12:12.090,0:12:14.640 So that's looping through. 0:12:14.640,0:12:16.661 Now it turns out there's a lot of[br]common things that 0:12:16.661,0:12:18.730 we want to do that are already built into[br]Python for us. 0:12:20.100,0:12:24.490 Clear the screen there.[br]We call these slicing. 0:12:24.490,0:12:28.870 So the index operator looks up various[br]things in a string, but we 0:12:28.870,0:12:33.470 can also pull substrings out, using the[br]colon in addition to the square brackets. 0:12:33.470,0:12:35.020 Again, this is called slicing. 0:12:36.350,0:12:37.200 So the 0:12:37.200,0:12:43.010 colon operator, basically, takes a[br]starting position, and then an ending 0:12:43.010,0:12:47.798 position, but the ending position is up to[br]but not including the second one. 0:12:47.798,0:12:51.660 So this is, it's up to but not including,[br]up to but not including. 0:12:51.660,0:12:54.410 Just like the zero, you get used to it[br]pretty quick, 0:12:54.410,0:12:56.020 but the first time you see it, it's a[br]little bit 0:12:58.240,0:12:59.220 wonky. 0:12:59.220,0:13:03.480 So, if we're going 0 through 4, that's how[br]I read this print, s sub 0 0:13:03.480,0:13:08.960 through 4, or, or better, better said,[br]s 0, up to but not including 4. 0:13:08.960,0:13:14.160 That is, print me out the chunk that is up[br]to, but not including, 4. 0:13:14.160,0:13:18.510 So, it doesn't include 4, and so out comes[br]Mont, right? 0:13:19.630,0:13:23.325 So the next one is 6 up to but not[br]including 7, so it starts at 6, 0:13:23.325,0:13:30.010 up to but not including 7, so[br]out comes the P. 0:13:30.010,0:13:32.080 And, even though you might expect that it 0:13:32.080,0:13:35.770 would traceback on this, Python is a[br]little forgiving. 0:13:35.770,0:13:37.310 So here's a moment where Python is a[br]little 0:13:37.310,0:13:40.170 forgiving, saying, you know, I'll give you[br]a break here. 
0:13:40.170,0:13:42.630 If you go 6, but up to, but not including 20, 0:13:42.630,0:13:45.510 I'll just stop at the end of the string. 0:13:45.510,0:13:48.702 So it's 6 to the end, so it, it, you can[br]over-reference here and 0:13:48.702,0:13:51.530 you won't get yourself in[br]trouble. 0:13:51.530,0:13:53.280 So that comes out, Python. 0:13:53.280,0:13:57.680 So, again, the second number is[br]up to but not including, 0:13:57.680,0:13:59.810 and that's the, kind of the[br]weird thing there. 0:13:59.810,0:14:01.540 Of course once you remember that[br]the first character 0:14:01.540,0:14:04.590 is 0, 0 up through but not including.[br]Nice. 0:14:08.570,0:14:12.380 If we leave off the first number, the[br]last number, or both of them, 0:14:12.380,0:14:17.100 they mean[br]the beginning and end of the string, 0:14:17.100,0:14:23.860 respectively.[br]And so, up to but not including 2 is M-o. 0:14:23.860,0:14:30.660 8 colon means starting at 8 to the end of[br]the string. 0:14:30.660,0:14:33.730 So that's, thon.[br]And then, that means 0:14:33.730,0:14:36.970 the beginning to the end, and so it's[br]just the whole string, Monty Python. 0:14:38.110,0:14:39.833 Now we've already played with string 0:14:39.833,0:14:43.010 concatenation, just a thing to[br]emphasize here is, 0:14:43.010,0:14:48.740 the concatenation operator does not[br]add a space, does not add a space. 0:14:48.740,0:14:51.950 If you want a space, you explicitly add it. 0:14:51.950,0:14:55.740 So here there's no space in between the o[br]and the t, but here 0:14:55.740,0:14:59.690 there is a space between the o and the t[br]because we explicitly added it. 0:14:59.690,0:15:02.280 So you can concatenate more than one[br]thing. 0:15:02.280,0:15:05.360 And you add your spaces as you want,[br]okay? 0:15:08.000,0:15:10.490 Another thing you can do is you can ask[br]questions about a string. 
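The slicing rules walked through above can be tried out with a sketch like this one (Python 3 syntax). The variable s and the string Monty Python follow the narration. The names no_space and with_space, and the strings Hello and There in the concatenation part, are assumptions chosen to match the "o and the t" remark.

```python
s = "Monty Python"

# Start position, then an end position that is "up to but not including".
print(s[0:4])   # Mont
print(s[6:7])   # P
print(s[6:20])  # Python (running past the end is forgiven in a slice)

# Leaving a number off means "from the beginning" or "to the end".
print(s[:2])    # Mo
print(s[8:])    # thon
print(s[:])     # Monty Python

# Concatenation never adds a space on its own.
no_space = "Hello" + "There"
with_space = "Hello" + " " + "There"
print(no_space)    # HelloThere
print(with_space)  # Hello There
```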
0:15:10.490,0:15:14.520 Sort of like doing a string lookup, using[br]the in operator. 0:15:14.520,0:15:17.790 This is a little different than how we use[br]it inside of a for loop. 0:15:17.790,0:15:20.690 This is a logical operation asking a[br]question 0:15:20.690,0:15:23.220 like less than or greater than or[br]whatever. 0:15:23.220,0:15:25.100 So, here's an expression. 0:15:25.100,0:15:28.670 So fruit is banana, as always.[br]Is n in fruit? 0:15:30.250,0:15:33.020 And the answer is yes it is, True.[br]So this 0:15:33.020,0:15:35.050 is a logical operation.[br]It's a question. 0:15:35.050,0:15:36.620 It's a true or false. 0:15:36.620,0:15:39.830 Is m in fruit?[br]No, False. 0:15:39.830,0:15:42.500 And you can, this can be a string, not[br]just a single character. 0:15:42.500,0:15:45.260 Is n-a-n in fruit?[br]The answer is True. 0:15:45.260,0:15:50.250 And you can put, sort of, you know, if,[br]parts of ifs, et cetera, et cetera. 0:15:50.250,0:15:53.500 So, this is a logical expression that can[br]be on an if, 0:15:53.500,0:15:57.100 you can have a while, et cetera, et[br]cetera, et cetera. 0:15:57.100,0:15:58.410 So it's a logical, 0:15:58.410,0:16:00.670 logical expression and it returns[br]True or False. 0:16:03.540,0:16:05.560 You can also do comparisons. 0:16:05.560,0:16:11.190 Now, the comparison operations, equals[br]makes a lot of sense, less 0:16:11.190,0:16:15.450 than and greater than depend on the[br]language that you're using with Python. 0:16:15.450,0:16:20.204 And so, if you're using, like, a Latin[br]character set, then alphabetical matters. 0:16:20.204,0:16:22.480 You know, the, the way the Latin character[br]set would do. 0:16:22.480,0:16:24.380 But if you're in a different character[br]set, Python is 0:16:24.380,0:16:28.890 aware of multiple character sets and will[br]sort strings based on 0:16:28.890,0:16:32.050 the sorting algorithm of the particular[br]character set. 
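A short sketch of in as a logical operator, plus string comparison, in Python 3 syntax. The name fruit follows the narration, while found, word, and the sample words apple and Zebra are assumptions. One detail worth knowing: in Python 3 the comparison operators compare strings by Unicode code point, so every uppercase ASCII letter sorts before every lowercase one.

```python
fruit = "banana"

# in asks a true-or-false question about the string.
print("n" in fruit)    # True
print("m" in fruit)    # False
print("nan" in fruit)  # True

# Because it yields True or False, it can sit directly in an if.
found = False
if "a" in fruit:
    found = True
print(found)  # True

# Comparisons: equality, less than, greater than.
word = "apple"  # hypothetical sample word
print(word == fruit)      # False
print(word < fruit)       # True, apple comes before banana
print("Zebra" < "apple")  # True, uppercase Z sorts before lowercase a
```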
0:16:33.160,0:16:37.610 So you can do comparisons like equality,[br]less than, and greater than. 0:16:37.610,0:16:39.830 And we've seen some of these things in[br]previous lectures, actually. 0:16:39.830,0:16:40.650 We've had to use them. 0:16:42.080,0:16:47.125 So in addition, to, sort of, these sort of[br]fundamental operations that we 0:16:47.125,0:16:54.263 can do on strings, there's a extensive[br]library of built-in capabilities 0:16:54.263,0:16:55.308 in Python. 0:16:55.308,0:16:59.283 And so the, the way we see these built-in[br]capabilities 0:16:59.283,0:17:03.320 are they're, they're actually sort of[br]built in to strings. 0:17:03.320,0:17:05.760 So, let's go real slow here. 0:17:05.760,0:17:07.310 Here we have a variable called greet and 0:17:07.310,0:17:10.050 we're sticking the string Hello Bob[br]into it. 0:17:10.050,0:17:12.619 Now greet is of type string, as a result 0:17:12.619,0:17:16.589 of this, and it contains Hello Bob as its[br]value. 0:17:16.589,0:17:18.296 But we can actually access 0:17:18.296,0:17:26.559 capabilities inside of this value. So we[br]can say, greet.lower(). 0:17:26.559,0:17:30.650 This is calling something that's part of[br]greet itself, it's a part of all strings. 0:17:30.650,0:17:34.660 The fact that greet contains a string,[br]means that we can ask for, 0:17:34.660,0:17:38.120 hey, give me greet, which just gives you[br]back what you're looking for. 0:17:38.120,0:17:40.980 Like here, print greet is Hello Bob. 0:17:40.980,0:17:45.500 Or you can say give me greet lower, so[br]this is giving me a lowercase copy. 0:17:45.500,0:17:51.030 It doesn't convert it to lowercase.[br]It gives me a lowercase copy of Hello Bob. 0:17:51.030,0:17:53.580 So zap is hello bob, all lowercase. 
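Here is roughly what the greet example looks like as code, in Python 3 syntax, using the names greet and zap from the narration. It also shows that the original string is left untouched, since lower() hands back a new string.

```python
greet = "Hello Bob"

# lower() is built into every string and returns a lowercase copy.
zap = greet.lower()
print(zap)    # hello bob

# The original value is unchanged.
print(greet)  # Hello Bob
```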
0:17:54.660,0:17:59.950 Now, it didn't change greet, right?[br]And, you can even put this .lower on the 0:17:59.950,0:18:05.280 end of constants so, why you'd do this, I don't[br]know, but Hi There, with H and T capitalized, 0:18:05.280,0:18:10.640 .lower comes out as hi there.[br]So this bit is part of 0:18:10.640,0:18:11.560 all strings. 0:18:11.560,0:18:17.900 Both variables and constants have these[br]string functions built into them. 0:18:17.900,0:18:21.120 And every instance of a string, whether it 0:18:21.120,0:18:23.720 be a variable or a constant, has these[br]capabilities. 0:18:23.720,0:18:28.150 They don't modify it, they just give you[br]back a copy. 0:18:28.150,0:18:31.500 Now it turns out there is a, a 0:18:31.500,0:18:36.170 command inside Python called dir, to ask[br]questions like 0:18:36.170,0:18:39.730 hey, well here's, you know, stuff[br]has got Hello World. 0:18:39.730,0:18:42.964 We can say. Redo this. 0:18:42.964,0:18:45.560 Come here. 0:18:45.560,0:18:48.240 Stuff is a string.[br]We can ask, hey, what are you? 0:18:48.240,0:18:49.660 I am a string. 0:18:49.660,0:18:53.820 dir is another built-in Python function that asks[br]the question, hey, what are all 0:18:53.820,0:18:56.640 the things that are built into this that I[br]can make use of? 0:18:56.640,0:18:57.780 And here they are. 0:18:57.780,0:19:01.250 That's kind of a raw dump of them.[br]You can also go look at 0:19:01.250,0:19:05.910 the online documentation for Python and[br]see at the Pyth, oop, at 0:19:05.910,0:19:09.670 the Python website, you can see a whole[br]bunch of these things. 0:19:09.670,0:19:13.690 And they have the calling sequence, what[br]the parameters are, et cetera. 0:19:13.690,0:19:17.800 So when you're looking these things up,[br]you can go, go read about them. 0:19:17.800,0:19:19.140 Here's just a few that I've pulled out, 0:19:19.140,0:19:23.200 capitalize, which uppercases the[br]first character, 0:19:23.200,0:19:27.220 center, endswith, find, there's stripping. 
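A small sketch of asking a string what it is and what it can do, in Python 3 syntax. The variable stuff and its value follow the narration. The names lowered, kind, and methods are assumptions, and the full dir() listing is left out because it is long and differs between Python versions.

```python
# Methods work on constants too, not just variables.
lowered = "Hi There".lower()
print(lowered)  # hi there

stuff = "Hello World"

# type() answers "what are you?"
kind = type(stuff)
print(kind is str)  # True

# dir() lists the names built into the value, string methods included.
methods = dir(stuff)
print("capitalize" in methods)  # True
print("find" in methods)        # True
print("strip" in methods)       # True
```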
0:19:27.220,0:19:28.300 So I'll look through a couple of these, 0:19:28.300,0:19:30.740 just the kind of things to be looking for. 0:19:30.740,0:19:33.780 It'll be a good idea to take a look and read[br]through some of the things. 0:19:33.780,0:19:37.540 Here's a couple that, that we'll probably[br]be using early on. 0:19:37.540,0:19:43.700 The find function, it's similar to in but[br]it tells you where it finds the, the 0:19:43.700,0:19:49.517 particular thing that it's looking for.[br]And and so we'll put fruit is banana. 0:19:49.517,0:19:52.379 And I'm going to say pos, which is[br]going to be an integer variable, 0:19:52.379,0:19:54.002 equals fruit.find("na"). 0:19:54.002,0:19:57.836 So what it's saying is, go look inside[br]this variable fruit, 0:19:57.836,0:20:01.551 hunt until you find the first occurrence[br]of the string na. 0:20:01.551,0:20:05.590 Hunt, hunt, hunt, hunt, whoop, got it.[br]And then return it to me. 0:20:05.590,0:20:10.580 So that's going to give me back 2.[br]2 is where it found it, right? 0:20:10.580,0:20:14.120 So, where is it in the string, that's what[br]find does. 0:20:14.120,0:20:16.920 And if you don't find anything, like[br]you're looking for z, 0:20:16.920,0:20:21.440 no, no, no, I didn't find a z, then it[br]gives me back negative 1. 0:20:21.440,0:20:27.270 So just, again, this is just one of many[br]built-in functions in string. 0:20:27.270,0:20:30.130 The ability to find a substring, okay? 0:20:30.130,0:20:33.090 Or find, yeah, find a character or string[br]within another string. 0:20:35.330,0:20:37.110 There's a lower case, there's also an 0:20:37.110,0:20:40.710 upper case, This might be better named[br]shouting. 0:20:40.710,0:20:44.070 Upper means give me an uppercase copy of[br]this variable. 0:20:44.070,0:20:49.730 So Hello Bob becomes HELLO BOB, and then[br]lower is hello bob, right? 0:20:49.730,0:20:55.920 So these are both ways to get copies of[br]uppercase and lowercase versions. 
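The find, upper, and lower calls described above might be exercised like this (Python 3 syntax). The names fruit, pos, and greet follow the narration; missing, shouted, and quiet are assumed names for this sketch.

```python
fruit = "banana"

# find() returns the position of the first occurrence.
pos = fruit.find("na")
print(pos)      # 2

# When nothing is found, find() returns -1 instead of raising an error.
missing = fruit.find("z")
print(missing)  # -1

greet = "Hello Bob"
shouted = greet.upper()  # uppercase copy
quiet = greet.lower()    # lowercase copy
print(shouted)  # HELLO BOB
print(quiet)    # hello bob
```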
0:20:55.920,0:20:58.438 You might think these are kind of silly,[br]but one of the things 0:20:58.438,0:21:01.450 that you tend to use lower for is if[br]you're doing searching and 0:21:01.450,0:21:03.700 you want to ignore case, you convert the[br]whole thing 0:21:03.700,0:21:06.382 to lowercase, and then you search for a[br]lowercase string. 0:21:06.382,0:21:08.712 So you, depends on if you want to ignore[br]case or not. 0:21:08.712,0:21:11.720 So that's, that's one of the reasons that[br]you have things like this. 0:21:14.280,0:21:19.224 There is a replace function.[br]Again, it doesn't change the value. 0:21:19.224,0:21:21.640 Greet is going to have Hello Bob. 0:21:21.640,0:21:28.350 And I'm going to say, greet.replace all[br]occurrences of Bob with Jane. 0:21:28.350,0:21:32.660 That gives me back a copy, in nstr, says[br]Hello Jane. 0:21:32.660,0:21:35.690 So, so greet is unchanged. 0:21:35.690,0:21:39.890 This replace says, make a copy and then[br]make that following 0:21:39.890,0:21:43.251 edit that you, that, that we've requested. 0:21:43.251,0:21:46.447 [COUGH] Now we can also say, well, I[br]mean, the replace 0:21:46.447,0:21:50.490 is going to do all occurrences, so greet[br]is still Hello Bob. 0:21:50.490,0:21:51.660 This is kind of redundant here. 0:21:51.660,0:21:53.980 I'm just doing it so you remember what it is. 0:21:53.980,0:21:55.310 Greet is still Hello Bob. 0:21:55.310,0:21:57.500 I put Hello Bob back in it and replace 0:21:57.500,0:22:00.850 all the occurrences of lowercase o with[br]uppercase X. 0:22:01.920,0:22:05.096 And then that happens.[br]So this says, 0:22:05.096,0:22:11.927 go through the whole string [SOUND] doing[br]all those replaces, okay? 0:22:11.927,0:22:14.237 Now another common thing that we're[br]going to have to do 0:22:14.237,0:22:16.901 is we're going to have to throw away[br]whitespace. 
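Here is a rough sketch of the case-insensitive search idea and the two replace calls, in Python 3 syntax. The names greet and nstr follow the narration; has_bob and xstr are assumptions added for the example.

```python
greet = "Hello Bob"

# Ignoring case: lowercase a copy, then search for a lowercase string.
has_bob = "bob" in greet.lower()
print(has_bob)  # True

# replace() returns an edited copy with every occurrence replaced.
nstr = greet.replace("Bob", "Jane")
print(nstr)   # Hello Jane

xstr = greet.replace("o", "X")
print(xstr)   # HellX BXb

# greet itself never changes.
print(greet)  # Hello Bob
```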
0:22:16.901,0:22:18.628 Sometimes you have a string that 0:22:18.629,0:22:21.893 has characters, blank characters, or other[br]characters, 0:22:21.893,0:22:26.328 at the beginning and the end, nonprintable[br]characters, and we can strip them. 0:22:26.328,0:22:30.458 And there's three functions[br]that are built into 0:22:30.458,0:22:32.840 Python strings that do this for us. 0:22:33.920,0:22:38.202 There is lstrip, which strips from the left. 0:22:38.202,0:22:43.675 There is rstrip, which strips from the right. 0:22:43.675,0:22:47.440 So it throws away these whitespaces, so,[br]Hello Bob is gone. 0:22:48.470,0:22:50.940 I mean, the, so it gets rid of these[br]characters. 0:22:50.940,0:22:53.373 Oops, these are the ones that are gotten[br]rid of there. 0:22:53.373,0:22:55.913 I need an eraser.[br]And then 0:22:55.913,0:22:59.313 let's use white, and then strip[br]basically, gets rid of 0:22:59.313,0:23:03.250 all the whitespace, both on the left and[br]the right side. 0:23:03.250,0:23:04.140 And gets rid of that. 0:23:04.140,0:23:07.010 So we're going to, we're going to be using[br]these a lot. 0:23:07.010,0:23:09.860 It, one of the things you tend to do in[br]Python is cleaning up data. 0:23:09.860,0:23:11.790 Sometimes if you have spaces at the[br]beginning or 0:23:11.790,0:23:13.960 the end, you just want to kind of ignore[br]them. 0:23:13.960,0:23:15.790 So you strip them off, you throw them[br]away. 0:23:18.020,0:23:22.130 When we're looking for data, we sometimes[br]are looking for a prefix, and 0:23:22.130,0:23:27.400 there is a startswith function [COUGH][br]that gives you a true or a false. 0:23:27.400,0:23:31.370 We're asking here, does this variable line[br]start with the string Please. 0:23:31.370,0:23:34.820 And the answer is True, because it does[br]start with the string Please. 0:23:34.820,0:23:38.290 Or, and then next, we ask, does this start[br]with the letter p? 
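The three strip functions and startswith can be sketched like this, in Python 3 syntax. The amount of padding around Hello Bob and the rest of the sentence after Please are assumptions, since the narration does not spell them out. The names left, right, both, capital, and small are also assumed. Note that startswith is case sensitive.

```python
greet = "   Hello Bob  "  # assumed padding: 3 spaces before, 2 after

left = greet.lstrip()   # whitespace removed from the left only
right = greet.rstrip()  # whitespace removed from the right only
both = greet.strip()    # whitespace removed from both ends
print(repr(left))   # 'Hello Bob  '
print(repr(right))  # '   Hello Bob'
print(repr(both))   # 'Hello Bob'

line = "Please have a nice day"  # assumed sentence starting with Please
capital = line.startswith("Please")
small = line.startswith("p")
print(capital)  # True
print(small)    # False, lowercase p does not match uppercase P
```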
0:23:38.290,0:23:41.060 And the answer is False, it does not start[br]with a lowercase p. 0:23:42.070,0:23:43.290 Okay? So there's 0:23:43.290,0:23:44.880 lots more of these things. 0:23:48.480,0:23:52.704 And reading data and tearing it apart is[br]one of the things that we're going to 0:23:52.704,0:23:57.296 really focus on for the rest of these[br]first few chapters of the book, okay? 0:23:57.296,0:24:00.041 Because that's one thing that Python's[br]really good at is 0:24:00.041,0:24:03.860 tearing data into pieces and pulling the[br]pieces that you want. 0:24:03.860,0:24:06.840 So, so let's take a look at this line. 0:24:06.840,0:24:11.455 So this line that we've got here is a line[br]from an actual email box. 0:24:11.455,0:24:13.550 This is what, if you 0:24:13.550,0:24:15.580 looked at your email, sort of, on your hard 0:24:15.580,0:24:18.710 drive, email boxes would have this kind of[br]a format. 0:24:18.710,0:24:23.870 And there's actually many lines, and soon[br]we'll be reading whole files full of email. 0:24:23.870,0:24:26.940 But for now, let's just say we've got this[br]one line, somehow. 0:24:26.940,0:24:29.400 And we're looking for, we don't know[br]how long 0:24:29.400,0:24:31.910 these things are going to be, the first[br]charac, the 0:24:31.910,0:24:34.520 first thing is From, then there's an[br]email address, 0:24:34.520,0:24:38.000 then there's some detail about when the[br]mail was sent. 0:24:38.000,0:24:40.550 But what we actually want is 0:24:40.550,0:24:42.450 we want this part right here, 0:24:42.450,0:24:45.910 and that's the domain name of the mail[br]address, right? 0:24:45.910,0:24:48.110 We want to extract this out. 0:24:48.110,0:24:52.780 We're faced with this line, in a variable,[br]and we want to extract that out. 0:24:52.780,0:24:55.680 So this is kind of putting all these[br]things together. 0:24:55.680,0:24:59.330 So let's walk through how we do this. 0:24:59.330,0:25:02.028 So, here's this line, and it's a big long[br]string.
0:25:02.028,0:25:03.950 Mostly we would've read this from a file, 0:25:03.950,0:25:05.870 rather than just put it in a constant, but[br]for now we 0:25:05.870,0:25:08.480 put it in a constant, because we, files is[br]the next chapter. 0:25:09.950,0:25:12.500 And so what we're going to do is we're[br]going to say, you 0:25:12.500,0:25:15.380 know what, I'm going to look at this line[br]and I'm going to go 0:25:15.380,0:25:18.048 find the @ sign, and I want to know where[br]the @ sign is. 0:25:18.048,0:25:24.330 So I call data.find @ sign, and put[br]the result in atpos. 0:25:24.330,0:25:26.514 And that gives me 21. 0:25:26.514,0:25:29.166 It hunts until it finds the @ sign, and 0:25:29.166,0:25:34.310 then tells me where I found it.[br]Then what I want to look at is, starting 0:25:34.310,0:25:39.200 here, for the rest of the string, I want[br]to find the first space afterwards. 0:25:40.250,0:25:45.868 So what I say is, this, sppos is my[br]variable for the position of the space, 0:25:45.868,0:25:51.132 data.find, a blank, starting[br]at the @. 0:25:51.132,0:25:54.216 So this is starting at 21.[br]So it says, I'll start 0:25:54.216,0:25:59.523 at 21 and I'll look for the next blank.[br]And I find that at 31. 0:25:59.523,0:26:05.350 So now I know where the @ sign is and I[br]know where the space is. 0:26:05.350,0:26:08.172 And so what I'm looking at is, I want the[br]stuff 0:26:08.172,0:26:14.186 one beyond the @ sign, up to but not[br]including the space. 0:26:14.186,0:26:20.142 So then I can use a slicing operation, I[br]can use a slicing operation. 0:26:20.142,0:26:22.650 Start at the @ position, add 1 to it, 0:26:22.650,0:26:26.480 so advance 1, that's going to be the[br]letter u. 0:26:26.480,0:26:30.730 And then a slicing operation, up to but[br]not including space. 
0:26:30.730,0:26:36.190 Up to, this is going to work out nicely[br]all of a sudden, but not 0:26:36.190,0:26:41.770 including, okay?[br]And then 0:26:41.770,0:26:45.796 I'm going to take that slice, which is[br]really this little bit of data right here, 0:26:45.796,0:26:49.500 take that slice, and put it in the variable[br]host. 0:26:49.500,0:26:53.844 Then we print that out and we get the[br]piece, okay? 0:26:53.844,0:26:56.980 And so, here we have some data we want to[br]tear apart. 0:26:56.980,0:26:58.230 We hunt for the @. 0:26:58.230,0:27:00.281 We find it at position 21. 0:27:00.281,0:27:04.598 We start at 21 and we look for the, the[br]space after that. 0:27:04.598,0:27:10.659 31, and then we pull from 22, up to but[br]not including, 31. 0:27:10.659,0:27:13.380 And it, it wouldn't matter where this[br]thing was, because these aren't all 0:27:13.380,0:27:17.491 the same length when we start looking at[br]them in files, but it 0:27:17.491,0:27:20.541 would have found the @ sign and the space[br]after the @ sign, 0:27:20.541,0:27:24.258 and it would have reliably[br]pulled out the host, okay? 0:27:24.258,0:27:29.646 So this is a basic pattern we call[br]parsing. 0:27:29.646,0:27:32.068 Parsing text. 0:27:32.068,0:27:35.620 Find this, find that other thing, grab[br]this thing out, 0:27:35.620,0:27:40.040 then look inside that thing and [SOUND].[br]So it does all these things, right? 0:27:40.040,0:27:45.430 So, that's kind of like strings.[br]Up next, we have files. 0:27:45.430,0:27:46.770 Files are going to be lots of strings. 0:27:46.770,0:27:49.320 So we're going to start putting all these[br]things together. 0:27:49.320,0:27:52.490 And so the next chapter is a really,[br]really 0:27:52.490,0:27:55.600 important chapter, where it starts to[br]really start coming together. 0:27:55.600,0:27:57.110 So see you soon.
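The find-and-slice steps walked through above can be sketched as follows. The names data, atpos, sppos, and host follow the narration. The exact text of the sample line is an assumption, since the slide is not reproduced in the transcript; it was chosen so that the positions come out to the 21 and 31 mentioned in the video. The startswith check is an extra guard added for illustration, and print is called in the Python 3 style.

```python
# Sample mailbox line. This exact text is an assumption (the slide is not
# reproduced in the transcript); it was picked so the positions match the
# 21 and 31 mentioned in the narration.
data = 'From stephen.marquard@uct.ac.za Sat Jan  5 09:14:16 2008'

# Extra guard (not part of the narrated steps): only parse lines that
# begin with the expected prefix.
if data.startswith('From '):
    # Position of the @ sign.
    atpos = data.find('@')
    print(atpos)

    # Position of the first space at or after the @ sign.
    sppos = data.find(' ', atpos)
    print(sppos)

    # Slice from one past the @ sign, up to but not including the space.
    host = data[atpos + 1:sppos]
    print(host)
```

With this sample line, the three values printed should be 21, 31, and uct.ac.za. One thing to keep in mind is that find returns -1 when the text is not found, so code that reads real files would likely check for that before slicing.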