Hello, and welcome to Chapter Eight: Python Lists. So now we're going to start taking care of business. We're going to make lists and dictionaries and tuples, really start manipulating data, and lay the groundwork for real data analysis. As always, these lectures, audio, video, slides, and even the book are copyright Creative Commons Attribution.
So, lists, dictionaries, and tuples, the next three big topics we're going to talk about, are collections. And we've been doing lists already, right? We've been using lists when we were doing for loops. A list in Python is something that has square brackets around it. This is a constant list. Now, when I first talked to you about variables, I sort of oversimplified things. I said if you put x equals two, and then put x equals four, the four overwrites the two. A collection is where you can put a bunch of things in the same variable. Now, I have to have a way to find those things, but it allows us to put more than one thing in the variable. So, here we have friends, which has three strings, Joseph, Glenn, and Sally. And we have carryon, which has socks, shirt, and perfume. So that's the basic idea.
So what's not a collection? Well, simple variables. Simple variables are not collections, just like this example. I say x equals 2, x equals 4, and print x, and the 4 is in there and the 2 is somehow gone. It was there for a moment, and then it's gone. And so that's a normal variable. They're not collections. You can't put more than one thing in them. But when you put more than one thing in a variable, then you have to have a way to find the things that are in there. We'll get to that. So, we've been using list constants for the last couple of chapters just because we had to. In the for loop chapter, we did lists of numbers.
We have done lists of strings: red, yellow, and blue. And you don't necessarily have to have things all of the same type. This is a three-item list that has the string red, the integer 24, and 98.6, which is a floating point number. And here's an interesting thing, just as a side note. This shows that floating point numbers are not always perfectly represented inside of the computer. It's sort of an artifact of how they work. Here 98.6 comes back as 98.5999 and so on. When you see something like that, don't freak out. Floating point numbers are the ones that show this behavior.
Interestingly, although we won't put a lot of energy into this, you can also have an element of a list be a list itself. So this is an outer list that's got three elements: 1, 7, and then a list that's 5 and 6. So, if you look at the length of this, there are three things in it. Not four, three. Because the outer list has one, two, three things in it. And an empty list is bracket, bracket. Okay?
Like I said, we have been going through lists all along. We have an iteration variable, for i in, and this is a list. We've been using it all along. Similarly, definite loops are a great way to go through lists. For friend in friends goes through three times, and out come three lines, with the variable friend advancing through the three successive items in the list. And away we go. So, again, lists are not completely foreign to us.
Now, just like in a string, we can use the index operator, the square bracket operator, and we can look up items in the list. Friends sub one. Not surprisingly, using the European elevator rule, the first item in a list is sub zero, the second item is sub one, and the third one is sub two. So here when I print friends sub one I get Glenn, which is the second element. Just like strings.
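Here is a small sketch of the list constants and lookups described above. The values follow what the lecture reads off the slides; the exact order of the nested list and the variable names `colors`, `mixed`, `nested`, and `empty` are assumptions made for illustration.

```python
# List constants; items do not have to share a type.
colors = ['red', 'yellow', 'blue']
mixed = ['red', 24, 98.6]

# A list can hold another list; the inner list counts as ONE element.
nested = [1, [5, 6], 7]      # assumed ordering of the slide's example
empty = []

nested_length = len(nested)  # counts 1, [5, 6], and 7

# Indexing starts at zero, just like strings.
friends = ['Joseph', 'Glenn', 'Sally']
second_friend = friends[1]

# A definite loop visits each item in order.
for friend in friends:
    print('Happy New Year:', friend)

print(nested_length, second_friend)
```

The loop runs three times, once per item, and `friends[1]` is the second item because the first one sits at position zero.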
So once you know it for strings, lists and the rest of these things make a lot more sense. Just remember that we're in Europe, and things start with zero.
Some of the data items that we work with are not mutable. So for example, with strings, when we ask for a lower case version of a string, we're given a copy of that string. And that's because strings are not mutable, and we can see this by doing something like saying fruit sub 0 equals lowercase b. Now you'd think that that would just change the first letter to be a lower case b, but it doesn't, okay? It says string object does not support item assignment, which means that you're not allowed to reassign. You can make a new string and put different things in that new string, but once strings are made, they're not changeable. And that's why when we call fruit.lower, we get a copy of it in lower case. And so x is a lower case copy of the original string, but the original string, once we assign it into fruit, is unchanged. It can't be changed.
Lists, on the other hand, can be changed, and we can change them in the middle. This is one of the things we like about them. So here we have a list: 2, 14, 26, 41, and 63. Then we say lotto sub two, and of course that's going to be the third item. Lotto sub two is equal to 28. Then we print it and we see the new number there. So all this is saying is that we can change them, right? Strings no, and lists yes.
So the len function, we've used it for several things. It's used for strings and it's used for lists as well. The same function knows what its parameter is. When its parameter is a string, it gives us the number of characters in the string. And when it is a list, it gives us the number of elements in the list. And just because one of the elements is a string, it's still one element from the point of view of this list. So it has one, two, three, four items in the list, okay?
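A minimal sketch of the mutability difference, assuming the string is `'Banana'` and that the four-item list is `[1, 2, 'joe', 99]` (the lecture does not read those values out, so treat them as placeholders). It is written for Python 3.

```python
fruit = 'Banana'                 # assumed value
try:
    fruit[0] = 'b'               # strings do not allow item assignment
except TypeError as err:
    error_message = str(err)
    print('Error:', error_message)

x = fruit.lower()                # a new lower case string; fruit is untouched
print(fruit, x)

lotto = [2, 14, 26, 41, 63]
lotto[2] = 28                    # lists can be changed in the middle
print(lotto)

# len works on both strings and lists.
greet = 'Hello Bob'              # assumed value
items = [1, 2, 'joe', 99]        # assumed values; the string is one element
print(len(greet), len(items))
```

Wrapping the assignment in `try`/`except` is only there so the sketch keeps running after the error; typed interactively you would just see the traceback.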
So, it's probably about time to talk about the range function. The range function is a special function that produces a list and gives it back to us. You give the range function a parameter, how many items you want, and the range function creates and gives us back a list of four numbers starting at zero, which is zero up to, but not including, four. Sound familiar? Yeah. Up to, but not including. And the same thing is true here. So we can combine len and range to say, okay, len of friends, that's three items, and range of len of friends is 0, 1, 2. And it also corresponds exactly to the positions of these items.
So we can actually use this to construct loops to go through a list. We already have a basic for loop, right? We have a for loop that said for each friend in friends. And out comes Happy New Year for each of them. If we also want to know what position we're at as the loop progresses, we can rewrite the exact same loop a different way and make i be our iteration variable. We say for i in range(len(friends)), and that turns into zero, one, two. And then i goes zero, one, two. And then, in the loop, we can look up the particular friend we are interested in, using the index operator, friends sub i. And then print Happy New Year.
So these two loops are equivalent. The first loop is preferred, unless you happen to need this value i, which tells you where you're at, in case maybe you're going to look through something and then change it. But for what I've written here, they're exactly equivalent. Prefer the simpler one, unless you need the more complex one. They both produce the same kind of output.
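The two loops can be sketched as follows. One caveat that is not in the lecture: in Python 3, `range` hands back a range object rather than an actual list, so this sketch wraps it in `list()` where it wants to show the numbers. The names `by_value` and `by_index` are just illustrative.

```python
friends = ['Joseph', 'Glenn', 'Sally']

# range(len(friends)) covers the positions 0, 1, 2.
positions = list(range(len(friends)))
print(positions)

# Simpler loop: walk the items directly.
by_value = []
for friend in friends:
    by_value.append('Happy New Year: ' + friend)

# Index loop: walk the positions, then look each item up.
by_index = []
for i in range(len(friends)):
    friend = friends[i]
    by_index.append('Happy New Year: ' + friend)

print(by_value == by_index)
```

The index version earns its extra line only when the position `i` is needed, for example to assign back into `friends[i]`.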
We can concatenate lists, much like we concatenate strings, with plus. And you can think of the Python operator as looking to its right and to its left and saying, oh, those are both lists, I know what to do with lists, I'm going to put those together. And so two three-long lists become a six-long list, with the first one followed by the second one. It didn't hurt the original, a. c is a new list, basically.
We can also slice lists. Feels a lot like strings, right? Everything's kind of like strings. For loops like strings, concatenation like strings, and now slicing like strings. And it is exactly the same. Just remember, the second parameter is up to but not including. So this starts at sub one, which is the second item, and goes up to but not including sub three. So that's 41 comma 12. We can similarly leave off the first number, so that's from the beginning up to but not including sub four: zero, one, two, three, but not including four. If we go three to the end, and again, remember that we're starting at 0, so 3 to the end is 0, 1, 2, 3 to the end. The fact that the item there happens to be the number 3 doesn't matter. So that's 3, 74, 15. And with both numbers left off, that's the whole thing, so these two things are the same. So slicing works like strings: a starting position, and the second parameter is up to but not including.
There are some methods, and you can read about these online in the Python documentation. We can use the built-in dir function. It doesn't have a lot of use when we're running programs, but it's kind of useful. I like it when I'm typing interactively. Like, what can this thing do? So I make a list, list is a unique type, and with dir I ask, what can we do with it? Well, we can append, count, extend, index, insert, pop, remove, reverse, and sort.
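Concatenation, slicing, and `dir` can be tried out like this. The list `t` is reconstructed from the results the lecture reads aloud (41 and 12, then 3, 74, 15, with a total of 154 later on), so its exact contents are an assumption, as are the values of `a` and `b`.

```python
a = [1, 2, 3]                     # assumed values
b = [4, 5, 6]
c = a + b                         # a new list; a and b are unchanged
print(c, a)

t = [9, 41, 12, 3, 74, 15]        # assumed from the results in the lecture
print(t[1:3])                     # position 1 up to but not including 3
print(t[:4])                      # start up to but not including 4
print(t[3:])                      # position 3 to the end
print(t[:])                       # the whole thing

# dir lists what a type can do; skip the names with underscores.
methods = [name for name in dir(list) if not name.startswith('_')]
print(methods)
```

On a current Python 3 the `dir` output also includes a couple of methods the lecture does not mention, such as `clear` and `copy`.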
And then you can read up on all these things. I'll show you just a couple. We can build a list with append. So this syntax here, stuff equals list, that's called a constructor, which says give me an empty list. You could also say bracket, bracket for an empty list. Either way, you make an empty list and then you call append. Remember that lists are mutable, so it's okay to change it. So we're saying, okay, we started with an empty list. Now append to the end of that the word book. And then append to that 99. Wait a sec. That's a mistake on the slide, so watch me fix the mistake. Poof. Now my thing is magically fixed. Isn't that amazing? I have magic powers when it comes to slide fixing. I just snap my fingers and the slides are fixed. So here we go. We append the 99, and we print it out. And it's got book and 99, emphasizing the fact that the things in a list don't have to be the exact same kind of thing. Then later we append cookie and then it's book, 99, cookie. Okay? We won't do append in line like this so often. We'll tend to do it in a loop as we're building up a list, but that's the way you start with an empty list and then programmatically grow it.
Much like we do in a string, we can ask if an item is in a list. So here is a list called some, with five numbers in it. Is nine in some? True, yes it is. Is 15 in some? False. Is 20 not in some? That is legal syntax. Yes, it's not there, okay? These operators don't modify the list, they're just asking questions. These are logical operations often used in if statements or while loops, some kind of logic that you might be building.
Okay, so lists have order. So when we were appending, the first thing went in first, the second thing went in second, et cetera. And we can also tell the list to sort itself.
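A short sketch of growing a list with `append` and asking questions with `in`. The five numbers in `some` are an assumption chosen to match the answers given in the lecture (9 is there, 15 and 20 are not).

```python
stuff = list()            # empty list; [] would do the same
stuff.append('book')
stuff.append(99)
print(stuff)
stuff.append('cookie')
print(stuff)

some = [1, 9, 21, 10, 16]   # assumed values
has_nine = 9 in some
has_fifteen = 15 in some
missing_twenty = 20 not in some
print(has_nine, has_fifteen, missing_twenty)

# in and not in only ask questions; the list is unchanged afterwards.
print(some)
```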
So one of the things that we can do with a list, now we're starting to see some power here, is say, sort yourself. This is a list of strings. It can sort numbers, it can sort lots of things. friends.sort says, hey there, dear friends, sort yourself. This makes a change. It alters the list, and puts it, in this case, in alphabetical order: Glenn, Joseph, and Sally. It's been mutated, it's been modified, and so friends sub one is now Joseph because that's the second one. Okay? So the sort method says sort yourself now, and it sorts and then it stays sorted.
So you're going to be kind of ticked about this particular slide, because there's a whole bunch of built-in functions that help with lists. There's max, there's min, there's len, various things. All those loops that I told you how to do, I was just showing you that stuff because I thought it was important. This is the simplest way to go through and find the largest, smallest, sum, et cetera. So here's a list of numbers. We can say how many are there. That's the count. We can say what's the largest, it's 74. What's the smallest, that'd be 3. What is the sum, the running total of them all? 154. If you remember from a few lectures ago, these are the same numbers. And what is the average, which is the sum of them over the length of them, okay? So this makes a lot more sense, and if you had a list of numbers like this, you would simply ask what's the max, you wouldn't write a max loop. I just did that to demonstrate how loops work.
So here is a way that you can change those kinds of programs that we wrote. There are two ways to write a summing program. Let's just say instead of the data being in a list, we're going to write a while loop that's going to read a set of numbers until we say done, and then compute the average of those numbers. Okay, so let's say this is our problem.
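Before moving on to that problem, here is a quick sketch of `sort` and the built-in functions just mentioned. The numbers in `nums` are an assumption picked to agree with the results quoted in the lecture (largest 74, smallest 3, total 154); their order on the slide may differ.

```python
friends = ['Joseph', 'Glenn', 'Sally']
friends.sort()                 # sorts in place; the list itself changes
print(friends)
print(friends[1])

nums = [3, 41, 12, 9, 74, 15]  # assumed values
count = len(nums)
largest = max(nums)
smallest = min(nums)
total = sum(nums)
average = total / count        # true division in Python 3
print(count, largest, smallest, total, average)
```

Note that `friends.sort()` returns `None`, so writing `friends = friends.sort()` would lose the list; call it for its effect only.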
Read a list of numbers, wait till the word done comes in, and then average them. So here's a little program that does that. We create total equals zero, count equals zero. Make an infinite loop with while True. And then we ask the user to enter a number. We get a string back from this, remember raw_input always gives us strings back, and then if it's done, we're going to break. This is the version of the if that does not require an indented block. We just put the break on the same line. And so that gets us out of the loop when the time is right. And then we use float to convert the input to a floating point number. And then we do our accumulation pattern, total equals total plus value, count equals count plus one. So this is going to run. These numbers are going to go up and up and up. And then we're going to break out of it, calculate the average, and then print the average. Because total is a floating point number, the average is a floating point number. So that's one way to do it. Right? One way to write a program that does an average is to keep a running total and count as you're reading the numbers.
But there's another way to do it that would work exactly the same way, and this is when you can start using lists. So you come in, you say I'm going to make a list of numbers, just a mnemonic name, numlist, and it's an empty list. Then I create another infinite loop that's going to prompt with enter a number. And if it's done, break. That gets us out of it. Convert the input value to a float. And then append it to the list. So now each time we read a number the list is going to grow. However many times we add a number is how many things are going to be in the list. So in this case, when we're at this point and we type done, there will be three numbers in the list, because we will have run append three times.
We'll have appended 3, 9, and 5. We'll have them sitting in a list. And we will have exited the loop. So now you say, add up all the numbers in that list, and then divide by the length of the list. And print the average.
So these two programs are basically equivalent. The only time that they might not be equivalent is if there were ten million numbers. The list version would use up 40 megabytes of your memory, which is actually not a lot of memory on some computers. But if memory mattered, the list version does store all those numbers. The first version just runs the calculation. So if there's a really large number of numbers, this would make a difference, because the list is growing and keeping them all, and summing them all at the end. The first one is actually storing very little data. But for a reasonable number of numbers, like thousands or even hundreds of thousands, these two approaches are kind of equivalent. And then sometimes you actually want to accumulate something a little more complex than this. You want to sort them, or look for the maximum, or look for something else. Who knows what, but the notion of make a list, append something to the list each time through the iteration, and then do something with the list at the end is a rather powerful pattern. The first one is also a powerful pattern. This is the accumulator pattern, where we just have the variables accumulating in the loop. The second one is where we accumulate the data in the loop and then do the computations all at the end. Certain situations will make use of these different techniques.
Okay. So, connecting strings and lists. There's a method, a capability of strings that is really powerful when it comes to tearing data apart. It's called split. So here is a string with three words, and it has blanks in between. And abc.split says parse this string, look for the blanks, break the string into pieces, and give me back a list with one item for each of the words in the string, as defined by the spaces. Okay?
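The two averaging approaches compared above can be sketched side by side. To keep the sketch runnable without a keyboard, it reads from a list of strings standing in for what the user would type, instead of the `while True` and `raw_input` loop from the slides; the function names `average_running` and `average_with_list` are hypothetical.

```python
def average_running(entries):
    """Accumulator pattern: keep only a running total and count."""
    total = 0
    count = 0
    for inp in entries:
        if inp == 'done':
            break
        value = float(inp)
        total = total + value
        count = count + 1
    return total / count


def average_with_list(entries):
    """List pattern: store every number, compute at the end."""
    numlist = []
    for inp in entries:
        if inp == 'done':
            break
        numlist.append(float(inp))
    return sum(numlist) / len(numlist)


typed = ['3', '9', '5', 'done']   # stands in for keyboard input
first = average_running(typed)
second = average_with_list(typed)
print(first, second)
```

Both functions would fail with a division by zero if `done` came first; a real program would want to check for that. In an interactive Python 3 version, `input('Enter a number: ')` plays the role of `raw_input`.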
So, it breaks it into three pieces and gives us that back in a list. This is very powerful. Okay? So we split it and we get back a list. There are three words, and the first word, stuff sub zero, is With. So there's a lot of parsing going on here. We could do this with for loops and a lot of other things, but that would be a lot of work to do what split does. Given that this is a really common task, it's really great that this has been put into Python for us. Okay? So split breaks a string into parts and produces a list of strings. We think of these as words. We can access a particular word or we can loop through all the words. So here we have stuff again, and now we have a for loop that's going to go through each of the three words. It's going to run three times. Now chances are good we're going to do something other than just print them out. But you see how you can quickly take a split followed by a for, and write a loop that's going to go through each of the words, without working too hard to find the spaces. You let Python do all the hard work of finding the spaces. Okay?
So let's take a look at a couple of samples, just a couple of things to teach you a little more about split. Split treats many spaces as equal to one space. So, if you split a lot, blank, blank, blank, of spaces, it just throws away all the spaces and gives us four words. One, two, three, four, and it throws away all the spaces, because it assumes that's what we want done. So that's nice. You can also have split split on some other character. Sometimes you'll be getting data and they'll have used a semicolon, or a comma, or a colon, or a tab character, who knows what they've used, and your job is to dig that data out. So you can split based on a different character. Here, this is a normal split. It's not going to see the semicolons, it's looking for a space.
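A small sketch of the default `split` behavior described so far. The string `'With three words'` is an assumption consistent with the lecture (three words, the first being With), and the spacing in `line` is made up to show runs of blanks.

```python
abc = 'With three words'          # assumed value
stuff = abc.split()               # split on whitespace
print(stuff)
print(len(stuff), stuff[0])

# split followed by for: loop over the words without hunting for spaces.
for word in stuff:
    print(word)

# Runs of spaces count as a single separator.
line = 'A lot               of spaces'
etc = line.split()
print(etc)
```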
And so all we get back is a list with one item in it, the whole string with the semicolons. But if we switch, and we pass semicolon in as a parameter to split, then it will know to split based on semicolons, and it gives us first, second, and third back. Okay? It gives us three words. So you can split either on spaces, or you can split on a character other than a space. Okay?
So, let's take a look at how we might turn this into some of our common assignments that we have in this chapter, where we're going to read some of the mailbox data. Okay? So, here we go with a little program. First three lines, we write these a lot. Open the file. Write a for loop to loop through each line in the file. Then we're going to strip off the white space at the end of the line. One, two, three. We do those all the time. And if you look at the whole file, we're looking for lines that start with from, followed by a space. So if the line does not start with from followed by a space, that's a space right there, continue. So that's a way to skip all the lines that don't look like this. There are thousands of lines in this file and just a few that look like this. Okay? We're going to try to find what day of the week this thing happened on. So we're throwing away all the other lines with this little bit of code. Then what we do is we take the line, which is all of this text, and we split it. And we know that the day of the week is words sub two. So this is words sub zero, this is words sub one, and this is words sub two. And so, all we have to do is print out the sub two. We throw away all the lines except the from lines. We split them and take the third word, words sub two, and we can quickly create something that's extracting the day of the week out of these. Okay? So it's quick, because split does the tricky work.
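Here is a sketch of both ideas: splitting on a semicolon, and pulling the day of the week out of the from lines. The lecture opens a mailbox file; since that file is not part of this transcript, the sketch loops over a few made-up lines in the same style, so `sample_lines` and its contents are assumptions.

```python
# Splitting on a delimiter other than whitespace.
text = 'first;second;third'
whole = text.split()        # no spaces, so the list has a single item
parts = text.split(';')     # split on semicolons instead
print(whole, parts)

# Hypothetical lines standing in for the mailbox file.
sample_lines = [
    'From stephen.marquard@uct.ac.za Sat Jan  5 09:14:16 2008\n',
    'Return-Path: <postmaster@collab.sakaiproject.org>\n',
    'From: stephen.marquard@uct.ac.za\n',
    'From louis@media.berkeley.edu Fri Jan  4 18:10:48 2008\n',
]

days = []
for line in sample_lines:
    line = line.rstrip()                 # drop the trailing newline
    if not line.startswith('From '):     # note the space after From
        continue
    words = line.split()
    days.append(words[2])                # third word is the day
    print(words[2])
```

The space in `'From '` matters: it keeps the loop from matching the `From:` header lines, which have a different layout.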
If you go back to the strings chapter, you saw that we did a lot of work to get this to happen. So here's another tricky pattern. Let's say we want to do what we did at the end of Chapter Six, the string chapter. Let's say we wanted to get back this little bit of data. Okay? So, we can look at this and say, okay, let's split this. And this will be zero, one, two, three, four, five, and six. We're splitting it based on spaces. Then the email address is words sub one, right? That email address is this little bit of stuff because it's in between spaces, so that's what we pull out. So that's sitting in this email variable. Then we don't really want the whole thing, we just want the part after the at sign. We could do a lookup of the at sign. But you can also do a second split. Okay? So we're taking this variable here, email, which is merely this little part right here. And we are splitting it again, except this time we're splitting it based on an at sign. Which means it's going to bust it right here, and find us two pieces. So pieces now is a list where the sub zero item is the person's name and the sub one item is the host where their mail address is held. Okay? And so then all we need is pieces sub one, and pieces sub one is this guy right here. And so we pulled it out.
So if you go back to how we did it before, we were searching, we were searching some more, and then we were taking slices. This is a little more elegant, okay? Because really, we split it and then we split it again, and we knew what piece we were looking at. So this is what I call the Double Split Pattern, where you split a string into a list, then you take a thing out, and then you split it again, depending on what data you're looking for.
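The double split can be sketched in a few lines. The sample line is an assumption in the style of the mailbox data; the variable names `words`, `email`, and `pieces` follow the lecture.

```python
# Hypothetical from line in the style of the mailbox data.
line = 'From stephen.marquard@uct.ac.za Sat Jan  5 09:14:16 2008'

words = line.split()          # first split: on spaces, positions 0 to 6
email = words[1]              # the address sits between the first two spaces

pieces = email.split('@')     # second split: on the at sign
name = pieces[0]
host = pieces[1]
print(email, name, host)
```

This relies on the address containing exactly one at sign; with none, `pieces[1]` would raise an `IndexError`, so messy data may need a check first.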
This is just a technique, it's not the only technique. Okay, so that's lists. We talked about the concept of a collection, where lists have multiple things in them. Definite loops, again, we've seen these things. Lists look a lot like strings, except the elements are more powerful and they're mutable. We still use the bracket operator, and we redid max, min, and sum, except we did each in, like, one line rather than a whole loop. And something we're going to play with a lot is using split to parse strings: the single split, and then the double split, which is the natural extension of the single split. So, see you in the next lecture, looking forward to talking about dictionaries.