Welcome to the playlist on statistics. Something I've been meaning to do for some time. So anyway, I just want to get right into the meat of it and I'll try to do as many examples as possible and hopefully give you the feel for what statistics is all about. And, really, just to kind of start off in case you're not familiar with it -- although, I think a lot of people have an intuitive feel for what statistics is about. And essentially -- well, in very general terms it's kind of getting your head around data. And it can broadly be classified. Well, there are maybe three categories. You have descriptive. So say you have a lot of data and you wanted to tell someone about it without giving them all of the data. Maybe you can kind of find indicative numbers that somehow represent all of that data without having to go over all of the data. That would be descriptive statistics. There's also predictive. Well, I'll kind of group them together. There's inferential statistics. And this is when you use data to essentially make conclusions about things. So let's say you've sampled some data from a population -- and we'll talk a lot about samples versus populations but I think you have just a basic sense of what that is, right? If I survey three people who are going to vote for president, I clearly haven't surveyed the entire population. I've surveyed a sample. But what inferential statistics is all about is, if we can do some math on the samples, maybe we can make inferences or conclusions about the population as a whole. Well, anyway, that's just a big picture of what statistics is all about. Let's just get into the meat of it and we'll start with the descriptive. So the first thing that I would want to do, or I think most people would want to do, when they are given a whole set of numbers and they're told to describe it: well, maybe I can come up with some number that is most indicative of all of the numbers in that set. 
Or some number that represents, kind of, the central tendency -- this is a word you'll see a lot in statistics books. The central tendency of a set of numbers. And this is also called the average. And I'll be a little bit more exact here than I normally am with the word "average." When I talk about it in this context, it just means that the average is a number that somehow is giving us a sense of the central tendency. Or maybe a number that is most representative of a set. And I know that sounds all very abstract but let's do a couple of examples. So there's a bunch of ways that you can actually measure the central tendency or the average of a set of numbers. And you've probably seen these before. They are the mean. And actually, there are types of means, but we'll stick with the arithmetic mean. There are also geometric means, and maybe we'll cover the harmonic mean one day. There's the mean, the median, and the mode. And in statistics speak, these all can kind of be representative of a data set's or population's central tendency or a sample's central tendency. And they all are collectively -- they can all be forms of an average. And I think when we see examples, it'll make a little bit more sense. In everyday speak, when people talk about an average -- I think you've already computed averages in your life -- they're usually talking about the arithmetic mean. So normally when someone says, "Let's take the average of these numbers," and they expect you to do something, they want you to figure out the arithmetic mean. They don't want you to figure out the median or the mode. But before we go any further, let's figure out what these things are. Let me make up a set of numbers. Let's say I have the number 1. Let's say I have another 1, a 2, a 3. Let's say I have a 4. That's good enough. We just want a simple example. So the mean or the arithmetic mean is probably what you're most familiar with when people talk about average. 
And that's essentially -- you add up all the numbers and you divide by how many numbers there are. So in this case, it would be 1 plus 1 plus 2 plus 3 plus 4. And you're going to divide by one, two, three, four, five numbers. It's what? 1 plus 1 is 2. 2 plus 2 is 4. 4 plus 3 is 7. 7 plus 4 is 11. So this is equal to 11/5. That's what? That's 2 1/5? So that's equal to 2.2. And so someone could say, "Hey, you know. That is a pretty good representative number of this set. That's the number that all of these numbers you can kind of say are closest to." Or, 2.2 represents the central tendency of this set. And in common speak, that would be the average. But if we're being a little bit more particular, this is the arithmetic mean of this set of numbers. And you see it kind of represents them. If I didn't want to give you the list of five numbers, I could say, "Well, you know, I have a set of five numbers and their mean is 2.2." It kind of tells you a little bit of, at least, you know, where the numbers are. We'll talk a little bit more about how you know how far the numbers are from that mean in probably the next video. So that's one measure. Another measure, instead of averaging it in this way, you can average it by putting the numbers in order, which I actually already did. So let's just write them down in order again. 1, 1, 2, 3, 4. And you just take the middle number. So let's see, there's one, two, three, four, five numbers. So the middle number's going to be right here, right? The middle number is 2. There's two numbers greater than 2 and there's two numbers less than 2. And this is called the median. So it's actually very little computation. You just have to essentially sort the numbers. And then you find whatever number has an equal number of values greater than it and less than it. So the median of this set is 2. And you see, I mean, that's actually fairly close to the mean. And there's no right answer. One of these isn't a better answer for the average. 
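If you want to check those two calculations yourself, here is a minimal Python sketch, assuming the five-number set from above. The variable names are made up for illustration, and the median step here only handles an odd count of numbers.

```python
from fractions import Fraction

numbers = [1, 1, 2, 3, 4]

# Arithmetic mean: add everything up, then divide by how many numbers there are.
total = sum(numbers)                        # 1 + 1 + 2 + 3 + 4 = 11
arithmetic_mean = Fraction(total, len(numbers))

# Median (odd count only): sort the numbers and take the one in the middle.
ordered = sorted(numbers)
middle_value = ordered[len(ordered) // 2]

print(arithmetic_mean, float(arithmetic_mean))  # 11/5 2.2
print(middle_value)                             # 2
```

Using `Fraction` keeps the exact 11/5 around, so you can see both the fraction and the decimal 2.2.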
They're just different ways of measuring the average. So here it's the median. And I know what you might be thinking. "Well, that was easy enough when we had five numbers. What if we had six numbers?" What if it was like this? What if this was our set of numbers? 1, 1, 2, 3, 4, and let's add another 4 there. So now, there's no middle number, right? I mean 2 is not the middle number because there's two less than and three larger than it. And then 3's not the middle number because there's three larger and -- sorry, there's two larger and three smaller than it. So there's no middle number. So when you have a set with an even number of numbers and someone tells you to figure out the median, what you do is you take the middle two numbers and then you take the arithmetic mean of those two numbers. So in the case of this set, the middle two numbers are 2 and 3, and the median would be 2.5. Fair enough. But let's put this aside because I want to compare the median and the means and the modes for the same set of numbers. But that's a good thing to know because sometimes it can be a little confusing. And these are all definitions. These are all kind of mathematical tools for getting our heads around numbers. It's not like one day someone saw one of these formulas on the face of the sun and said, "Oh, that's part of the universe that this is how the average should be calculated." These are human constructs to kind of just get our heads around large sets of data. This isn't a large set of data, but instead of five numbers, if we had five million numbers, you can imagine you wouldn't like thinking about every number individually. Anyway, before I talk more about that, let me tell you what the mode is. And the mode to some degree, it's the one I think most people probably forget or never learn and when they see it on an exam, it confuses them because they're like, "Oh, that sounds very advanced." But in some ways, it is the easiest of all of the measures of central tendency or of average. 
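Going back for a second to the six-number case, the rule of averaging the middle two numbers can be sketched in Python like this. The function name `median_of` is just a made-up helper for illustration, and it assumes you hand it at least one number.

```python
def median_of(values):
    """Middle value of the sorted data; mean of the two middle values if the count is even."""
    if not values:
        raise ValueError("median_of needs at least one number")
    ordered = sorted(values)
    count = len(ordered)
    mid = count // 2
    if count % 2 == 1:
        # Odd count: there is a single middle number.
        return ordered[mid]
    # Even count: take the arithmetic mean of the two middle numbers.
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median_of([1, 1, 2, 3, 4]))     # 2
print(median_of([1, 1, 2, 3, 4, 4]))  # 2.5
```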
The mode is essentially what number is most common in a set. So in this example, there's two 1's and then there's one of everything else, right? So the mode here is 1. So mode is the most common number. And then you could kind of say, "Whoa, hey Sal, what if this was our set? 1, 1, 2, 3, 4, 4." Here I have two 1's and I have two 4's. And this is where the mode gets a little bit tricky because either of these would have been a decent answer for the mode. You could have actually said the mode of this is 1 or the mode of this is 4 and it gets a little bit ambiguous. And you probably want a little clarity from the person asking you. Most times on a test when they ask you, there's not going to be this ambiguity. There will be a single most common number in the set. So now it's like, oh, well, you know, why wasn't just one of these good enough? You know, we learned averages, why don't we just use averages? Or why don't we use arithmetic mean all the time? What's median and mode good for? Well, I'll try to do one example of that and see if it rings true with you. And then you can think a little bit more. Let's say I had this set of numbers. 3, 3, 3, 3, 3, and, I don't know, 100. So what's the arithmetic mean here? I have one, two, three, four, five 3's and 100. So it would be 115 divided by 6, right? I have one, two, three, four, five, six numbers. 115 is just the sum of all of these. So that's equal to -- how many times does 6 go into 115? 6 goes into 11 one time. 1 times 6 is 6. 11 minus 6 is 5, bring down the 5, and 6 goes into 55 nine times. 9 times 6 is 54, with 1 left over. So it's equal to 19 1/6. Fair enough. I just added all the numbers and divided by how many there are. But my question is, is this really representative of this set? I mean, I have a ton of 3's and then I have 100 all of a sudden, and we're saying that the central tendency is 19 1/6. And, I mean, 19 1/6 doesn't really seem indicative of the set. I mean maybe it does, depending on your application, but it just seems a little bit off, right? 
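Just to double-check that division, here is a small Python sketch, assuming the same six numbers. It uses the built-in `divmod` to get the whole part and the remainder, which is what the long division is doing by hand. The variable names are made up for illustration.

```python
numbers = [3, 3, 3, 3, 3, 100]

total = sum(numbers)   # five 3's plus 100
count = len(numbers)   # six numbers

# divmod gives the whole-number quotient and the remainder in one step.
whole, remainder = divmod(total, count)

print(total, count)                    # 115 6
print(f"{whole} {remainder}/{count}")  # 19 1/6
```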
I mean, my intuition would be that the central tendency is something closer to 3 because there's a lot of 3's here. So what would the median tell us? I already put these numbers in order, right? If I give it to you out of order, you'd want to put it in this order and you'd say what's the middle number? Let's see, the middle two numbers, since I have an even number, are 3 and 3. So if I take the average of 3 and 3 -- or I should be particular with my language. If I take the arithmetic mean of 3 and 3, I get 3. And this is maybe a better measurement of the central tendency or of the average of this set of numbers, right? Essentially, what it does is by taking the median, I wasn't so much affected by this really large number that's very different than the others. In statistics they call that an outlier. A number that, you know, if you talked about average home prices, maybe every house in the city is $100,000 and then there's one house that costs $1 trillion. And then if someone told you the average house price was, I don't know, $1 million, you might have a very wrong perception of that city. But the median house price would be $100,000 and you get a better sense of what the houses in that city are like. So similarly, this median, maybe, gives you a better sense of what the numbers in this set are like. Because the arithmetic mean was skewed by this, what they call an outlier. And being able to tell what an outlier is, it's kind of one of those things that a statistician will say, well, I know it when I see it. There isn't really a formal definition for it but it tends to be a number that really kind of sticks out and sometimes it's due to, you know, a measurement error or whatever. And then finally, the mode. What is the most common number in this set? Well there's five 3's and there's 100. So the most common number is, once again, it's a 3. 
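To see all three measures side by side for this set, here is a rough Python sketch. It leans on `mean` and `median` from the standard `statistics` module; the `modes` helper is a made-up function for illustration that returns every number tied for most common, so it also shows the ambiguous 1, 1, 2, 3, 4, 4 case from before.

```python
from collections import Counter
from statistics import mean, median

def modes(values):
    """Return every value tied for the highest count, in first-seen order."""
    counts = Counter(values)
    highest = max(counts.values())
    return [value for value, count in counts.items() if count == highest]

outlier_set = [3, 3, 3, 3, 3, 100]

print(mean(outlier_set))    # roughly 19.17, pulled up by the 100
print(median(outlier_set))  # 3.0
print(modes(outlier_set))   # [3]

# With two 1's and two 4's there are two equally common numbers.
print(modes([1, 1, 2, 3, 4, 4]))  # [1, 4]
```

Returning a list from `modes` is just one way to handle ties; you could equally ask whoever posed the question which answer they want.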
So in this case, when you had this outlier, the median and the mode tend to be, you know, maybe they're a little bit better about giving you an indication of what these numbers represent. Maybe this was just a measurement error. But I don't know, we don't actually know what these represent. If these are house prices, then I would argue that these are probably more indicative measures of what the houses in an area cost. But if this is something else, if this is scores on a test, maybe, you know, maybe there is one kid in the class -- one out of six kids who did really, really well and everyone else didn't study. And this is more indicative of, kind of, how students at that level do on average. Anyway, I'm done talking about all of this. And I encourage you to play with a lot of numbers and deal with the concepts yourself. In the next video, we'll explore more descriptive statistics. Instead of talking about the central tendency, we'll talk about how spread apart things are away from the central tendency. See you in the next video.