Statistics: The Average

  • 0:02 - 0:07
    Welcome to the playlist on statistics, something I've been needing to do for some time.
  • 0:07 - 0:13
    So I just want to get right into it, and I want to do as many examples as possible
  • 0:13 - 0:16
    and hopefully give you the feel for what statistics is all about
  • 0:16 - 0:18
    So I'm just gonna start off in case you're not familiar with it
  • 0:18 - 0:23
    I think a lot of people have an intuitive feel for what statistics is about
  • 0:27 - 0:29
    Well, in very general terms, it's kind of getting your head around data
  • 0:29 - 0:33
    and it can broadly be classified into maybe 3 categories
  • 0:33 - 0:36
    You have: Descriptive
  • 0:36 - 0:42
    So, say you have a lot of data, and you want to tell someone about it without giving them all the data
  • 0:42 - 0:48
    Maybe you can find indicative numbers that somehow represent the data
  • 0:48 - 0:49
    without having to go all over the data
  • 0:49 - 0:52
    So that would be "Descriptive statistics"
  • 0:52 - 1:00
    Well, I'll kind of group it together, there's "Inferential statistics"
  • 1:00 - 1:03
    So that's when you use data to make conclusions about things
  • 1:03 - 1:07
    So let's say you've sampled some data from a population,
  • 1:07 - 1:10
    and we'll talk a lot about sample versus population,
  • 1:10 - 1:12
    but I think you have just a basic sense of what that is,
  • 1:12 - 1:15
    right: if I survey 3 people who are going to vote for president,
  • 1:15 - 1:19
    I clearly haven't surveyed the entire population; I surveyed a sample.
  • 1:19 - 1:22
    But what inferential statistics is all about is:
  • 1:22 - 1:26
    "if we can do some math on the sample, maybe we can make inferences, or conclusions,
  • 1:26 - 1:28
    about the population as a whole."
  • 1:28 - 1:31
    Anyway, that's just a big picture of what statistics is all about,
  • 1:31 - 1:39
    so let's just get into the middle of it, and we'll start with the Descriptive
  • 1:39 - 1:44
    So the first thing that I would want to do, and I think most people would want to do
  • 1:44 - 1:48
    when they are given a whole set of numbers and they are told to describe it
  • 1:48 - 1:51
    they would think: "well, maybe I could come up with some number
  • 1:51 - 1:54
    that is most indicative of all of the numbers in that set
  • 1:54 - 1:59
    or some number that represents kind of the central tendency
  • 1:59 - 2:02
    this is a word you see a lot in statistics books
  • 2:02 - 2:08
    the "central tendency" of a set of numbers
  • 2:08 - 2:11
    And this is also called the average.
  • 2:11 - 2:15
    And I'll be a little more exact here than I normally am with the word average
  • 2:15 - 2:17
    when I talk about it in this context, it just means
  • 2:17 - 2:21
    that the average is one number, that somehow
  • 2:21 - 2:23
    is giving us a sense of the central tendency
  • 2:23 - 2:26
    or maybe a number that is most representative of a set
  • 2:26 - 2:28
    and I know that sounds all very abstract,
  • 2:28 - 2:30
    but let's do a couple of examples:
  • 2:30 - 2:33
    So there's a bunch of ways you can actually measure
  • 2:33 - 2:36
    the central tendency, or the average of a set of numbers
  • 2:36 - 2:39
    and you've probably seen this before,
  • 2:39 - 2:43
    they are the "mean"... Well, actually there are several types of means,
  • 2:43 - 2:49
    but let's stay with the arithmetic mean
  • 2:49 - 2:53
    And later on, we'll do geometric means,
  • 2:53 - 2:56
    and maybe we'll cover the harmonic mean one day
  • 2:56 - 3:00
    There is the "mean", the "median"
  • 3:00 - 3:03
    and the mode.
  • 3:03 - 3:06
    And in statistics speak, these are all
  • 3:06 - 3:11
    kind of representative of a data set's or a population's central tendency
  • 3:11 - 3:14
    or a sample's central tendency.
  • 3:14 - 3:18
    And collectively, they can all be forms of an average,
  • 3:18 - 3:21
    and I think when we see examples it will make a little more sense.
  • 3:21 - 3:24
    In everyday speak, when people talk about an average,
  • 3:24 - 3:27
    And I think you've already computed averages in your life,
  • 3:27 - 3:29
    They are usually talking about the arithmetic mean
  • 3:29 - 3:31
    So normally when someone is asking you to calculate
  • 3:31 - 3:33
    the average of these numbers, and they are expecting you to do something,
  • 3:33 - 3:35
    they want you to figure out the arithmetic mean.
  • 3:35 - 3:38
    They don't want you to figure out the median or the mode.
  • 3:38 - 3:42
    But before we go any further, let's figure out what those things are.
  • 3:42 - 3:44
    So let me figure out a set of numbers.
  • 3:44 - 3:56
    So let's take...1...another 1, a 2, a 3, let's say I have a 4...
  • 3:56 - 3:59
    That's good enough. We just want a simple example.
  • 3:59 - 4:04
    So, the mean, or the arithmetic mean,
  • 4:04 - 4:07
    and that's probably what you're most familiar with
  • 4:07 - 4:09
    when people talk about average,
  • 4:09 - 4:10
    and that's essentially: you add up all the numbers,
  • 4:10 - 4:13
    and you divide by how many numbers there are.
  • 4:13 - 4:17
    So that's: 1+1+2+3+4
  • 4:17 - 4:22
    and you're going to divide by: 1,2,3,4..5 numbers
  • 4:22 - 4:30
    1+1 is 2, 2+2 is 4, 4+3 is 7, 7+4 is 11
  • 4:30 - 4:33
    So that's equal to 11 over 5
  • 4:33 - 4:39
    That's what? 2 and 1/5, so that's equal to 2.2
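The arithmetic above can be checked with a few lines of Python. This is only a minimal sketch: the helper name `arithmetic_mean` is made up for illustration, and it assumes the inputs are integers so that the exact fraction 11/5 can be shown alongside the decimal 2.2.

```python
from fractions import Fraction


def arithmetic_mean(values):
    """Add up all the numbers and divide by how many there are.

    Hypothetical helper; assumes integer inputs so the result stays exact.
    """
    if not values:
        raise ValueError("the mean of an empty set is undefined")
    return Fraction(sum(values), len(values))


numbers = [1, 1, 2, 3, 4]
mean = arithmetic_mean(numbers)

print(mean)         # 11/5
print(float(mean))  # 2.2
```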
  • 4:39 - 4:42
    And someone could say: "hey, you know, that's a pretty representative number
  • 4:42 - 4:45
    of this set, that's the number that all of these numbers,
  • 4:45 - 4:47
    you can kind of say, are closest to.
  • 4:47 - 4:50
    Or 2.2 is the central tendency of this set"
  • 4:50 - 4:52
    And in common speak, that would be "the average".
  • 4:52 - 4:54
    But if we want to be a little bit more particular,
  • 4:54 - 4:56
    this would be the arithmetic mean of this set of numbers that you see
  • 4:56 - 4:58
    it kind of represents them.
  • 4:58 - 5:00
    If I didn't want to give you the list of 5 numbers,
  • 5:00 - 5:01
    I could say: "Well you know I have a set of 5 numbers,
  • 5:01 - 5:03
    and their mean is 2.2"
  • 5:03 - 5:05
    and it kind of tells you a little bit about, at least, where the numbers are.
  • 5:05 - 5:10
    We'll talk a bit more about how far the numbers are from that mean
  • 5:10 - 5:13
    in probably the next video.
  • 5:13 - 5:15
    So that's one measure.
  • 5:15 - 5:17
    Another measure, instead of averaging it this way,
  • 5:17 - 5:19
    you can average it by: putting the numbers in order,
  • 5:19 - 5:21
    which I actually already did,
  • 5:21 - 5:25
    so let's just write them down in order again.
  • 5:25 - 5:27
    1,1,2,3,4
  • 5:27 - 5:29
    And you just take the middle number!
  • 5:29 - 5:32
    So let's see: there's 1,2,3,4...5 numbers,
  • 5:32 - 5:34
    so the middle number is gonna be right here, right?
  • 5:34 - 5:36
    The middle number is 2
  • 5:36 - 5:37
    There are two numbers greater than 2,
  • 5:37 - 5:39
    and there are two numbers less than 2.
  • 5:39 - 5:41
    And this is called the median,
  • 5:41 - 5:42
    which is actually a very simple computation:
  • 5:42 - 5:44
    you just have to sort the numbers,
  • 5:44 - 5:45
    and then you find whatever number
  • 5:45 - 5:48
    where you have an equal number of numbers greater than and less than that number.
  • 5:48 - 5:50
    So the median of this set is 2.
  • 5:50 - 5:55
    And you see, that's fairly close to the mean.
  • 5:55 - 5:57
    And, there is no right answer.
  • 5:57 - 5:59
    One of these isn't a better answer for the average,
  • 5:59 - 6:02
    they are just different ways of measuring the average.
  • 6:02 - 6:06
    So here it's the median.
  • 6:06 - 6:07
    And I know what you might be thinking:
  • 6:07 - 6:09
    "well, that was easy enough when we have 5 numbers,
  • 6:09 - 6:14
    but what if we have 6 numbers, what if it was like this:
  • 6:14 - 6:16
    what if this was our set of numbers?
  • 6:16 - 6:20
    1,1,2,3,4...let's add another 4 there
  • 6:20 - 6:23
    So now there's no middle number, right?
  • 6:23 - 6:25
    I mean, 2 is not the middle number,
  • 6:25 - 6:27
    because there are two numbers less than 2,
  • 6:27 - 6:29
    and there are three greater than 2.
  • 6:29 - 6:30
    And 3 is not the middle number,
  • 6:30 - 6:33
    because there are two larger and three less than 3.
  • 6:33 - 6:35
    So there's no middle number.
  • 6:35 - 6:35
    So when you have a set with an even number of numbers,
  • 6:35 - 6:38
    and someone tells you to figure out the median,
  • 6:38 - 6:41
    what you do is you take the middle two numbers
  • 6:41 - 6:46
    then you take the arithmetic mean of those two numbers
  • 6:46 - 6:48
    So in this case, for this case,
  • 6:48 - 6:51
    the median would be 2.5
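Both cases, the odd-sized set and the even-sized set, can be sketched in Python as follows. The function name `median` here is just an illustrative helper, not any particular library's implementation: it sorts the numbers, takes the middle one if there is one, and otherwise takes the arithmetic mean of the middle two.

```python
def median(values):
    """Sort the numbers, then take the middle one.

    With an even count there is no single middle number, so the
    arithmetic mean of the two middle numbers is used instead.
    """
    ordered = sorted(values)
    count = len(ordered)
    if count == 0:
        raise ValueError("the median of an empty set is undefined")
    middle = count // 2
    if count % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2


print(median([1, 1, 2, 3, 4]))     # 2
print(median([1, 1, 2, 3, 4, 4]))  # 2.5
```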
  • 6:51 - 6:52
    Fair enough
  • 6:52 - 6:54
    Let's put this aside,
  • 6:54 - 6:55
    because I want to compare
  • 6:55 - 6:56
    the median and the mean and the mode
  • 6:56 - 6:58
    for the same set of numbers
  • 6:58 - 6:59
    But that's a good thing to know,
  • 6:59 - 7:02
    because sometimes, it can be a little bit confusing.
  • 7:02 - 7:04
    And you know, these are all definitions,
  • 7:04 - 7:06
    these are all kind of mathematical tools
  • 7:06 - 7:08
    for getting your head around numbers,
  • 7:08 - 7:09
    there's nothing like:
  • 7:09 - 7:12
    one day, someone saw one of these formulas
  • 7:12 - 7:13
    on the face of the sun, and said:
  • 7:13 - 7:15
    "oh, that's part of the universe,
  • 7:15 - 7:17
    this is how the average should be calculated"
  • 7:17 - 7:19
    These are human constructs
  • 7:19 - 7:22
    to get our heads around large sets of data
  • 7:22 - 7:24
    This isn't a large set of data,
  • 7:24 - 7:25
    we have five numbers,
  • 7:25 - 7:27
    but if we have five billion numbers,
  • 7:27 - 7:28
    you can imagine that you don't like
  • 7:28 - 7:30
    to think about every number individually.
  • 7:30 - 7:32
    But anyway, before I talk more about that,
  • 7:32 - 7:34
    let me tell you what the mode is.
  • 7:34 - 7:36
    And the mode, to some degree,
  • 7:36 - 7:38
    is the one that most people probably forget
  • 7:38 - 7:40
    or never learn, and when they see it on an exam
  • 7:40 - 7:43
    it confuses them, because "oh, that sounds very advanced"
  • 7:43 - 7:46
    But in some way it is the easiest
  • 7:46 - 7:48
    of all of the measures of central tendency
  • 7:48 - 7:51
    The mode is essentially:
  • 7:51 - 7:54
    "what number is most common in the set"
  • 7:54 - 7:56
    So, in this example, there are two 1s,
  • 7:56 - 7:58
    and there is one of everything else, right?
  • 7:58 - 8:01
    So the mode here is: 1
  • 8:01 - 8:03
    So mode, you kind of say is
  • 8:03 - 8:04
    "the most common number".
  • 8:04 - 8:05
    And then you can say:
  • 8:05 - 8:10
    "hey, what if our set was: 1,1,2,3,4,4
  • 8:10 - 8:12
    So here I have two 1s, and I have two 4s.
  • 8:12 - 8:14
    And this is where the mode gets a little bit tricky,
  • 8:14 - 8:18
    because either of these would have been a decent answer
  • 8:18 - 8:18
    for the mode.
  • 8:18 - 8:20
    You could actually have said that
  • 8:20 - 8:21
    "the mode of this is 1"
  • 8:21 - 8:22
    or "the mode of this is 4".
  • 8:22 - 8:24
    And it gets a little bit ambiguous,
  • 8:24 - 8:25
    and you probably want a little clarity
  • 8:25 - 8:26
    from the person asking you.
  • 8:26 - 8:28
    Most times on a test, when they ask you,
  • 8:28 - 8:30
    there's not gonna be this situation.
  • 8:30 - 8:34
    There will be a most common number in the set.
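One way to handle the ambiguous case in code is to return every number that ties for most common, rather than picking one. The sketch below assumes that convention; the helper name `modes` is made up for illustration, and other tools may instead return just one of the tied values.

```python
from collections import Counter


def modes(values):
    """Return every value that ties for "most common", in sorted order."""
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    return sorted(value for value, count in counts.items() if count == highest)


print(modes([1, 1, 2, 3, 4]))     # [1]
print(modes([1, 1, 2, 3, 4, 4]))  # [1, 4]
```

Returning a list makes the tie visible to whoever asked the question, which matches the point that you would want some clarity from them.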
  • 8:34 - 8:37
    So then you might ask: "Why wasn't one of these good enough?
  • 8:37 - 8:39
    Why do we use different averages?
  • 8:39 - 8:42
    Why don't we use arithmetic mean all the time?
  • 8:42 - 8:46
    What are the median and mode good for?"
  • 8:46 - 8:48
    Well, I'll try to do one little example of that,
  • 8:48 - 8:51
    and see if it "rings true with you".
  • 8:51 - 8:53
    And then you can think a little bit more.
  • 8:53 - 8:54
    Let's have this set of numbers:
  • 8:54 - 9:05
    3,3,3,3,3, and, I don't know... and 100!
  • 9:05 - 9:08
    So what's the arithmetic mean here?
  • 9:08 - 9:14
    So 1,2,3,4,5 threes and one 100.
  • 9:14 - 9:17
    So it would be 115/6, right?
  • 9:17 - 9:20
    Because there's 1,2,3,4,5,6 numbers.
  • 9:20 - 9:23
    And 115 is just the sum of all of these.
  • 9:23 - 9:39
    So that's equal to:...19 and 1/6
  • 9:39 - 9:40
    Fair enough, I just added the numbers,
  • 9:40 - 9:42
    and divided by how many there are.
  • 9:42 - 9:46
    But my question is: "is this really representative of this set?"
  • 9:46 - 9:47
    I mean, I have a ton of threes,
  • 9:47 - 9:49
    then I have one 100 all of a sudden,
  • 9:49 - 9:50
    and we are saying that the central tendency
  • 9:50 - 9:52
    is 19 and 1/6.
  • 9:52 - 9:55
    Does 19 and 1/6 really seem indicative of this set?
  • 9:55 - 9:56
    Maybe it does, depending on the application,
  • 9:56 - 9:58
    but it just seems a little bit off.
  • 9:58 - 10:00
    My intuition would be that the central tendency
  • 10:00 - 10:02
    is something closer to 3.
  • 10:02 - 10:03
    Because there are a lot of threes here.
  • 10:03 - 10:08
    So what will the median tell us?
  • 10:08 - 10:11
    I already put these numbers in order, right?
  • 10:11 - 10:12
    If I gave them to you in another order,
  • 10:12 - 10:13
    you would put them in this order.
  • 10:13 - 10:15
    And you would say: "what's the middle number?"
  • 10:15 - 10:16
    Let's see: the middle two numbers,
  • 10:16 - 10:19
    since we have an even number of numbers, are 3 and 3.
  • 10:19 - 10:25
    So if I take the arithmetic mean of 3 and 3,
  • 10:25 - 10:28
    I get 3.
  • 10:28 - 10:30
    And this is maybe a better measurement
  • 10:30 - 10:32
    of the central tendency, or the average
  • 10:32 - 10:35
    of this set of numbers, right?
  • 10:35 - 10:37
    Essentially what it does is,
  • 10:37 - 10:38
    by taking the median,
  • 10:38 - 10:41
    I wasn't affected by this very large number
  • 10:41 - 10:42
    that's very different from the others.
  • 10:42 - 10:45
    Statisticians call that an outlier.
  • 10:45 - 10:46
    A number, that, you know,
  • 10:46 - 10:48
    if we talked about average home prices,
  • 10:48 - 10:50
    maybe every house in the city
  • 10:50 - 10:52
    is $100,000, and then
  • 10:52 - 10:55
    there's one house that costs $1 trillion.
  • 10:55 - 10:56
    And then someone tells you:
  • 10:56 - 10:58
    "the average price is.. I don't know.. $1 million"
  • 10:58 - 10:59
    then you might have a very wrong
  • 10:59 - 11:01
    perception of that city.
  • 11:01 - 11:04
    But the median house price would be $100,000
  • 11:04 - 11:06
    and you would get a better sense
  • 11:06 - 11:07
    of what houses in that city are like.
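To put rough numbers on the house-price story, here is a small Python sketch. The city size of one million houses is purely an assumption chosen for illustration (the talk does not say how many houses there are); with that assumption the mean lands near the "$1 million" figure while the median stays at $100,000.

```python
import statistics

typical_price = 100_000            # every ordinary house, in dollars
outlier_price = 1_000_000_000_000  # the single one-trillion-dollar house
house_count = 1_000_000            # hypothetical city size, not from the talk

prices = [typical_price] * (house_count - 1) + [outlier_price]

mean_price = sum(prices) / len(prices)
median_price = statistics.median(prices)

print(f"mean:   ${mean_price:,.2f}")    # mean:   $1,099,999.90
print(f"median: ${median_price:,.2f}")  # median: $100,000.00
```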
  • 11:07 - 11:09
    So, similarly, this median maybe
  • 11:09 - 11:10
    gives you a better sense of what
  • 11:10 - 11:12
    the numbers in this set are like.
  • 11:12 - 11:15
    Because the arithmetic mean
  • 11:15 - 11:18
    was influenced by this outlier.
  • 11:18 - 11:20
    And being able to tell what an outlier is
  • 11:20 - 11:21
    is kind of one of those things that
  • 11:21 - 11:23
    statisticians will say:
  • 11:23 - 11:24
    "well I know it when I see it"
  • 11:24 - 11:25
    There isn't really a formal definition for it,
  • 11:25 - 11:27
    but it tends to be
  • 11:27 - 11:28
    a number that really kind of sticks out.
  • 11:28 - 11:31
    Sometimes it's, you know, a measurement error or whatever.
  • 11:31 - 11:34
    And then finally the mode.
  • 11:34 - 11:36
    What is the most common number in the set?
  • 11:36 - 11:39
    Well, there are five 3s and one 100,
  • 11:39 - 11:41
    so the most common number is...
  • 11:41 - 11:43
    3
  • 11:43 - 11:45
    So in this case, when we have this outlier,
  • 11:45 - 11:46
    the median and the mode tend to be
  • 11:46 - 11:49
    maybe a little bit better about giving you an indication
  • 11:49 - 11:52
    of what these numbers represent.
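The three measures for this set can be compared side by side. This is a minimal sketch using Python's standard `statistics` module for the median and mode, with the mean kept as an exact fraction so that 19 and 1/6 shows up as 115/6.

```python
import statistics
from fractions import Fraction

numbers = [3, 3, 3, 3, 3, 100]

# Mean: add everything up (115) and divide by how many numbers there are (6).
mean = Fraction(sum(numbers), len(numbers))
# Median: the mean of the two middle numbers, 3 and 3.
median = statistics.median(numbers)
# Mode: the most common number.
mode = statistics.mode(numbers)

print(mean)                    # 115/6
print(round(float(mean), 2))   # 19.17
print(median)                  # 3.0
print(mode)                    # 3
```

Only the mean is pulled toward the outlier; the median and the mode both stay at 3.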
  • 11:52 - 11:54
    Maybe this was just a measurement error.
  • 11:54 - 11:54
    But I don't know.
  • 11:54 - 11:56
    We don't actually know what these numbers represent
  • 11:56 - 11:58
    if these were house prices,
  • 11:58 - 11:59
    I would argue that these are more indicative
  • 11:59 - 12:03
    measures of what the prices in the area are.
  • 12:03 - 12:05
    But if this is something else,
  • 12:05 - 12:06
    if these are scores on a test,
  • 12:06 - 12:09
    maybe there is one kid in the class, out of 6 kids,
  • 12:09 - 12:10
    who did really really well
  • 12:10 - 12:11
    and everyone else didn't study.
  • 12:11 - 12:13
    And this is more indicative of how
  • 12:13 - 12:15
    students at that level are doing on average.
  • 12:15 - 12:18
    Anyway, I'm done talking about all of this.
  • 12:18 - 12:21
    And I encourage you to play with a lot of numbers,
  • 12:21 - 12:22
    and deal with the concepts yourself.
  • 12:22 - 12:24
    In the next video, we'll explore
  • 12:24 - 12:26
    more descriptive statistics,
  • 12:26 - 12:28
    and instead of talking about the central tendency,
  • 12:28 - 12:29
    we'll talk about how spread apart
  • 12:29 - 12:31
    things are away from the central tendency.
  • 12:31 - 12:35
    See you in the next video!
Title:
Statistics: The Average
Description:

Introduction to descriptive statistics and central tendency. Ways to measure the average of a set: median, mean, mode

Video Language:
English
Duration:
12:35