
Statistics: The Average

  • 0:01 - 0:03
    Welcome to the playlist
    on statistics.
  • 0:03 - 0:06
    Something I've been meaning
    to do for some time.
  • 0:06 - 0:09
    So anyway, I just want to get
    right into the meat of it and
  • 0:09 - 0:12
    I'll try to do as many examples
    as possible and hopefully
  • 0:12 - 0:15
    give you the feel for what
    statistics is all about.
  • 0:15 - 0:17
    And, really, just to kind of
    start off in case you're not
  • 0:17 - 0:19
    familiar with it -- although, I
    think a lot of people have an
  • 0:19 - 0:21
    intuitive feel for what
    statistics is about.
  • 0:22 - 0:27
    And essentially -- well in very
    general terms it's kind of
  • 0:27 - 0:29
    getting your head around data.
  • 0:29 - 0:31
    And it can broadly
    be classified.
  • 0:31 - 0:33
    Well there are maybe
    three categories.
  • 0:33 - 0:35
    You have descriptive.
  • 0:35 - 0:39
    So say you have a lot of data
    and you wanted to tell someone
  • 0:39 - 0:41
    about it without giving
    them all of the data.
  • 0:41 - 0:45
    Maybe you can kind of find
    indicative numbers that
  • 0:45 - 0:48
    somehow represent all of
    that data without having to
  • 0:48 - 0:49
    go over all of the data.
  • 0:49 - 0:50
    That would be
    descriptive statistics.
  • 0:50 - 0:52
    There's also predictive.
  • 0:52 - 0:53
    Well, I'll kind of
    group them together.
  • 0:53 - 0:55
    There's inferential statistics.
  • 0:58 - 1:01
    And this is when you use
    data to essentially make
  • 1:01 - 1:02
    conclusions about things.
  • 1:02 - 1:06
    So let's say you've sampled
    some data from a population --
  • 1:06 - 1:09
    and we'll talk a lot about
    samples versus populations but
  • 1:09 - 1:11
    I think you have just a basic
    sense of what that is, right?
  • 1:11 - 1:14
    If I survey three people
    who are going to vote for
  • 1:14 - 1:16
    president, I clearly haven't
    surveyed the entire population.
  • 1:16 - 1:18
    I've surveyed a sample.
  • 1:18 - 1:22
    But what inferential statistics
    is all about is that if we can do
  • 1:22 - 1:25
    some math on the samples, maybe
    we can make inferences or
  • 1:25 - 1:28
    conclusions about the
    population as a whole.
  • 1:28 - 1:30
    Well, anyway, that's just
    a big picture of what
  • 1:30 - 1:31
    statistics is all about.
  • 1:31 - 1:34
    Let's just get into the
    meat of it and we'll start
  • 1:34 - 1:35
    with the descriptive.
  • 1:38 - 1:41
    So the first thing that, I
    don't know, that I would want
  • 1:41 - 1:44
    to do or I think most people
    would want to do when they are
  • 1:44 - 1:47
    given a whole set of numbers and
    they're told to describe it.
  • 1:47 - 1:51
    Well, maybe I can come up with
    some number that is most
  • 1:51 - 1:54
    indicative of all of the
    numbers in that set.
  • 1:54 - 1:57
    Or some number that represents,
    kind of, the central tendency
  • 1:57 - 2:00
    -- this is a word you'll see
    a lot in statistics books.
  • 2:00 - 2:03
    The central tendency
    of a set of numbers.
  • 2:07 - 2:09
    And this is also
    called the average.
  • 2:09 - 2:12
    And I'll be a little bit more
    exact here than I normally am
  • 2:12 - 2:16
    with the word "average." When I
    talk about it in this context,
  • 2:16 - 2:20
    it just means that the average
    is a number that somehow
  • 2:20 - 2:23
    is giving us a sense of
    the central tendency.
  • 2:23 - 2:25
    Or maybe a number that is most
    representative of a set.
  • 2:25 - 2:27
    And I know that sounds all
    very abstract but let's
  • 2:27 - 2:29
    do a couple of examples.
  • 2:29 - 2:32
    So there's a bunch of ways
    that you can actually measure
  • 2:32 - 2:35
    the central tendency or the
    average of a set of numbers.
  • 2:35 - 2:38
    And you've probably
    seen these before.
  • 2:38 - 2:41
    They are the mean.
  • 2:41 - 2:43
    And actually, there's types
    of means but we'll stick
  • 2:43 - 2:44
    with the arithmetic mean.
  • 2:51 - 2:54
    There are also geometric means, and maybe we'll
    cover the harmonic
  • 2:54 - 2:55
    mean one day.
  • 2:55 - 3:03
    There's a mean, the
    median, and the mode.
  • 3:03 - 3:07
    And in statistics speak,
    these all can kind of be
  • 3:07 - 3:11
    representative of a data set's
    or population's central tendency
  • 3:11 - 3:13
    or a sample's central tendency.
  • 3:13 - 3:16
    And they all are collectively
    -- they can all be
  • 3:16 - 3:17
    forms of an average.
  • 3:17 - 3:19
    And I think when we see
    examples, it'll make a
  • 3:19 - 3:19
    little bit more sense.
  • 3:19 - 3:23
    In everyday speak, when people
    talk about an average, I think
  • 3:23 - 3:26
    you've already computed
    averages in your life, they're
  • 3:26 - 3:29
    usually talking about
    the arithmetic mean.
  • 3:29 - 3:30
    So normally when someone says,
    "Let's take the average of
  • 3:30 - 3:33
    these numbers." And they expect
    you to do something, they want
  • 3:33 - 3:34
    you to figure out the
    arithmetic mean.
  • 3:34 - 3:36
    They don't want you to figure
    out the median or the mode.
  • 3:36 - 3:39
    But before we go any further,
    let's figure out what
  • 3:39 - 3:41
    these things are.
  • 3:41 - 3:43
    Let me make up a
    set of numbers.
  • 3:43 - 3:46
    Let's say I have the number 1.
  • 3:46 - 3:50
    Let's say I have
    another 1, a 2, a 3.
  • 3:50 - 3:53
    Let's say I have a 4.
  • 3:53 - 3:55
    That's good enough.
  • 3:56 - 3:58
    We just want a simple example.
  • 3:58 - 4:03
    So the mean or the arithmetic
    mean is probably what you're
  • 4:03 - 4:06
    most familiar with when
    people talk about average.
  • 4:06 - 4:08
    And that's essentially -- you
    add up all the numbers and you
  • 4:08 - 4:09
    divide by the number of
    numbers that there are.
  • 4:09 - 4:16
    So in this case, it would be 1
    plus 1 plus 2 plus 3 plus 4.
  • 4:16 - 4:19
    And you're going to divide
    by one, two, three,
  • 4:19 - 4:21
    four, five numbers.
  • 4:21 - 4:22
    It's what?
  • 4:22 - 4:23
    1 plus 1 is 2.
  • 4:23 - 4:26
    2 plus 2 is 4.
  • 4:26 - 4:28
    4 plus 3 is 7.
  • 4:28 - 4:30
    7 plus 4 is 11.
  • 4:30 - 4:33
    So this is equal to 11/5.
  • 4:33 - 4:33
    That's what?
  • 4:33 - 4:34
    That's 2 1/5?
  • 4:34 - 4:38
    So that's equal to 2.2.
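The add-and-divide steps above can be sketched in a few lines of Python. This is only an illustration of the arithmetic in the video; the variable names are made up for the example.

```python
# The example set from the video.
numbers = [1, 1, 2, 3, 4]

total = sum(numbers)   # 1 + 1 + 2 + 3 + 4 = 11
count = len(numbers)   # five numbers

# Arithmetic mean: the sum divided by how many numbers there are.
arithmetic_mean = total / count
print(arithmetic_mean)  # 2.2
```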
  • 4:38 - 4:40
    And so someone could
    say, "Hey, you know.
  • 4:40 - 4:41
    That is a pretty
    good representative
  • 4:41 - 4:42
    number of this set.
  • 4:42 - 4:45
    That's the number that all of
    these numbers you can kind of
  • 4:45 - 4:47
    say are closest to." Or, 2.2
    represents the central
  • 4:47 - 4:49
    tendency of this set.
  • 4:49 - 4:51
    And in common speak, that
    would be the average.
  • 4:51 - 4:53
    But if we're being a little
    bit more particular, this
  • 4:53 - 4:55
    is the arithmetic mean
    of this set of numbers.
  • 4:55 - 4:57
    And you see it kind
    of represents them.
  • 4:57 - 4:59
    If I didn't want to give you
    the list of five numbers, I
  • 4:59 - 5:01
    could say, "Well, you know, I
    have a set of five numbers and
  • 5:01 - 5:04
    their mean is 2.2." It kind of
    tells you a little bit of at
  • 5:04 - 5:06
    least, you know, where
    the numbers are.
  • 5:06 - 5:09
    We'll talk a little bit more
    about how do you know how far
  • 5:09 - 5:12
    the numbers are from that mean
    in probably the next video.
  • 5:12 - 5:14
    So that's one measure.
  • 5:14 - 5:17
    Another measure, instead of
    averaging it in this way, you
  • 5:17 - 5:20
    can average it by putting the
    numbers in order, which
  • 5:20 - 5:20
    I actually already did.
  • 5:20 - 5:23
    So let's just write them
    down in order again.
  • 5:23 - 5:27
    1, 1, 2, 3, 4.
  • 5:27 - 5:28
    And you just take
    the middle number.
  • 5:28 - 5:32
    So let's see, there's one, two,
    three, four, five numbers.
  • 5:32 - 5:34
    So the middle number's going
    to be right here, right?
  • 5:34 - 5:35
    The middle number is 2.
  • 5:35 - 5:37
    There's two numbers greater
    than 2 and there's two
  • 5:37 - 5:39
    numbers less than 2.
  • 5:39 - 5:40
    And this is called the median.
  • 5:40 - 5:42
    So it's actually very
    little computation.
  • 5:42 - 5:43
    You just have to essentially
    sort the numbers.
  • 5:43 - 5:46
    And then you find whatever
    number where you have an
  • 5:46 - 5:48
    equal number greater than
    or less than that number.
  • 5:48 - 5:51
    So the median of this set is 2.
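As a rough sketch of the sort-then-take-the-middle idea, here are the same five numbers in Python, deliberately given out of order. The names are illustrative, and this version assumes an odd count of numbers.

```python
# Same five numbers as in the video, but shuffled.
numbers = [3, 1, 4, 1, 2]

ordered = sorted(numbers)          # [1, 1, 2, 3, 4]
middle_index = len(ordered) // 2   # index 2 for five numbers
median = ordered[middle_index]

# The numbers on either side of the middle one.
below = ordered[:middle_index]
above = ordered[middle_index + 1:]

print(median, below, above)  # 2 [1, 1] [3, 4]
```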
  • 5:51 - 5:53
    And you see, I mean,
    that's actually fairly
  • 5:53 - 5:54
    close to the mean.
  • 5:54 - 5:56
    And there's no right answer.
  • 5:56 - 5:59
    One of these isn't a better
    answer for the average.
  • 5:59 - 6:02
    They're just different ways
    of measuring the average.
  • 6:02 - 6:05
    So here it's the median.
  • 6:05 - 6:07
    And I know what you might be
    thinking. "Well, that was
  • 6:07 - 6:09
    easy enough when we
    had five numbers.
  • 6:09 - 6:12
    What if we had six numbers?"
    What if it was like this?
  • 6:12 - 6:14
    What if this was our
    set of numbers?
  • 6:14 - 6:20
    1, 1, 2, 3, let's add
    another 4 there.
  • 6:20 - 6:22
    So now, there's no
    middle number, right?
  • 6:22 - 6:25
    I mean 2 is not the middle
    number because there's two less
  • 6:25 - 6:27
    than and three larger than it.
  • 6:27 - 6:29
    And then 3's not the middle
    number because there's three
  • 6:29 - 6:32
    larger and -- sorry, there's
    two larger and three
  • 6:32 - 6:33
    smaller than it.
  • 6:33 - 6:34
    So there's no middle number.
  • 6:34 - 6:36
    So when you have a set with an
    even number of numbers and someone tells
  • 6:36 - 6:38
    you to figure out the median,
    what you do is you take the
  • 6:38 - 6:44
    middle two numbers and then you
    take the arithmetic mean
  • 6:44 - 6:45
    of those two numbers.
  • 6:45 - 6:51
    So in this case of this set,
    the median would be 2.5.
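One way to write the rule for both cases is sketched below. The function name `median` is chosen for this example only; Python's standard library has its own `statistics.median`, which should give the same answers here.

```python
def median(values):
    """Middle value of the sorted data; for an even count,
    the arithmetic mean of the two middle values."""
    if not values:
        raise ValueError("median of an empty set is undefined")
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd count: there is a single middle number.
        return ordered[mid]
    # Even count: average the two middle numbers.
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([1, 1, 2, 3, 4, 4]))  # 2.5
print(median([1, 1, 2, 3, 4]))     # 2
```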
  • 6:51 - 6:52
    Fair enough.
  • 6:52 - 6:54
    But let's put this aside
    because I want to compare the
  • 6:54 - 6:57
    median and the means and the
    modes for the same
  • 6:57 - 6:58
    set of numbers.
  • 6:58 - 7:00
    But that's a good thing to
    know because sometimes it
  • 7:00 - 7:01
    can be a little confusing.
  • 7:01 - 7:04
    And these are all definitions.
  • 7:04 - 7:06
    These are all kind of
    mathematical tools for getting
  • 7:06 - 7:08
    our heads around numbers.
  • 7:08 - 7:12
    It's not like one day someone
    saw one of these formulas on
  • 7:12 - 7:14
    the face of the sun and said,
    "Oh, that's part of the
  • 7:14 - 7:17
    universe that this is how the
    average should be calculated."
  • 7:17 - 7:20
    These are human constructs to
    kind of just get our heads
  • 7:20 - 7:22
    around large sets of data.
  • 7:22 - 7:25
    This isn't a large set of data,
    but instead of five numbers, if
  • 7:25 - 7:27
    we had five million numbers,
    you can imagine that you wouldn't
  • 7:27 - 7:29
    like thinking about every
    number individually.
  • 7:29 - 7:32
    Anyway, before I talk more
    about that, let me tell
  • 7:32 - 7:33
    you what the mode is.
  • 7:33 - 7:36
    And the mode to some degree,
    it's the one I think most
  • 7:36 - 7:40
    people probably forget or never
    learn and when they see it on
  • 7:40 - 7:42
    an exam, it confuses them
    because they're like, "Oh, that
  • 7:42 - 7:45
    sounds very advanced." But in
    some ways, it is the easiest of
  • 7:45 - 7:49
    all of the measures of central
    tendency or of average.
  • 7:49 - 7:54
    The mode is essentially what
    number is most common in a set.
  • 7:54 - 7:56
    So in this example, there's
    two 1's and then there's one
  • 7:56 - 7:58
    of everything else, right?
  • 7:58 - 8:00
    So the mode here is 1.
  • 8:00 - 8:03
    So mode is the most
    common number.
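Counting how often each number appears is one simple way to find the mode. The sketch below uses `collections.Counter` from Python's standard library; the variable names are only for illustration.

```python
from collections import Counter

numbers = [1, 1, 2, 3, 4]

# Tally how many times each number occurs.
counts = Counter(numbers)

# most_common(1) returns a list with the single most frequent
# (value, count) pair.
mode, occurrences = counts.most_common(1)[0]
print(mode, occurrences)  # 1 2
```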
  • 8:03 - 8:05
    And then you could kind of
    say, "Whoa, hey Sal, what
  • 8:05 - 8:06
    if this was our set?
  • 8:06 - 8:12
    1, 1, 2, 3, 4, 4." Here I have
    two 1's and I have two 4's.
  • 8:12 - 8:14
    And this is where the mode gets
    a little bit tricky because
  • 8:14 - 8:18
    either of these would have been
    a decent answer for the mode.
  • 8:18 - 8:20
    You could have actually said
    the mode of this is 1 or the
  • 8:20 - 8:23
    mode of this is 4 and it gets
    a little bit ambiguous.
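When there is a tie like this, one option is to report every most-common value instead of picking one. A minimal sketch, assuming Python 3.8 or later, where `statistics.multimode` is available:

```python
from statistics import multimode

tied = [1, 1, 2, 3, 4, 4]

# multimode returns all values that share the highest count,
# in the order they first appear in the data.
modes = multimode(tied)
print(modes)  # [1, 4]
```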
  • 8:23 - 8:25
    And you probably want
    a little clarity from
  • 8:25 - 8:26
    the person asking you.
  • 8:26 - 8:29
    Most times on a test when they
    ask you, there's not going
  • 8:29 - 8:29
    to be this ambiguity.
  • 8:29 - 8:33
    There will be a most
    common number in the set.
  • 8:33 - 8:36
    So now it's like oh, well you
    know, why wasn't just one
  • 8:36 - 8:37
    of these good enough?
  • 8:37 - 8:38
    You know why we learned
    averages, why don't
  • 8:38 - 8:40
    we just use averages?
  • 8:40 - 8:43
    Or why don't we use arithmetic
    mean all the time?
  • 8:43 - 8:45
    What's median and
    mode good for?
  • 8:45 - 8:48
    Well, I'll try to do one
    example of that and see if
  • 8:48 - 8:51
    it rings true with you.
  • 8:51 - 8:52
    And then you can think
    a little bit more.
  • 8:52 - 8:54
    Let's say I had this
    set of numbers.
  • 8:54 - 9:04
    3, 3, 3, 3, 3, and,
    I don't know, 100.
  • 9:04 - 9:09
    So what's the
    arithmetic mean here?
  • 9:09 - 9:12
    I have one, two, three,
    four, five 3's and 100.
  • 9:12 - 9:17
    So it would be 115
    divided by 6, right?
  • 9:17 - 9:20
    Because I have one, two, three,
    four, five, six numbers.
  • 9:20 - 9:22
    115 is just the sum
    of all of these.
  • 9:22 - 9:27
    So that's equal to -- how many
    times does 6 go into 115?
  • 9:27 - 9:29
    6 goes into 11 one time.
  • 9:29 - 9:31
    1 times 6 is 6.
  • 9:31 - 9:32
    6 goes into 55 9 times.
  • 9:32 - 9:34
    9 times 6 is 54.
  • 9:34 - 9:36
    So it's equal to 19 1/6.
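The division above can be checked with a short sketch. It uses `divmod` for the whole part and remainder, and `fractions.Fraction` to keep 115/6 exact; the variable names are illustrative.

```python
from fractions import Fraction

numbers = [3, 3, 3, 3, 3, 100]

total = sum(numbers)   # 5 * 3 + 100 = 115
count = len(numbers)   # six numbers

# Whole part and remainder of 115 divided by 6.
whole, remainder = divmod(total, count)

# The exact mean as a fraction, plus a rounded decimal.
mean = Fraction(total, count)
print(whole, remainder)        # 19 1  (that is, 19 and 1/6)
print(mean)                    # 115/6
print(round(float(mean), 3))   # 19.167
```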
  • 9:37 - 9:38
    Fair enough.
  • 9:39 - 9:41
    I just added all the
    numbers and divided by
  • 9:41 - 9:42
    how many there are.
  • 9:42 - 9:45
    But my question is, is this
    really representative
  • 9:45 - 9:46
    of this set?
  • 9:46 - 9:48
    I mean, I have a ton of 3's
    and then I have 100 all of a
  • 9:48 - 9:51
    sudden, and we're saying that
    the central tendency is 19 1/6.
  • 9:51 - 9:54
    And, I mean, 19 1/6 doesn't
    really seem indicative
  • 9:54 - 9:54
    of the set.
  • 9:54 - 9:56
    I mean maybe it does, depending
    on your application, but it
  • 9:56 - 9:58
    just seems a little
    bit off, right?
  • 9:58 - 10:00
    I mean, my intuition would be
    that the central tendency is
  • 10:00 - 10:03
    something closer to 3 because
    there's a lot of 3's here.
  • 10:03 - 10:07
    So what would the
    median tell us?
  • 10:07 - 10:10
    I already put these
    numbers in order, right?
  • 10:10 - 10:11
    If I give it to you out of
    order, you'd want to put it
  • 10:11 - 10:13
    in this order and you'd say
    what's the middle number?
  • 10:13 - 10:16
    Let's see, the middle two
    numbers, since I have an
  • 10:16 - 10:18
    even number, are 3 and 3.
  • 10:18 - 10:21
    So if I take the average of
    3 and 3 -- or I should be
  • 10:21 - 10:22
    particular with my language.
  • 10:22 - 10:27
    If I take the arithmetic
    mean of 3 and 3, I get 3.
  • 10:27 - 10:30
    And this is maybe a better
    measurement of the central
  • 10:30 - 10:34
    tendency or of the average of
    this set of numbers, right?
  • 10:34 - 10:38
    Essentially, what it does is by
    taking the median, I wasn't so
  • 10:38 - 10:41
    much affected by this really
    large number that's very
  • 10:41 - 10:42
    different than the others.
  • 10:42 - 10:44
    In statistics they
    call that an outlier.
  • 10:44 - 10:47
    A number that, you know, if you
    talked about average home
  • 10:47 - 10:52
    prices, maybe every house in
    the city is $100,000 and then
  • 10:52 - 10:54
    there's one house that
    costs $1 trillion.
  • 10:54 - 10:56
    And then if someone told you
    the average house price was, I
  • 10:56 - 10:58
    don't know, $1 million, you
    might have a very wrong
  • 10:58 - 11:00
    perception of that city.
  • 11:00 - 11:04
    But the median house price
    would be $100,000 and you get
  • 11:04 - 11:06
    a better sense of what the
    houses in that city are like.
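To put rough numbers on the house price example, here is a sketch with a made-up city. The count of one million houses is an assumption for illustration only; the video does not say how many houses there are.

```python
import statistics

# Hypothetical city: 999,999 houses at $100,000 and one at $1 trillion.
# The house count is an assumption, not a figure from the video.
prices = [100_000] * 999_999 + [1_000_000_000_000]

mean_price = sum(prices) / len(prices)
median_price = statistics.median(prices)

print(mean_price)    # 1099999.9, about $1.1 million
print(median_price)  # 100000.0
```

With these assumed numbers, the single expensive house pulls the mean to about eleven times the price of a typical house, while the median stays at $100,000.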
  • 11:06 - 11:09
    So similarly, this median,
    maybe, gives you a better
  • 11:09 - 11:12
    sense of what the numbers
    in this set are like.
  • 11:12 - 11:16
    Because the arithmetic mean
    was skewed by this, what
  • 11:16 - 11:18
    they call an outlier.
  • 11:18 - 11:20
    And being able to tell what
    an outlier is, it's kind of
  • 11:20 - 11:22
    one of those things that a
    statistician will say, well,
  • 11:22 - 11:23
    I know it when I see it.
  • 11:23 - 11:25
    There isn't really a formal
    definition for it but it tends
  • 11:25 - 11:28
    to be a number that really kind
    of sticks out and sometimes
  • 11:28 - 11:31
    it's due to, you know, a
    measurement error or whatever.
  • 11:31 - 11:33
    And then finally, the mode.
  • 11:33 - 11:35
    What is the most common
    number in this set?
  • 11:35 - 11:39
    Well there's five 3's
    and there's 100.
  • 11:39 - 11:41
    So the most common number
    is, once again, it's a 3.
  • 11:41 - 11:45
    So in this case, when you had
    this outlier, the median and
  • 11:45 - 11:47
    the mode tend to be, you know,
    maybe they're a little bit
  • 11:47 - 11:51
    better about giving you an
    indication of what these
  • 11:51 - 11:52
    numbers represent.
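For a side-by-side look, the three measures for this set can be computed with Python's standard `statistics` module. This is just a sketch of the comparison made in the video; the variable names are illustrative.

```python
import statistics

numbers = [3, 3, 3, 3, 3, 100]

mean_value = statistics.mean(numbers)      # pulled up by the 100
median_value = statistics.median(numbers)  # mean of the middle two 3's
mode_value = statistics.mode(numbers)      # the most common number

print(round(mean_value, 3))  # 19.167
print(median_value)          # 3.0
print(mode_value)            # 3
```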
  • 11:52 - 11:53
    Maybe this was just a
    measurement error.
  • 11:53 - 11:54
    But I don't know, we
    don't actually know
  • 11:54 - 11:55
    what these represent.
  • 11:55 - 11:58
    If these are house prices, then
    I would argue that these are
  • 11:58 - 12:01
    probably more indicative
    measures of what the
  • 12:01 - 12:03
    houses in an area cost.
  • 12:03 - 12:06
    But if this is something else,
    if this is scores on a test,
  • 12:06 - 12:08
    maybe, you know, maybe there is
    one kid in the class -- one out
  • 12:08 - 12:10
    of six kids who did really,
    really well and everyone
  • 12:10 - 12:10
    else didn't study.
  • 12:10 - 12:14
    And this is more indicative
    of, kind of, how students at
  • 12:14 - 12:15
    that level do on average.
  • 12:15 - 12:18
    Anyway, I'm done talking
    about all of this.
  • 12:18 - 12:20
    And I encourage you to play
    with a lot of numbers and deal
  • 12:20 - 12:21
    with the concepts yourself.
  • 12:21 - 12:25
    In the next video, we'll
    explore more descriptive
  • 12:25 - 12:25
    statistics.
  • 12:25 - 12:28
    Instead of talking about the
    central tendency, we'll talk
  • 12:28 - 12:30
    about how spread apart things
    are away from the
  • 12:30 - 12:32
    central tendency.
  • 12:32 - 12:33
    See you in the next video.
Title:
Statistics: The Average
Description:

Introduction to descriptive statistics and central tendency. Ways to measure the average of a set: median, mean, mode

Video Language:
English
Duration:
12:35
