-
Welcome to the playlist
on statistics.
-
Something I've been meaning
to do for some time.
-
So anyway, I just want to get
right into the meat of it and
-
I'll try to do as many examples
as possible and hopefully
-
give you the feel for what
statistics is all about.
-
And, really, just to kind of
start off in case you're not
-
familiar with it -- although, I
think a lot of people have an
-
intuitive feel for what
statistics is about.
-
And essentially -- well in very
general terms it's kind of
-
getting your head around data.
-
And it can broadly
be classified.
-
Well there are maybe
three categories.
-
You have descriptive.
-
So say you have a lot of data
and you wanted to tell someone
-
about it without giving
them all of the data.
-
Maybe you can kind of find
indicative numbers that
-
somehow represent all of
that data without having to
-
go over all of the data.
-
That would be
descriptive statistics.
-
There's also predictive.
-
Well, I'll kind of
group them together.
-
There's inferential statistics.
-
And this is when you use
data to essentially make
-
conclusions about things.
-
So let's say you've sampled
some data from a population --
-
and we'll talk a lot about
samples versus populations but
-
I think you have just a basic
sense of what that is, right?
-
If I survey three people
who are going to vote for
-
president, I clearly haven't
surveyed the entire population.
-
I've surveyed a sample.
-
But what inferential statistics
is all about is that if we can do
-
some math on the samples, maybe
we can make inferences or
-
conclusions about the
population as a whole.
-
Well, anyway, that's just
a big picture of what
-
statistics is all about.
-
Let's just get into the
meat of it and we'll start
-
with the descriptive.
-
So the first thing that, I
don't know, that I would want
-
to do or I think most people
would want to do when they are
-
given a whole set of numbers and
they're told to describe it.
-
Well, maybe I can come up with
some number that is most
-
indicative of all of the
numbers in that set.
-
Or some number that represents,
kind of, the central tendency
-
-- this is a word you'll see
a lot in statistics books.
-
The central tendency
of a set of numbers.
-
And this is also
called the average.
-
And I'll be a little bit more
exact here than I normally am
-
with the word "average." When I
talk about it in this context,
-
it just means that the average
is a number that somehow
-
is giving us a sense of
the central tendency.
-
Or maybe a number that is most
representative of a set.
-
And I know that sounds all
very abstract but let's
-
do a couple of examples.
-
So there's a bunch of ways
that you can actually measure
-
the central tendency or the
average of a set of numbers.
-
And you've probably
seen these before.
-
They are the mean.
-
And actually, there are several types
of means but we'll stick
-
with the arithmetic mean.
-
There are also geometric means, and maybe we'll
cover the harmonic
-
mean one day.
-
There's a mean, the
median, and the mode.
-
And in statistics speak,
these all can kind of be
-
representative of a data set's
or a population's central tendency
-
or a sample's central tendency.
-
And they all are collectively
-- they can all be
-
forms of an average.
-
And I think when we see
examples, it'll make a
-
little bit more sense.
-
In everyday speak, when people
talk about an average, I think
-
you've already computed
averages in your life, they're
-
usually talking about
the arithmetic mean.
-
So normally when someone says,
"Let's take the average of
-
these numbers." And they expect
you to do something, they want
-
you to figure out the
arithmetic mean.
-
They don't want you to figure
out the median or the mode.
-
But before we go any further,
let's figure out what
-
these things are.
-
Let me make up a
set of numbers.
-
Let's say I have the number 1.
-
Let's say I have
another 1, a 2, a 3.
-
Let's say I have a 4.
-
That's good enough.
-
We just want a simple example.
-
So the mean or the arithmetic
mean is probably what you're
-
most familiar with when
people talk about average.
-
And that's essentially -- you
add up all the numbers and you
-
divide by how many numbers
there are.
-
So in this case, it would be 1
plus 1 plus 2 plus 3 plus 4.
-
And you're going to divide
by one, two, three,
-
four, five numbers.
-
It's what?
-
1 plus 1 is 2.
-
2 plus 2 is 4.
-
4 plus 3 is 7.
-
7 plus 4 is 11.
-
So this is equal to 11/5.
-
That's what?
-
That's 2 1/5?
-
So that's equal to 2.2.
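-
If you want to check that arithmetic on a computer, here's a minimal Python sketch of the same steps. The variable names are just ones I made up for illustration.

```python
# The five numbers from the example.
numbers = [1, 1, 2, 3, 4]

# Arithmetic mean: add everything up, then divide by how many there are.
total = sum(numbers)   # 1 + 1 + 2 + 3 + 4
count = len(numbers)   # one, two, three, four, five numbers
arithmetic_mean = total / count

print(f"{total}/{count} = {arithmetic_mean}")
```

It's the same add-then-divide recipe, just written down so a machine can do it.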
-
And so someone could
say, "Hey, you know.
-
That is a pretty
good representative
-
number of this set.
-
That's the number that all of
these numbers you can kind of
-
say are closest to." Or, 2.2
represents the central
-
tendency of this set.
-
And in common speak, that
would be the average.
-
But if we're being a little
bit more particular, this
-
is the arithmetic mean
of this set of numbers.
-
And you see it kind
of represents them.
-
If I didn't want to give you
the list of five numbers, I
-
could say, "Well, you know, I
have a set of five numbers and
-
their mean is 2.2." It kind of
tells you a little bit of at
-
least, you know, where
the numbers are.
-
We'll talk a little bit more
about how you know how far
-
the numbers are from that mean
in probably the next video.
-
So that's one measure.
-
Another measure, instead of
averaging it in this way, you
-
can average it by putting the
numbers in order, which
-
I actually already did.
-
So let's just write them
down in order again.
-
1, 1, 2, 3, 4.
-
And you just take
the middle number.
-
So let's see, there's one, two,
three, four, five numbers.
-
So the middle number's going
to be right here, right?
-
The middle number is 2.
-
There's two numbers greater
than 2 and there's two
-
numbers less than 2.
-
And this is called the median.
-
So it's actually very
little computation.
-
You just have to essentially
sort the numbers.
-
And then you find the
number where you have an
-
equal number greater than
and less than that number.
-
So the median of this set is 2.
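-
Here's a small Python sketch of that sort-then-pick-the-middle idea. I've shuffled the numbers on purpose to show the sorting step, and it assumes an odd count so that there really is a single middle number.

```python
# Same five numbers, deliberately out of order.
numbers = [4, 1, 3, 1, 2]

# Step 1: put the numbers in order.
ordered = sorted(numbers)

# Step 2: take the middle one. With five numbers, index 2 has
# two numbers before it and two numbers after it.
middle_index = len(ordered) // 2
median = ordered[middle_index]

print(ordered, "-> median is", median)
```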
-
And you see, I mean,
that's actually fairly
-
close to the mean.
-
And there's no right answer.
-
One of these isn't a better
answer for the average.
-
They're just different ways
of measuring the average.
-
So here it's the median.
-
And I know what you might be
thinking. "Well, that was
-
easy enough when we
had five numbers.
-
What if we had six numbers?"
What if it was like this?
-
What if this was our
set of numbers?
-
1, 1, 2, 3, 4, and let's add
another 4 there.
-
So now, there's no
middle number, right?
-
I mean 2 is not the middle
number because there's two less
-
than and three larger than it.
-
And then 3's not the middle
number because there's three
-
larger and -- sorry, there's
two larger and three
-
smaller than it.
-
So there's no middle number.
-
So when you have a set with an
even number of numbers and someone tells
-
you to figure out the median,
what you do is you take the
-
middle two numbers and then you
take the arithmetic mean
-
of those two numbers.
-
So in this case of this set,
the median would be 2.5.
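-
To cover both the odd and the even case, one way to sketch the rule in Python is below. The function name `median_of` is hypothetical, something I made up for this example.

```python
def median_of(values):
    """Return the median of a non-empty list of numbers (illustrative helper)."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty list is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd count: there is a single middle number.
        return ordered[mid]
    # Even count: take the arithmetic mean of the two middle numbers.
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median_of([1, 1, 2, 3, 4]))     # five numbers, middle one is 2
print(median_of([1, 1, 2, 3, 4, 4]))  # six numbers, mean of 2 and 3
```

Python's standard library has `statistics.median`, which follows the same rule, so in practice you'd probably just use that.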
-
Fair enough.
-
But let's put this aside
because I want to compare the
-
median and the means and the
modes for the same
-
set of numbers.
-
But that's a good thing to
know because sometimes it
-
can be a little confusing.
-
And these are all definitions.
-
These are all kind of
mathematical tools for getting
-
our heads around numbers.
-
It's not like one day someone
saw one of these formulas on
-
the face of the sun and says,
"Oh, that's part of the
-
universe that this is how the
average should be calculated."
-
These are human constructs to
kind of just get our heads
-
around large sets of data.
-
This isn't a large set of data,
but instead of five numbers, if
-
we had five million numbers,
you can imagine that you wouldn't
-
like thinking about every
number individually.
-
Anyway, before I talk more
about that, let me tell
-
you what the mode is.
-
And the mode to some degree,
it's the one I think most
-
people probably forget or never
learn and when they see it on
-
an exam, it confuses them
because they're like, "Oh, that
-
sounds very advanced." But in
some ways, it is the easiest of
-
all of the measures of central
tendency or of average.
-
The mode is essentially what
number is most common in a set.
-
So in this example, there's
two 1's and then there's one
-
of everything else, right?
-
So the mode here is 1.
-
So mode is the most
common number.
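-
Counting which number shows up most often is easy to sketch in Python with `collections.Counter` from the standard library. The variable names here are just for illustration.

```python
from collections import Counter

numbers = [1, 1, 2, 3, 4]

# Count how many times each number appears.
counts = Counter(numbers)

# most_common(1) gives a one-item list holding the (value, count) pair
# with the highest count.
mode, occurrences = counts.most_common(1)[0]

print(f"mode is {mode}, which appears {occurrences} times")
```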
-
And then you could kind of
say, "Whoa, hey Sal, what
-
if this was our set?
-
1, 1, 2, 3, 4, 4." Here I have
two 1's and I have two 4's.
-
And this is where the mode gets
a little bit tricky because
-
either of these would have been
a decent answer for the mode.
-
You could have actually said
the mode of this is 1 or the
-
mode of this is 4 and it gets
a little bit ambiguous.
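-
One way to deal with that ambiguity in code, and this is just a sketch of one possible convention, is to report every number that ties for most common instead of picking one.

```python
from collections import Counter

numbers = [1, 1, 2, 3, 4, 4]

counts = Counter(numbers)
highest_count = max(counts.values())

# Keep every value whose count matches the highest count.
modes = sorted(value for value, count in counts.items() if count == highest_count)

print(modes)  # both 1 and 4 appear twice
```

Python 3.8 and later also ship `statistics.multimode`, which returns all the tied values in a similar way.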
-
And you probably want
a little clarity from
-
the person asking you.
-
Most times on a test when they
ask you, there's not going
-
to be this ambiguity.
-
There will be a most
common number in the set.
-
So now it's like oh, well you
know, why wasn't just one
-
of these good enough?
-
You know why we learned
averages, why don't
-
we just use averages?
-
Or why don't we use arithmetic
mean all the time?
-
What's median and
mode good for?
-
Well, I'll try to do one
example of that and see if
-
it rings true with you.
-
And then you can think
a little bit more.
-
Let's say I had this
set of numbers.
-
3, 3, 3, 3, 3, and,
I don't know, 100.
-
So what's the
arithmetic mean here?
-
I have one, two, three,
four, five 3's and 100.
-
So it would be 115
divided by 6, right?
-
Because I have one, two, three,
four, five, six numbers.
-
115 is just the sum
of all of these.
-
So that's equal to -- how many
times does 6 go into 115?
-
6 goes into 11 one time.
-
1 times 6 is 6, and 11 minus 6 is 5; bring down the 5 to get 55.
-
6 goes into 55 nine times.
-
9 times 6 is 54, which leaves a remainder of 1.
-
So it's equal to 19 1/6.
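-
If you'd rather not do the long division by hand, here's a quick Python sketch that gets the same mixed number. `divmod` gives the whole part and the remainder, and `Fraction` keeps the result exact; the variable names are my own.

```python
from fractions import Fraction

numbers = [3, 3, 3, 3, 3, 100]

total = sum(numbers)   # five 3's plus 100
count = len(numbers)   # six numbers

# Whole part and remainder of total divided by count.
whole, remainder = divmod(total, count)

# The same mean kept as an exact fraction.
mean_as_fraction = Fraction(total, count)

print(f"{total} / {count} = {whole} {remainder}/{count}")
print(float(mean_as_fraction))  # the same value as a decimal
```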
-
Fair enough.
-
I just added all the
numbers and divided by
-
how many there are.
-
But my question is, is this
really representative
-
of this set?
-
I mean, I have a ton of 3's
and then I have 100 all of a
-
sudden, and we're saying that
the central tendency is 19 1/6.
-
And, I mean, 19 1/6 doesn't
really seem indicative
-
of the set.
-
I mean maybe it does, depending
on your application, but it
-
just seems a little
bit off, right?
-
I mean, my intuition would be
that the central tendency is
-
something closer to 3 because
there's a lot of 3's here.
-
So what would the
median tell us?
-
I already put these
numbers in order, right?
-
If I give it to you out of
order, you'd want to put it
-
in this order and you'd say
what's the middle number?
-
Let's see, the middle two
numbers, since I have an
-
even number, are 3 and 3.
-
So if I take the average of
3 and 3 -- or I should be
-
particular with my language.
-
If I take the arithmetic
mean of 3 and 3, I get 3.
-
And this is maybe a better
measurement of the central
-
tendency or of the average of
this set of numbers, right?
-
Essentially, what it does is by
taking the median, I wasn't so
-
much affected by this really
large number that's very
-
different than the others.
-
In statistics they
call that an outlier.
-
A number that, you know, if you
talked about average home
-
prices, maybe every house in
the city is $100,000 and then
-
there's one house that
costs $1 trillion.
-
And then if someone told you
the average house price was, I
-
don't know, $1 million, you
might have a very wrong
-
perception of that city.
-
But the median house price
would be $100,000 and you get
-
a better sense of what the
houses in that city are like.
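-
To get a feel for how much one extreme house can pull the mean around, here's a rough Python sketch. I never said how many houses the city has, so the count of 999 ordinary houses below is purely an assumption for illustration; the exact mean you get depends entirely on that count.

```python
import statistics

# ASSUMPTION: 999 ordinary houses is a made-up count for illustration.
ordinary_house_count = 999
prices = [100_000] * ordinary_house_count + [1_000_000_000_000]

mean_price = statistics.mean(prices)
median_price = statistics.median(prices)

print(f"mean price:   ${mean_price:,.0f}")
print(f"median price: ${median_price:,.0f}")
```

Whatever count you pick, the median stays at $100,000 while the mean gets dragged far above what any ordinary house costs.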
-
So similarly, this median,
maybe, gives you a better
-
sense of what the numbers
in this set are like.
-
Because the arithmetic mean
was skewed by this, what
-
they call an outlier.
-
And being able to tell what
an outlier is, it's kind of
-
one of those things that a
statistician will say, well,
-
I know it when I see it.
-
There isn't really a formal
definition for it but it tends
-
to be a number that really kind
of sticks out and sometimes
-
it's due to, you know, a
measurement error or whatever.
-
And then finally, the mode.
-
What is the most common
number in this set?
-
Well there's five 3's
and there's 100.
-
So the most common number
is, once again, it's a 3.
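-
Putting the three measures side by side for this set, here's a short sketch using Python's standard `statistics` module.

```python
import statistics

numbers = [3, 3, 3, 3, 3, 100]

mean = statistics.mean(numbers)      # pulled upward by the 100
median = statistics.median(numbers)  # mean of the two middle values, 3 and 3
mode = statistics.mode(numbers)      # the most common value

print("mean:  ", mean)
print("median:", median)
print("mode:  ", mode)
```

The median and the mode both land on 3, while the mean sits up near 19.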
-
So in this case, when you had
this outlier, the median and
-
the mode tend to be, you know,
maybe they're a little bit
-
better about giving you an
indication of what these
-
numbers represent.
-
Maybe this was just a
measurement error.
-
But I don't know, we
don't actually know
-
what these represent.
-
If these are house prices, then
I would argue that these are
-
probably more indicative
measures of what the
-
houses in an area cost.
-
But if this is something else,
if this is scores on a test,
-
maybe, you know, maybe there is
one kid in the class -- one out
-
of six kids who did really,
really well and everyone
-
else didn't study.
-
And this is more indicative
of, kind of, how students at
-
that level do on average.
-
Anyway, I'm done talking
about all of this.
-
And I encourage you to play
with a lot of numbers and deal
-
with the concepts yourself.
-
In the next video, we'll
explore more descriptive
-
statistics.
-
Instead of talking about the
central tendency, we'll talk
-
about how spread apart things
are away from the
-
central tendency.
-
See you in the next video.