Welcome to the playlist
on statistics.
Something I've been meaning
to do for some time.
So anyway, I just want to get
right into the meat of it and
I'll try to do as many examples
as possible and hopefully
give you the feel for what
statistics is all about.
And, really, just to kind of
start off in case you're not
familiar with it -- although, I
think a lot of people have an
intuitive feel for what
statistics is about.
And essentially -- well in very
general terms it's kind of
getting your head around data.
And it can broadly
be classified.
Well there are maybe
three categories.
You have descriptive.
So say you have a lot of data
and you wanted to tell someone
about it without giving
them all of the data.
Maybe you can kind of find
indicative numbers that
somehow represent all of
that data without having to
go over all of the data.
That would be
descriptive statistics.
There's also predictive and
inferential statistics.
Well, I'll kind of
group them together.
And this is when you use
data to essentially make
conclusions about things.
So let's say you've sampled
some data from a population --
and we'll talk a lot about
samples versus populations but
I think you have just a basic
sense of what that is, right?
If I survey three people
who are going to vote for
president, I clearly haven't
surveyed the entire population.
I've surveyed a sample.
But what inferential statistics
is all about is that if we can do
some math on the samples, maybe
we can make inferences or
conclusions about the
population as a whole.
Well, anyway, that's just
a big picture of what
statistics is all about.
Let's just get into the
meat of it and we'll start
with the descriptive.
So the first thing that, I
don't know, that I would want
to do or I think most people
would want to do when they are
given a whole set of numbers and
they're told to describe it.
Well, maybe I can come up with
some number that is most
indicative of all of the
numbers in that set.
Or some number that represents,
kind of, the central tendency
-- this is a word you'll see
a lot in statistics books.
The central tendency
of a set of numbers.
And this is also
called the average.
And I'll be a little bit more
exact here than I normally am
with the word "average." When I
talk about it in this context,
it just means that the average
is a number that somehow
is giving us a sense of
the central tendency.
Or maybe a number that is most
representative of a set.
And I know that sounds all
very abstract but let's
do a couple of examples.
So there's a bunch of ways
that you can actually measure
the central tendency or the
average of a set of numbers.
And you've probably
seen these before.
They are the mean.
And actually, there are types
of means but we'll stick
with the arithmetic mean.
There are also geometric
means and maybe we'll
cover the harmonic
mean one day.
There's the mean, the
median, and the mode.
And in statistics speak,
these all can kind of be
representative of a data set's
or population's central tendency
or a sample's central tendency.
And they all are collectively
-- they can all be
forms of an average.
And I think when we see
examples, it'll make a
little bit more sense.
In everyday speak, when people
talk about an average, I think
you've already computed
averages in your life, they're
usually talking about
the arithmetic mean.
So normally when someone says,
"Let's take the average of
these numbers." And they expect
you to do something, they want
you to figure out the
arithmetic mean.
They don't want you to figure
out the median or the mode.
But before we go any further,
let's figure out what
these things are.
Let me make up a
set of numbers.
Let's say I have the number 1.
Let's say I have
another 1, a 2, a 3.
Let's say I have a 4.
That's good enough.
We just want a simple example.
So the mean or the arithmetic
mean is probably what you're
most familiar with when
people talk about average.
And that's essentially -- you
add up all the numbers and you
divide by how many numbers
there are.
So in this case, it would be 1
plus 1 plus 2 plus 3 plus 4.
And you're going to divide
by one, two, three,
four, five numbers.
It's what?
1 plus 1 is 2.
2 plus 2 is 4.
4 plus 3 is 7.
7 plus 4 is 11.
So this is equal to 11/5.
That's what?
That's 2 1/5?
So that's equal to 2.2.
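As a rough sketch, the same arithmetic can be written out in Python. The variable names here are just made up for illustration; they aren't from the video.

```python
# The example set from the talk.
numbers = [1, 1, 2, 3, 4]

# Arithmetic mean: add everything up, then divide by how many values there are.
total = sum(numbers)             # 1 + 1 + 2 + 3 + 4
count = len(numbers)             # one, two, three, four, five numbers
arithmetic_mean = total / count  # 11 / 5

print(total, count, arithmetic_mean)  # prints: 11 5 2.2
```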
And so someone could
say, "Hey, you know.
That is a pretty
good representative
number of this set.
That's the number that all of
these numbers you can kind of
say are closest to." Or, 2.2
represents the central
tendency of this set.
And in common speak, that
would be the average.
But if we're being a little
bit more particular, this
is the arithmetic mean
of this set of numbers.
And you see it kind
of represents them.
If I didn't want to give you
the list of five numbers, I
could say, "Well, you know, I
have a set of five numbers and
their mean is 2.2." It kind of
tells you a little bit of at
least, you know, where
the numbers are.
We'll talk a little bit more
about how do you know how far
the numbers are from that mean
in probably the next video.
So that's one measure.
Another measure, instead of
averaging it in this way, you
can average it by putting the
numbers in order, which
I actually already did.
So let's just write them
down in order again.
1, 1, 2, 3, 4.
And you just take
the middle number.
So let's see, there's one, two,
three, four, five numbers.
So the middle number's going
to be right here, right?
The middle number is 2.
There's two numbers greater
than 2 and there's two
numbers less than 2.
And this is called the median.
So it's actually very
little computation.
You just have to essentially
sort the numbers.
And then you find whatever
number where you have an
equal number greater than
or less than that number.
So the median of this set is 2.
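For a set with an odd count of numbers, the "sort, then take the middle" idea might be sketched in Python like this. It assumes the list has an odd length, and the names are hypothetical.

```python
numbers = [1, 1, 2, 3, 4]

# Step 1: put the numbers in order (these already are, but sort anyway).
ordered = sorted(numbers)

# Step 2: take the middle position. With 5 values, 5 // 2 = 2,
# which is the third value when counting from zero.
middle_index = len(ordered) // 2
median_odd = ordered[middle_index]

print(median_odd)  # prints: 2
```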
And you see, I mean,
that's actually fairly
close to the mean.
And there's no right answer.
One of these isn't a better
answer for the average.
They're just different ways
of measuring the average.
So here it's the median.
And I know what you might be
thinking. "Well, that was
easy enough when we
had five numbers.
What if we had six numbers?"
What if it was like this?
What if this was our
set of numbers?
1, 1, 2, 3, 4 -- and let's add
another 4 there.
So now, there's no
middle number, right?
I mean 2 is not the middle
number because there's two less
than and three larger than it.
And then 3's not the middle
number because there's three
larger and -- sorry, there's
two larger and three
smaller than it.
So there's no middle number.
So when you have a set with an
even number of numbers and someone tells
you to figure out the median,
what you do is you take the
middle two numbers and then you
take the arithmetic mean
of those two numbers.
So in this case of this set,
the median would be 2.5.
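Here is one way the rule for both cases could look in Python. This is only a sketch: `median_of` is a made-up helper name, and it simply follows the description above (middle number for an odd count, arithmetic mean of the middle two for an even count).

```python
def median_of(values):
    """Median as described in the talk; `median_of` is a hypothetical name."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty set is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd count: there is a single middle number.
        return ordered[mid]
    # Even count: take the arithmetic mean of the two middle numbers.
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median_of([1, 1, 2, 3, 4]))     # prints: 2
print(median_of([1, 1, 2, 3, 4, 4]))  # prints: 2.5
```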
Fair enough.
But let's put this aside
because I want to compare the
median and the means and the
modes for the same
set of numbers.
But that's a good thing to
know because sometimes it
can be a little confusing.
And these are all definitions.
These are all kind of
mathematical tools for getting
our heads around numbers.
It's not like one day someone
saw one of these formulas on
the face of the sun and says,
"Oh, that's part of the
universe that this is how the
average should be calculated."
These are human constructs to
kind of just get our heads
around large sets of data.
This isn't a large set of data,
but instead of five numbers, if
we had five million numbers,
you can imagine that you
wouldn't want to think about
every number individually.
Anyway, before I talk more
about that, let me tell
you what the mode is.
And the mode to some degree,
it's the one I think most
people probably forget or never
learn and when they see it on
an exam, it confuses them
because they're like, "Oh, that
sounds very advanced." But in
some ways, it is the easiest of
all of the measures of central
tendency or of average.
The mode is essentially what
number is most common in a set.
So in this example, there's
two 1's and then there's one
of everything else, right?
So the mode here is 1.
So mode is the most
common number.
And then you could kind of
say, "Whoa, hey Sal, what
if this was our set?
1, 1, 2, 3, 4, 4." Here I have
two 1's and I have two 4's.
And this is where the mode gets
a little bit tricky because
either of these would have been
a decent answer for the mode.
You could have actually said
the mode of this is 1 or the
mode of this is 4 and it gets
a little bit ambiguous.
And you probably want
a little clarity from
the person asking you.
Most times on a test when they
ask you, there's not going
to be this ambiguity.
There will be a most
common number in the set.
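One way to make the ambiguity visible is to return every number that ties for most common, instead of picking one. The sketch below does that in Python; `modes_of` is a hypothetical helper name, not anything standard.

```python
from collections import Counter


def modes_of(values):
    """Return every value tied for most common (hypothetical helper)."""
    counts = Counter(values)          # how many times each number appears
    highest = max(counts.values())    # the largest count seen
    return sorted(v for v, c in counts.items() if c == highest)


print(modes_of([1, 1, 2, 3, 4]))     # prints: [1]
print(modes_of([1, 1, 2, 3, 4, 4]))  # prints: [1, 4]
```

When the result has more than one entry, that's the case where you'd want some clarity from the person asking.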
So now it's like oh, well you
know, why wasn't just one
of these good enough?
You know, we learned
averages, so why don't
we just use averages?
Or why don't we use arithmetic
mean all the time?
What's median and
mode good for?
Well, I'll try to do one
example of that and see if
it rings true with you.
And then you can think
a little bit more.
Let's say I had this
set of numbers.
3, 3, 3, 3, 3, and,
I don't know, 100.
So what's the
arithmetic mean here?
I have one, two, three,
four, five 3's and 100.
So it would be 115
divided by 6, right?
I have one, two, three,
four, five, six numbers.
115 is just the sum
of all of these.
So that's equal to -- how many
times does 6 go into 115?
6 goes into 11 one time.
1 times 6 is 6, and
11 minus 6 is 5.
Bring down the 5 and you get 55.
6 goes into 55 nine times.
9 times 6 is 54, with 1 left over.
So it's equal to 19 1/6.
Fair enough.
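If you want to double-check that long division, a small Python sketch using exact fractions might look like this (the variable names are just for illustration):

```python
from fractions import Fraction

values = [3, 3, 3, 3, 3, 100]

total = sum(values)   # five 3's plus 100
count = len(values)   # six numbers

# Whole part and remainder of 115 divided by 6.
whole, remainder = divmod(total, count)

# The exact mean as a fraction, to avoid any rounding.
exact_mean = Fraction(total, count)

print(total, whole, remainder, exact_mean)  # prints: 115 19 1 115/6
```

A whole part of 19 with 1 left over out of 6 is the 19 1/6 from above.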
I just added all the
numbers and divided by
how many there are.
But my question is, is this
really representative
of this set?
I mean, I have a ton of 3's
and then I have 100 all of a
sudden, and we're saying that
the central tendency is 19 1/6.
And, I mean, 19 1/6 doesn't
really seem indicative
of the set.
I mean maybe it does, depending
on your application, but it
just seems a little
bit off, right?
I mean, my intuition would be
that the central tendency is
something closer to 3 because
there's a lot of 3's here.
So what would the
median tell us?
I already put these
numbers in order, right?
If I give it to you out of
order, you'd want to put it
in this order and you'd say
what's the middle number?
Let's see, the middle two
numbers, since I have an
even number, are 3 and 3.
So if I take the average of
3 and 3 -- or I should be
particular with my language.
If I take the arithmetic
mean of 3 and 3, I get 3.
And this is maybe a better
measurement of the central
tendency or of the average of
this set of numbers, right?
Essentially, what it does is by
taking the median, I wasn't so
much affected by this really
large number that's very
different than the others.
In statistics they
call that an outlier.
A number that, you know, if you
talked about average home
prices, maybe every house in
the city is $100,000 and then
there's one house that
costs $1 trillion.
And then if someone told you
the average house price was, I
don't know, $1 million, you
might have a very wrong
perception of that city.
But the median house price
would be $100,000 and you get
a better sense of what the
houses in that city are like.
So similarly, this median,
maybe, gives you a better
sense of what the numbers
in this set are like.
Because the arithmetic mean
was skewed by this, what
they call an outlier.
And being able to tell what
an outlier is, it's kind of
one of those things that a
statistician will say, well,
I know it when I see it.
There isn't really a formal
definition for it but it tends
to be a number that really kind
of sticks out and sometimes
it's due to, you know, a
measurement error or whatever.
And then finally, the mode.
What is the most common
number in this set?
Well there's five 3's
and there's 100.
So the most common number
is, once again, it's a 3.
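To see the three measures side by side for this set, here is a short sketch using Python's standard `statistics` module. It's just one convenient way to check the numbers, under the assumption that you have Python available.

```python
import statistics

values = [3, 3, 3, 3, 3, 100]

mean_value = statistics.mean(values)      # pulled up by the 100, about 19.17
median_value = statistics.median(values)  # mean of the two middle 3's
mode_value = statistics.mode(values)      # the most common number

print(mean_value, median_value, mode_value)
```

The mean lands far from every number in the set, while the median and the mode both come out at 3.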
So in this case, when you had
this outlier, the median and
the mode tend to be, you know,
maybe they're a little bit
better about giving you an
indication of what these
numbers represent.
Maybe this was just a
measurement error.
But I don't know, we
don't actually know
what these represent.
If these are house prices, then
I would argue that these are
probably more indicative
measures of what the
houses in an area cost.
But if this is something else,
if this is scores on a test,
maybe, you know, maybe there is
one kid in the class -- one out
of six kids who did really,
really well and everyone
else didn't study.
And this is more indicative
of, kind of, how students at
that level do on average.
Anyway, I'm done talking
about all of this.
And I encourage you to play
with a lot of numbers and deal
with the concepts yourself.
In the next video, we'll
explore more descriptive
statistics.
Instead of talking about the
central tendency, we'll talk
about how spread apart things
are away from the
central tendency.
See you in the next video.