-
Welcome to the playlist on statistics, something I've been needing to do for some time.
-
So I just want to get right into it, and I want to do as many examples as possible
-
and hopefully give you the feel for what statistics is all about
-
So I'm just gonna start off in case you're not familiar with it
-
I think a lot of people have an intuitive feel for what statistics is about
-
Well, in very general terms, it's kind of getting your head around data
-
and it can broadly be classified into maybe 3 categories
-
You have: Descriptive
-
So, say you have a lot of data, and you want to tell someone about it without giving them all the data
-
Maybe you can find indicative numbers that somehow represent the data
-
without having to go over all the data
-
So that would be "Descriptive statistics"
-
Well, I'll kind of group it together, there's "Inferential statistics"
-
So that's when you use data to make conclusions about things
-
So let's say you've sampled some data from a population,
-
and we'll talk a lot about sample versus population,
-
but I think you have just a basic sense of what that is,
-
right: if I survey 3 people who are going to vote for president,
-
I clearly haven't surveyed the entire population; I surveyed a sample.
-
But what inferential statistics is all about is:
-
if we can do some math on the sample, maybe we can make "inferences", or conclusions,
-
about the population as a whole.
-
Anyway, that's just a big picture of what statistics is all about,
-
so let's just get into the middle of it, and we'll start with the Descriptive
-
So the first thing that I would want to do, and I think most people would want to do
-
when they are given a whole set of numbers and they are told to describe it
-
they would think: "well, maybe I could come up with some number
-
that is most indicative of all of the numbers in that set
-
or some number that represents kind of the central tendency
-
this is a word you see a lot in statistics books
-
the "central tendency" of a set of numbers
-
And this is also called the average.
-
And I'll be a little more exact here than I normally am with the word average
-
when I talk about it in this context, it just means
-
that the average is one number, that somehow
-
is giving us a sense of the central tendency
-
or maybe a number that is most representative of a set
-
and I know that sounds all very abstract,
-
but let's do a couple of examples:
-
So there's a bunch of ways you can actually measure
-
the central tendency, or the average of a set of numbers
-
and you've probably seen this before,
-
they are the "mean"... Well, actually there are several types of means,
-
but let's stay with the arithmetic mean
-
And later on, we'll do geometric means,
-
and maybe we'll cover the harmonic mean one day
-
There is the "mean", the "median"
-
and the mode.
-
And in statistics speak, these are all
-
kind of representative of a data set's or a population's central tendency
-
or sample central tendency
-
And collectively, they can all be forms of an average
-
and I think when we see examples it will make a little more sense
-
In everyday speak, when people talk about an average,
-
And I think you've already computed averages in your life,
-
They are usually talking about the arithmetic mean
-
So normally, when someone is asking you to calculate
-
the average of these numbers, and they are expecting you to do something,
-
they want you to figure out the arithmetic mean.
-
They don't want you to figure out the median or the mode.
-
But before we go any further, let's figure out what those things are.
-
So let me figure out a set of numbers.
-
So let's take...1...another 1, a 2, a 3, let's say I have a 4...
-
That's good enough. We just want a simple example.
-
So, the mean, or the arithmetic mean,
-
and that's probably what you're most familiar with
-
when people talk about average,
-
and that's essentially: you add up all the numbers,
-
and you divide by how many numbers there are.
-
So that's: 1+1+2+3+4
-
and you're going to divide by: 1,2,3,4..5 numbers
-
1+1 is 2, 2+2 is 4, 4+3 is 7, 7+4 is 11
-
So that's equal to 11 over 5
-
That's what? 2 and 1/5, so that's equal to 2.2
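-
If you want to check that arithmetic yourself, here is a minimal Python sketch of the same calculation. The names `arithmetic_mean` and `numbers` are just illustrative choices, not anything standard.

```python
def arithmetic_mean(numbers):
    """Add up all the numbers and divide by how many there are."""
    return sum(numbers) / len(numbers)

numbers = [1, 1, 2, 3, 4]

print(sum(numbers))              # 11
print(len(numbers))              # 5
print(arithmetic_mean(numbers))  # 2.2
```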
-
And someone could say: "hey you know that's a pretty representative number
-
of this set, that's the number that all of these numbers,
-
you can kind of say, are closest to.
-
Or 2.2 is the central tendency of this set"
-
And in common speak, that would be "the average".
-
But if we want to be a little bit more particular,
-
this would be the arithmetic mean of this set of numbers that you see
-
it kind of represents them.
-
If I didn't want to give you the list of 5 numbers,
-
I could say: "Well you know I have a set of 5 numbers,
-
and their mean is 2.2"
-
and it kind of tells you a little bit of, at least, where the numbers are
-
Right, we'll talk a bit more about how far the numbers are from that mean
-
in probably the next video.
-
So that's one measure.
-
Another measure, instead of averaging it this way,
-
you can average it by: putting the numbers in order,
-
which I actually already did,
-
so let's just write them down in order again.
-
1,1,2,3,4
-
And you just take the middle number!
-
So let's see: there's 1,2,3,4...5 numbers,
-
so the middle number is gonna be right here, right?
-
The middle number is 2
-
There are two numbers greater than 2,
-
and there are two numbers less than 2.
-
And this is called the median,
-
which is actually a very simple computation:
-
you just have to sort the numbers,
-
and then you find whatever number
-
has an equal number of values greater than it and less than it
-
So the median of this set is 2.
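-
As a rough sketch of that "sort, then take the middle" idea for an odd count of numbers, here is one way it could look in Python. The list is deliberately given out of order, and the variable names are only illustrative.

```python
numbers = [4, 1, 3, 1, 2]            # same five numbers, deliberately shuffled
ordered = sorted(numbers)            # [1, 1, 2, 3, 4]

# With an odd count, integer division gives the index of the middle element.
middle = ordered[len(ordered) // 2]

# Count how many values sit on each side of the middle number.
less = sum(1 for x in ordered if x < middle)
greater = sum(1 for x in ordered if x > middle)

print(ordered)                 # [1, 1, 2, 3, 4]
print(middle, less, greater)   # 2 2 2
```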
-
And you see, that's fairly close to the mean.
-
And, there is no right answer.
-
One of these isn't a better answer for the average,
-
they are just different ways of measuring the average.
-
So here it's the median.
-
And I know what you might be thinking:
-
"well, that was easy enough when we have 5 numbers,
-
but what if we have 6 numbers, what if it was like this:
-
what if this was our set of numbers?
-
1,1,2,3,4...let's add another 4 there
-
So now there's no middle number, right?
-
I mean, 2 is not the middle number,
-
because there are two numbers less than 2,
-
and there are three greater than 2
-
And 3 is not the middle number,
-
because there are two larger and three less than 3
-
So there's no middle number.
-
So when you have a set with an even number of values,
-
and someone tells you to figure out the median,
-
what you do is you take the middle two numbers
-
then you take the arithmetic mean of those two numbers
-
So in this case, the middle two numbers are 2 and 3,
-
the median would be 2.5
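-
To cover both the odd and the even case in one place, here is a small Python sketch. The function name `median` is my own choice for illustration; the last line compares it against `statistics.median` from Python's standard library, which follows the same rule.

```python
import statistics

def median(numbers):
    """Middle value of the sorted numbers, or the mean of the two middle values."""
    ordered = sorted(numbers)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd count: there is a single middle number.
        return ordered[mid]
    # Even count: take the arithmetic mean of the two middle numbers.
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([1, 1, 2, 3, 4]))                # 2
print(median([1, 1, 2, 3, 4, 4]))             # 2.5
print(statistics.median([1, 1, 2, 3, 4, 4]))  # 2.5
```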
-
Fair enough
-
Let's put this aside,
-
because I want to compare
-
the median and the mean and the mode
-
for the same set of numbers
-
But that's a good thing to know,
-
because sometimes, it can be a little bit confusing.
-
And you know, these are all definitions,
-
these are all kind of mathematical tools
-
for getting your head around numbers,
-
there's nothing like:
-
one day, someone saw one of these formulas
-
on the face of the sun, and said:
-
"oh, that's part of the universe,
-
this is how the average should be calculated"
-
These are human constructs
-
to get our heads around large sets of data
-
This isn't a large set of data,
-
we have five numbers,
-
but if we have five billion numbers,
-
you can imagine that you wouldn't like
-
to think about every number individually.
-
But anyway, before I talk more about that,
-
let me tell you what the mode is.
-
And the mode, to some degree,
-
is the one that most people probably forget
-
or never learn, and when they see it on an exam
-
it confuses them, because "oh, that sounds very advanced"
-
But in some way it is the easiest
-
of all of the measures of central tendency
-
The mode is essentially:
-
"what number is most common in the set"
-
So, in this example, there are two 1s,
-
and there is one of everything else, right?
-
So the mode here is: 1
-
So mode, you kind of say is
-
"the most common number".
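-
One way to find "the most common number" in code is to count how often each value appears. This is only a sketch, using `collections.Counter` from Python's standard library; the variable names are illustrative.

```python
from collections import Counter

numbers = [1, 1, 2, 3, 4]

# Tally how many times each number appears in the set.
counts = Counter(numbers)

# most_common(1) returns a list with the single most frequent (value, count) pair.
value, count = counts.most_common(1)[0]

print(value, count)  # 1 2
```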
-
And then you can say:
-
"hey, what if our set was: 1,1,2,3,4,4"
-
So here I have two 1s, and I have two 4s.
-
And this is where the mode gets a little bit tricky,
-
because either of these would have been a decent answer
-
for the mode.
-
You could have actually said that
-
"the mode of this is 1"
-
or "the mode of this is 4".
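-
If you want to see that tie in code, here is a hedged sketch that assumes Python 3.8 or later, where the standard library has `statistics.multimode`. It returns every value that ties for most common, while `statistics.mode` just reports the first one it runs into.

```python
import statistics

numbers = [1, 1, 2, 3, 4, 4]

# multimode lists all of the most common values, in order of first appearance.
print(statistics.multimode(numbers))  # [1, 4]

# mode picks a single answer; in Python 3.8+ it is the first mode encountered.
print(statistics.mode(numbers))       # 1
```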
-
And it gets a little bit ambiguous,
-
and you probably want a little clarity
-
from the person asking you.
-
Most times on a test, when they ask you,
-
there's not gonna be this situation.
-
There will be a most common number in the set.
-
So then you might say: "Why wasn't one of these good enough?
-
Why do we have all these different averages?
-
Why don't we use the arithmetic mean all the time?
-
What are the median and mode good for?"
-
Well, I'll try to do one little example of that,
-
and see if it "rings true with you".
-
And then you can think a little bit more.
-
Let's have this set of numbers:
-
3,3,3,3,3, and, I don't know... and 100!
-
So what's the arithmetic mean here?
-
So 1,2,3,4,5 threes and one 100.
-
So it would be 115/6, right?
-
Because there's 1,2,3,4,5,6 numbers.
-
And 115 is just the sum of all of these.
-
So that's equal to:...19 and 1/6
-
Fair enough, I just added the numbers,
-
and divided by how many there are.
-
But my question is: "is this really representative of this set?"
-
I mean, I have a ton of threes,
-
then I have one 100 all of a sudden,
-
and we are saying that the central tendency
-
is 19 and 1/6.
-
Does 19 and 1/6 really seem indicative of this set?
-
Maybe it does, depending on the application,
-
but it just seems a little bit off.
-
My intuition would be that the central tendency
-
is something closer to 3.
-
Because there are a lot of threes here.
-
So what will the median tell us?
-
I already put these numbers in order, right?
-
If I gave them to you in another order,
-
you would put them in this order.
-
And you would say: "what's the middle number?"
-
Let's see: the middle two numbers,
-
since we have an even number of values, are 3 and 3.
-
So if I take the arithmetic mean of 3 and 3,
-
I get 3.
-
And this is maybe a better measurement
-
of the central tendency, or the average
-
of this set of numbers, right?
-
Essentially what it does is,
-
by taking the median,
-
I wasn't affected by this very large number
-
that's very different from the others.
-
Statisticians call that an outlier.
-
A number, that, you know,
-
if we talked about average home prices,
-
maybe every house in the city
-
is $100,000, and then
-
there's one house that costs a trillion dollars.
-
And then someone tells you:
-
"the average price is.. I don't know.. $1 million"
-
then you might have a very wrong
-
perception of that city.
-
But the median house price would be $100,000
-
and you would get a better sense
-
of what houses in that city are like.
-
So, similarly, this median maybe
-
gives you a better sense of what
-
the numbers in this set are like.
-
Because the arithmetic mean
-
was influenced by this outlier.
-
And being able to tell what an outlier is
-
is kind of one of those things where
-
statisticians will say:
-
"well I know it when I see it"
-
There isn't really a formal definition for it,
-
but it tends to be
-
a number that really kind of sticks out.
-
Sometimes it's, you know, a measurement error or whatever.
-
And then finally the mode.
-
What is the most common number in the set?
-
Well, there are five 3s and one 100,
-
so the most common number is...
-
3
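-
To put the three measures side by side for this set with the outlier, here is a short Python sketch using the standard `statistics` module. The mean is rounded to two decimals only to keep the printout readable.

```python
import statistics

numbers = [3, 3, 3, 3, 3, 100]

mean = sum(numbers) / len(numbers)    # 115 / 6, pulled up by the 100
median = statistics.median(numbers)   # mean of the two middle values, 3 and 3
mode = statistics.mode(numbers)       # the most common value

print(round(mean, 2))  # 19.17
print(median)          # 3.0
print(mode)            # 3
```

Changing the 100 to something even larger moves the mean further, while the median and the mode stay at 3.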
-
So in this case, when we have this outlier,
-
the median and the mode tend to be
-
maybe a little bit better about giving you an indication
-
of what these numbers represent.
-
Maybe this was just a measurement error.
-
But I don't know.
-
We don't actually know what these numbers represent
-
if these were house prices,
-
I would argue that these are more indicative
-
measures of what the prices in the area are.
-
But if this is something else,
-
if these are scores on a test,
-
maybe there is one kid in the class, out of 6 kids,
-
who did really really well
-
and everyone else didn't study.
-
And this is more indicative of how
-
students at that level are doing on average.
-
Anyway, I'm done talking about all of this.
-
And I encourage you to play with a lot of numbers,
-
and deal with the concepts yourself.
-
In the next video, we'll explore
-
more descriptive statistics,
-
and instead of talking about the central tendency,
-
we'll talk about how spread apart
-
things are away from the central tendency.
-
See you in the next video!