Transcript z-scores
Statistics
or
What’s normal about the normal curve,
what’s standard about the standard deviation,
and what’s co-relating in a correlation?
Statistics: Intro
Overview
• What’s normal about the normal curve?
– The nature of the confusion
– One formal answer
– An intuitive answer (real-time demo)
• What’s standard about a standard deviation?
– Z-scores
• What’s co-relating in a correlation?
What’s normal about the normal curve(s)?
• There are a number of ways of mathematically defining
and estimating the normal distribution (which defines a
class of curves, not one single curve)
• The main question I want to address today is: what does
that math mean? Why are so many things normally
distributed? What makes sure that those things stay
distributed normally? What stops other things from being
normally distributed at all?
From: Wilensky, U. (1997). What is Normal Anyway? Therapy for
Epistemological Anxiety. Educational Studies in Mathematics, 33(2), 171-202.
Special Issue on Computational Environments in Mathematics Education,
R. Noss (Ed.).
U: Why do you think height is distributed normally?
L: Come again? (sarcastic)
U: Why is it that women's height can be graphed using a normal curve?
L: That's a strange question.
U: Strange?
L: No one's ever asked me that before..... (thinking to herself for a while) I guess
there are 2 possible theories: Either it's just a fact about the world, some guy
collected a lot of height data and noticed that it fell into a normal shape.....
U: Or?
L: Or maybe it's just a mathematical trick.
U: A trick? How could it be a trick?
L: Well... Maybe some mathematician somewhere just concocted this crazy
function, you know, and decided to say that height fit it.
U: You mean...
L: You know the height data could probably be graphed with lots of different
functions and the normal curve was just applied to it by this one guy and now
everybody has to use his function.
U: So you’re saying that in the one case, it's a fact about the world that height is
distributed in a certain way, and in the other case, it's a fact about our
descriptions but not about height?
L: Yeah.
U: Well, if you had to commit to one of these theories, which would it be?
L: If I had to choose just one?
U: Yeah.
L: I don't know. That's really interesting. Which theory do I really believe? I guess
I've always been uncertain which to believe and it's been there in the
background you know, but I don't know. I guess if I had to choose, if I have to
choose one, I believe it's a mathematical trick, a mathematician's game. ....What
possible reason could there be for height, ....for nature, to follow some weird
bizarro function?
Formal answer 1: The binomial distribution I
The chance of an event of probability p happening r times out
of n tries:
$$P(r) = \frac{n!}{r!\,(n-r)!}\, p^{r} (1-p)^{n-r}$$
(Recall: We wondered about this generalization last class.)
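The formula above can be tried out directly. The following is a minimal Python sketch, not part of the original slides; the function name `binomial_pmf` and the values n = 10, p = 0.5 are illustrative assumptions.

```python
from math import comb

def binomial_pmf(r: int, n: int, p: float) -> float:
    """Chance that an event of probability p happens exactly r times in n tries."""
    # comb(n, r) is n! / (r! (n - r)!)
    return comb(n, r) * p**r * (1 - p) ** (n - r)

# Illustrative values (assumed, not from the lecture): 10 tries, p = 0.5.
n, p = 10, 0.5

# Summed over every possible r (0 to n), the chances should add up to 1.
total = sum(binomial_pmf(r, n, p) for r in range(n + 1))
print(total)
```

Summing over every possible r is a quick sanity check: exactly one value of r must occur, so the chances have to add up to 1.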
Formal answer 1: The binomial distribution II
Why is it called the binomial distribution?
Bi = 2
Nom = thing
= the two-thing distribution
It can be used wherever:
• 1. Each trial has two possible outcomes (say, success and
failure; or heads and tails)
• 2. The trials are independent = the outcome of one trial has
no influence over the outcome of another trial.
• 3. The outcomes are mutually exclusive (a trial cannot be
both a success and a failure)
• 4. The events are randomly selected
Let’s try it out (Example 6.3 from last class)
• What is the probability of rolling exactly one seven
in two rolls?
• one way is to roll 7 first, but not second
- the probability of this is 1/6 * 5/6 (independent
events) = 5/36 = 0.139
• the other way is to roll 7 second, but not first
- the probability of this is 5/6 * 1/6
(independent events) = 5/36 = 0.139
- since these two outcomes are mutually
exclusive, we can add them to get 5/36 + 5/36 = 10/36 = 0.278
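The arithmetic above can be double-checked by brute force. This Python sketch is not from the lecture. It assumes each "roll" is a throw of two fair six-sided dice, which is what makes the chance of a seven 1/6, and it simply counts every equally likely outcome.

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)

# All 36 equally likely totals from one throw of two dice (an assumption:
# the slides only give the probability 1/6 for a seven).
throws = [a + b for a, b in product(faces, repeat=2)]
p_seven = Fraction(throws.count(7), len(throws))

# All 36 * 36 equally likely pairs of throws; keep those where exactly
# one of the two throws is a seven.
hits = sum(
    1
    for first, second in product(throws, repeat=2)
    if (first == 7) != (second == 7)
)
p_exactly_one = Fraction(hits, len(throws) ** 2)

print(p_seven, p_exactly_one, float(p_exactly_one))
```

Counting outcomes uses no probability rules at all, so it is an independent check on the multiply-then-add reasoning.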
The generalization (Example 6.3 from last class)
• What is the probability of rolling exactly one seven
in two rolls?
An event of probability p happens r times out of n tries:
$$P(r) = \frac{n!}{r!\,(n-r)!}\, p^{r} (1-p)^{n-r}$$
p = 1/6; n = 2; r = 1
$$P(1) = \frac{2!}{1!\,1!} \left(\frac{1}{6}\right)^{1} \left(\frac{5}{6}\right)^{1} = \frac{10}{36} \approx 0.278$$
What does this have to do with the normal distribution?
Why does this normal distribution happen?
[See http://ccl.northwestern.edu/cm/index.html
for the StarLogoT demo used in class.
Can you understand:
What effect changing the probabilities of each event has?
What has to change to skew a normal curve?]
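For readers without access to the demo, the two questions can be explored numerically. The Python sketch below is not the StarLogoT model; it is a stand-in under stated assumptions. It compares the binomial distribution with a normal curve of the same mean and SD, and measures how lopsided (skewed) the binomial is. The helper names and the values of n and p are illustrative.

```python
from math import comb, exp, pi, sqrt

def binomial_pmf(r, n, p):
    """Chance of exactly r successes in n independent tries."""
    return comb(n, r) * p**r * (1 - p) ** (n - r)

def normal_pdf(x, mean, sd):
    """Height of the normal curve with the given mean and SD at x."""
    return exp(-((x - mean) ** 2) / (2 * sd**2)) / (sd * sqrt(2 * pi))

def skewness(n, p):
    """Skew of the binomial, computed directly from its probabilities."""
    mean, sd = n * p, sqrt(n * p * (1 - p))
    return sum(binomial_pmf(r, n, p) * ((r - mean) / sd) ** 3 for r in range(n + 1))

def max_gap(n, p):
    """Largest gap between the binomial and the matching normal curve."""
    mean, sd = n * p, sqrt(n * p * (1 - p))
    return max(
        abs(binomial_pmf(r, n, p) - normal_pdf(r, mean, sd)) for r in range(n + 1)
    )

# Illustrative settings (assumed): a fair event and a lopsided one.
for p in (0.5, 0.1):
    print(p, skewness(20, p), max_gap(20, p))
```

With p = 0.5 the skew is essentially zero and the binomial hugs the normal curve; moving p away from 0.5 skews it, and increasing n pulls it back toward the normal shape.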
The standard deviation
From: http://www.psychstat.smsu.edu/introbook/sbk00.htm
• Given the non-linear shape of the normal distribution, one
has two choices for dividing it up:
– A.) Keep the amount of variation in each division
constant, but vary the size of the divisions
– B.) Keep the size of each division constant, but vary
the amount of variation in each division
The standard deviation (SD)
• The SD takes the second approach: it keeps the size of
each division constant, but varies the amount of
variation in each division
• The SD is a measure of average deviation (difference)
from the mean
• It is the square root of the variance, which is the
average squared difference from the mean.
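The two definitions above translate almost word for word into code. This is a minimal Python sketch with made-up data. It divides by n, matching the "average squared difference" wording; many textbooks divide by n - 1 when estimating from a sample, which this sketch does not do.

```python
from math import sqrt

def variance(values):
    """Average squared difference from the mean (dividing by n)."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values) / len(values)

def standard_deviation(values):
    """Square root of the variance."""
    return sqrt(variance(values))

# Hypothetical scores, chosen only so the numbers come out round.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

print(variance(scores), standard_deviation(scores))  # 4.0 2.0
```

Here the mean is 5, the squared differences add up to 32, and 32 / 8 = 4, so the SD is 2: a typical score sits about 2 units from the mean.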
Z-scores
• If we express differences from the mean by dividing them
by the SD, we have z-scores: standard units of difference
from the mean
• THESE Z-SCORES WILL COME IN EXTREMELY
HANDY!
– For example, we might want to know:
• Whether a 12-foot elephant is taller (compared to the
height of average elephants) than a 230-pound man
is heavy (compared to the weight of average men)
• Whether a person with a WAIS IQ of 140 is rarer than a
person with a GPA of 3.9
– Etc.
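The elephant comparison can be sketched in a few lines of Python. The slides do not give any means or SDs, so every population value below is invented purely for illustration; only the 12 feet and 230 pounds come from the example.

```python
def z_score(value, mean, sd):
    """How many SDs a value lies above (+) or below (-) the mean."""
    return (value - mean) / sd

# HYPOTHETICAL population values, NOT from the lecture or any data set:
# elephants: mean height 10 ft, SD 1 ft; men: mean weight 190 lb, SD 30 lb.
elephant_z = z_score(12, mean=10.0, sd=1.0)
man_z = z_score(230, mean=190.0, sd=30.0)

print(elephant_z, man_z)
print("elephant is more extreme" if elephant_z > man_z else "man is more extreme")
```

Under these invented numbers the elephant is 2 SDs above its mean and the man about 1.33, so the elephant is the more unusual of the two. Feet and pounds cannot be compared directly, but z-scores can.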
What co-relates in a correlation?
• In a correlation, we want to find the equation for the (one
and only) line (the line of regression) which describes the
relation between variables with the least error.
– This is done mathematically, but the idea is simply that
we draw the line for which the sum of the squared
distances of the points from the line, in two (or more)
dimensions, is as small as possible: no other line
would give a smaller sum
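The "no other line does better" idea can be checked numerically. This Python sketch assumes the usual two-variable case, with distances measured vertically (ordinary least squares), which the slides do not spell out. The data points are made up.

```python
def fit_line(xs, ys):
    """Slope and intercept of the least-squares line through the points."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def squared_error(xs, ys, slope, intercept):
    """Sum of squared vertical distances from the points to the line."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data points, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

slope, intercept = fit_line(xs, ys)
best = squared_error(xs, ys, slope, intercept)

# Nudging the line in any direction should only make the error worse.
for ds, di in [(0.1, 0), (-0.1, 0), (0, 0.1), (0, -0.1)]:
    assert squared_error(xs, ys, slope + ds, intercept + di) > best

print(slope, intercept, best)
```

For these points the fitted line has a slope of about 0.6 and an intercept of about 2.2, and each nudged line gives a larger sum of squared distances.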
What co-relates in a correlation?
• R = the covariance of X and Y / the product of the SDs of
X and Y
• Covariance is related to variance: it is the mean, over all
the pairs, of the difference from the mean for X multiplied
by the difference from the mean for Y (the mean product of
differences from the means)
• When X and Y are positively related, large differences will
be systematically multiplied by large differences with the
same sign (on both sides of the mean), so the covariance
will be large and close to the product of the SDs
of X and Y, and R will be close to 1.
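The definition of R above can be written out directly. The following is a Python sketch with invented data; the function names are illustrative, and both the covariance and the SDs divide by n so that the two are consistent with each other.

```python
from math import sqrt

def mean(values):
    return sum(values) / len(values)

def covariance(xs, ys):
    """Mean product of the differences from the means."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

def sd(values):
    """The variance is the covariance of a variable with itself."""
    return sqrt(covariance(values, values))

def correlation(xs, ys):
    """R = covariance of X and Y / product of the SDs of X and Y."""
    return covariance(xs, ys) / (sd(xs) * sd(ys))

# Hypothetical data, for illustration only.
xs = [1, 2, 3, 4, 5]
print(correlation(xs, [2, 4, 6, 8, 10]))  # perfectly related, same direction
print(correlation(xs, [10, 8, 6, 4, 2]))  # perfectly related, opposite direction
print(correlation(xs, [2, 4, 5, 4, 5]))   # related, but with scatter
```

When Y rises exactly in step with X, the covariance equals the product of the SDs and R comes out at 1 (up to rounding); when Y falls in step, the products of differences are all negative and R comes out at -1; scatter pulls R toward 0.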