Measures of Central Tendency


Measures of Central Tendency
“to be or not to be Normal”
TOPICS
• Normal Distributions
• Skewness & Kurtosis
• Normal Curves and Probability
• Z-scores
• Confidence Intervals
• Hypothesis Testing
• The t-distribution
Is this normal?
[Figure: histogram of VAR00001; x-axis from 100.0 to 500.0, y-axis (frequency) from 0.0 to 3.5; Std. Dev = 160.68, Mean = 178.3, N = 6.00]
VAR00001

Valid value   Frequency   Percent   Valid Percent   Cumulative Percent
70.00             1         16.7        16.7              16.7
100.00            2         33.3        33.3              50.0
150.00            2         33.3        33.3              83.3
500.00            1         16.7        16.7             100.0
Total             6        100.0       100.0
Statistics: VAR00001

N (Valid)                 6
N (Missing)               0
Mean                      178.3333
Skewness                  2.242
Std. Error of Skewness    .845
Kurtosis                  5.219
Std. Error of Kurtosis    1.741
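The figures in the Statistics table can be checked by hand. The sketch below expands the six cases from the frequency table and assumes the sample-adjusted skewness and kurtosis formulas that SPSS commonly reports (an assumption; the slides do not say which formulas were used). The function name `describe` is illustrative.

```python
import math

# VAR00001 values expanded from the frequency table (6 cases).
data = [70, 100, 100, 150, 150, 500]

def describe(values):
    """Return mean, sample SD, adjusted skewness, and adjusted excess kurtosis."""
    n = len(values)
    mean = sum(values) / n
    # Sample standard deviation (divides by n - 1).
    sd = math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))
    z = [(x - mean) / sd for x in values]
    # Sample-adjusted skewness (assumed to match the SPSS output).
    skew = n / ((n - 1) * (n - 2)) * sum(v ** 3 for v in z)
    # Sample-adjusted excess kurtosis (a normal distribution scores 0 here).
    kurt = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * sum(v ** 4 for v in z)
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))
    return mean, sd, skew, kurt

mean, sd, skew, kurt = describe(data)
print(f"mean={mean:.4f} sd={sd:.2f} skewness={skew:.3f} kurtosis={kurt:.3f}")
```

Under these assumptions the results agree with the table to rounding. The single case at 500 is what drives both the skewness and the kurtosis so far from zero.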
Normal Distributions
• Are your curves normal?
• Why do we care about normal curves?
• What do normal curves tell us?
Answer:
• The curves tell us something about the distribution of the population.
• The curves allow us to make statistical inferences regarding the probability of some outcomes within some margin of error.
The normal distribution
• A distribution is easily depicted in a graph where the height of the line is determined by the frequency of cases for the values beneath it.
• Most cases cluster near the middle of a distribution if it is close to normal.
The Normal Curve
• Bell-shaped distribution or curve
• Perfectly symmetrical about the mean.
Mean = median = mode
• Tails are asymptotic: they get closer and closer to the horizontal axis but never reach it.
Skewness and Sample Distributions
Not all curves are normal, even if still bell-shaped
Skewness
• Formula for skewness:
$$\text{Skewness} = \frac{3(\text{mean} - \text{median})}{s_y}$$
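As a rough check of the formula above, the sketch below applies it to the six VAR00001 cases from the frequency table. The function name `pearson_skewness` is illustrative. Note that this median-based measure is not the same statistic as the 2.242 in the SPSS Statistics table, which appears to come from a moment-based formula, so the two numbers are not expected to match. Only the sign and rough size are comparable.

```python
import statistics

# VAR00001 values expanded from the frequency table (6 cases).
data = [70, 100, 100, 150, 150, 500]

def pearson_skewness(values):
    """Median-based skewness: 3 * (mean - median) / sample SD."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    sd = statistics.stdev(values)  # sample SD (n - 1 denominator)
    return 3 * (mean - median) / sd

skew = pearson_skewness(data)
print(f"Pearson skewness = {skew:.3f}")
```

A positive value means the mean is pulled above the median, which is what a single very large case (500) does to this sample.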
Kurtosis (It’s not a disease)
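• Beyond skewness, kurtosis tells us how peaked or flat a distribution is, and how heavy its tails are, even when it is symmetric.
• The kurtosis value for a normal distribution equals 3. Anything above this is leptokurtic (sharper peak, heavier tails) and anything below is platykurtic (flatter, lighter tails). Note that SPSS reports excess kurtosis (kurtosis minus 3), so a normal distribution scores 0 in its output.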
Back to normal distributions
• The power of normal distributions, or those
close to it, is that we can predict where
cases will fall within a distribution
probabilistically.
• For example, what are the odds, given the population parameters of human height, that someone will grow to more than eight feet?
• Answer: far less than a .025 probability.
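To make the height question concrete, here is a minimal sketch of the tail-probability calculation. The population parameters (mean 67 inches, standard deviation 4 inches) are assumptions chosen only for illustration and do not come from the slides. The function name `upper_tail_probability` is also illustrative.

```python
import math

def upper_tail_probability(x, mean, sd):
    """P(X > x) for a normal distribution with the given mean and SD."""
    z = (x - mean) / sd
    # erfc keeps precision in the far tail, where 1 - cdf would lose it.
    return 0.5 * math.erfc(z / math.sqrt(2))

# Assumed parameters for illustration only (not from the slides).
ASSUMED_MEAN_IN = 67.0
ASSUMED_SD_IN = 4.0

# Eight feet is 96 inches.
p_eight_feet = upper_tail_probability(96.0, ASSUMED_MEAN_IN, ASSUMED_SD_IN)
print(f"P(height > 8 ft) = {p_eight_feet:.2e}")
```

With these assumed parameters, eight feet sits more than seven standard deviations above the mean, so the probability is a tiny fraction of .025.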
Sample Distribution
• What does Andre the
Giant do to the sample
distribution?
• What is the probability
of finding someone
like Andre in the
population?
• Are you ready for
more inferential
statistics?
• Answer: Oh boy, yes!!
Normal Curves and probability
• We have answered the question of what
Andre and the Sumo wrestler would do to
the distribution
• But what about the probability of finding
someone the same height as Andre in the
population?
• What is the probability of finding someone
the same height as Dr. Peña or Dr.
Boehmer?
More on normal curves and probability
[Figure: curve with two marked positions, labeled “Dr. Boehmer would be here” and “Andre would be here”]
Z-Scores (no sleeping!!)
• We can standardize distance from the mean across different samples with z-scores.
• The basic unit of the z-score is the standard deviation.
$$z = \frac{X_i - \bar{X}}{s}$$
We can use the z-score to score each
observation as a distance from the
mean.
How far is a given observation from the
mean when its z-score = 2?
Answer: 2 standard deviations.
Approximately what percentage of cases is a given case higher than if its z-score = 2?
Answer: about 97.7%
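The two answers above can be checked with a short sketch. It scores the six VAR00001 cases from the frequency table as z-scores, then asks what share of a normal distribution lies below z = 2. The function name `z_scores` is illustrative.

```python
from statistics import NormalDist, mean, stdev

def z_scores(values):
    """Score each observation as a distance from the mean in SD units."""
    m, s = mean(values), stdev(values)
    return [(x - m) / s for x in values]

# VAR00001 values expanded from the frequency table (6 cases).
data = [70, 100, 100, 150, 150, 500]
scores = z_scores(data)
print([round(z, 2) for z in scores])

# Share of a standard normal distribution lying below z = 2.
share_below_2 = NormalDist().cdf(2.0)
print(f"{share_below_2:.1%}")
```

In this sample the case at 500 lands almost exactly two standard deviations above the mean. The percentile answer only applies when the distribution is close to normal, which this small sample is not.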
Random Sampling Error
• Ever hear a poll report a margin of error? What is that?
Random Sampling Error = standard deviation / square root of the sample size
Or
$$\frac{\sigma}{\sqrt{N}}$$
As the variance of the population increases, so does the chance that a sample will not reflect the population parameters.
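A minimal sketch of the formula above shows how the error responds to sample size and to population spread. The standard deviation of 15 is borrowed from the confidence interval example later in the slides, and the sample sizes are arbitrary. The function name `random_sampling_error` is illustrative.

```python
import math

def random_sampling_error(sigma, n):
    """Population standard deviation divided by the square root of the sample size."""
    return sigma / math.sqrt(n)

# Quadrupling the sample size halves the error.
for n in (100, 400, 1600):
    print(n, random_sampling_error(15, n))

# Doubling the population standard deviation doubles the error.
print(random_sampling_error(30, 400))
```

The design point is that the error shrinks with the square root of the sample size, so each further gain in precision costs a much larger sample.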
Standard Error
• We use “random sampling error” both for the general chance of erring when sampling and for the error of a specific sample statistic, the mean. For the latter we typically use the term Standard Error.
• The standard error of a sample statistic is the typical distance between the statistic from a sample (here, the mean) and the corresponding value in the population from which the sample is drawn.
Standard Error
Example: What if most humans were 200
pounds and only 1 million globally were 250
pounds?
The random sampling error would be low, since a sample consisting heavily of those heavier humans would be unlikely. There would not be much error in general from sampling because of the low variance.
Standard Error
• Example continued. Now, when we take a
sample, each sample has a mean. If a
population has low variance, so should the
samples. We should see this reflected in
low standard error in the mean of the
sample, the sample statistic.
• Of course, higher variance in the
population also causes higher error in
samples taken from it.
Some more notation
Distributions              Mean    Standard Dev.
Sample of observed data    X̄       s
Population                 μ       σ
Repeated Sampling          μ       σ/√N  (Random Sampling Error)

Error in a Sample’s mean is the Standard Error:
$$\frac{s}{\sqrt{n}}$$
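Since the population σ is rarely known, the standard error is usually estimated from the sample itself. The sketch below does this for the six VAR00001 cases from the frequency table. The function name `standard_error` is illustrative.

```python
import math
import statistics

def standard_error(values):
    """Estimate the standard error of the mean as s / sqrt(n)."""
    s = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    return s / math.sqrt(len(values))

# VAR00001 values expanded from the frequency table (6 cases).
data = [70, 100, 100, 150, 150, 500]
se = standard_error(data)
print(f"standard error = {se:.2f}")
```

With only six cases and one extreme value, the standard error is large relative to the mean of 178.3, which signals that this sample pins down the population mean poorly.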
Central Limit Theorem
Remember that if we took a very large number of samples of the same size from a population, the means of these samples would be approximately normally distributed around the population mean, provided the samples are reasonably large.
Hence, the larger the sample, the closer the sample mean is likely to be to the population mean.
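The repeated-sampling idea can be tried out with a small simulation. The sketch below uses a hypothetical, deliberately skewed population (the values and their proportions are invented for illustration) and compares how widely sample means spread for small and large samples. The function name `sample_means` and the fixed seed are illustrative choices.

```python
import random
import statistics

def sample_means(population, sample_size, n_samples, seed=0):
    """Draw repeated random samples and return the mean of each one."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [statistics.mean(rng.choices(population, k=sample_size))
            for _ in range(n_samples)]

# Hypothetical skewed population: many small values, a few very large ones.
population = [70] * 40 + [100] * 30 + [150] * 25 + [500] * 5

small = sample_means(population, sample_size=5, n_samples=2000)
large = sample_means(population, sample_size=100, n_samples=2000)

print("population mean:", statistics.mean(population))
print("spread of means, n=5:  ", round(statistics.stdev(small), 1))
print("spread of means, n=100:", round(statistics.stdev(large), 1))
```

The means of the larger samples should cluster much more tightly around the population mean, even though the population itself is far from normal.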
Confidence Intervals
• We can use what we know about the standard deviation and the sample mean to calculate a range of values within which the population mean is likely to fall.
• The range is built so that intervals constructed this way capture the population mean with a probability of .95, leaving a 5% chance of error.
How Confident Are You?
• Are you 100% sure?
• Social scientists use 95% as a threshold to test whether or not the results are a product of chance.
• That is, we accept a 1 in 20 chance of being wrong.
• What do you MEAN?
We build a 95% confidence interval so that we can be 95% confident the population mean lies within that range.
Confidence Interval (CI)
$$\bar{Y} \pm Z_{\alpha/2} \, \sigma_{\bar{Y}}$$
$\bar{Y}$ = mean
$Z_{\alpha/2}$ = Z score related with a 95% CI
$\sigma_{\bar{Y}}$ = standard error
sample mean ± 1.96 (or 2) × standard error
Building a CI
• Assume the following:
$$\bar{Y} = 100, \quad \sigma_y = 15, \quad N = 400$$
$$\sigma_{\bar{Y}} = \frac{\sigma_y}{\sqrt{N}} = \frac{15}{\sqrt{400}} = .750$$
CI
$$100 \pm (1.96)(0.750)$$
Upper = 101.47
Lower = 98.53
Why do we use 1.96?
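One way to see where 1.96 comes from is to ask the standard normal distribution for the z value that leaves 2.5% in each tail. The sketch below does that and then rebuilds the interval from the example (mean 100, standard deviation 15, N 400). The function name `confidence_interval` and its `level` parameter are illustrative.

```python
import math
from statistics import NormalDist

def confidence_interval(mean, sd, n, level=0.95):
    """Return (lower, upper, z) for a normal-based confidence interval."""
    # Two-sided interval: put half of the leftover probability in each tail.
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    se = sd / math.sqrt(n)  # standard error of the mean
    return mean - z * se, mean + z * se, z

lower, upper, z = confidence_interval(mean=100, sd=15, n=400)
print(f"z = {z:.2f}, CI = ({lower:.2f}, {upper:.2f})")
```

A 95% interval leaves 5% outside, split evenly between the two tails, so the cutoff is the 97.5th percentile of the standard normal, which rounds to 1.96. Passing a different `level`, such as 0.99, widens the interval accordingly.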
Calculating a 95% CI
1. Let’s look at the class population
distribution of height
2. Is it a normal or skewed distribution?
3. Let’s build a 95% CI around the mean
height of the class
Why do we care about CI?
• We use confidence intervals for hypothesis testing.
• For instance, we want to know if there is an income difference between El Paso and Boston.
• We want to know whether or not taking a class at Kaplan makes a difference in our GRE scores.
Mean Difference testing
[Figure: income levels axis with marked positions for El Paso, Las Cruces, Mean USA, and Boston]