Chapter 5 Review


Chapter 5: Introduction to Inferential Statistics
Definition
infer (v.t.): to arrive at a decision or opinion by reasoning from known facts or evidence.
Sample
A sample comprises a part of the population
selected for a study.
Random Samples
If every score in the population has an equal
chance of being selected each time you choose
a score, the sample is called a random sample.
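A minimal Python sketch of drawing a random sample, assuming the population is available as a list of scores. The population values and the function name draw_random_sample are hypothetical. Because the definition gives every score an equal chance on each draw, the sketch samples with replacement using random.Random.choices.

```python
import random

def draw_random_sample(population, n, seed=None):
    """Hypothetical helper: draw n scores, each score equally likely on every draw.

    random.Random.choices samples with replacement, matching "each time you
    choose a score". Use random.Random.sample if a score may be chosen only once.
    """
    rng = random.Random(seed)
    return rng.choices(population, k=n)

# Hypothetical population of test scores, made up for illustration only.
population = [50, 57, 63, 69, 70, 74, 77, 78, 81, 82, 88]
sample = draw_random_sample(population, n=4, seed=1)
print(sample)
```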
Random samples, and only random samples,
are representative of the population from
which they are drawn.
Q: ON WHAT MEASURES IS A RANDOM SAMPLE REPRESENTATIVE OF THE POPULATION?
A: ON EVERY MEASURE.
REPRESENTATIVE ON EVERY MEASURE
The mean of the random sample’s height
will be similar to the mean of the
population.
The same holds for weight, IQ, ability to
remember faces or numbers, the size of
their livers, self-confidence, how many
children their aunts had, etc., etc., etc. ON
EVERY MEASURE THAT EVER WAS OR
CAN BE.
All sample statistics are representative of their population parameters.
The sample mean is a least squares, unbiased, consistent estimate of the population mean.
REPRESENTATIVE ON measures of central tendency (the mean), on measures of variability (e.g., sigma²), and on all derivative measures
For example, the way scores fall around the
mean of a random sample (as indexed by MSW)
will be similar to the way scores fall around the
mean of the population (as indexed by sigma²).
THERE ARE OCCASIONAL RANDOM SAMPLES THAT ARE POOR REPRESENTATIVES OF THEIR POPULATION
But (1) we will take that into account,
and (2) most are fair to very good
representatives of their populations.
Population Parameters and Sample Statistics: Nomenclature
The characteristics of a population are called
population parameters. They are usually
represented by Greek letters (μ, σ).
The characteristics of a sample are called
SAMPLE STATISTICS. They are usually
represented by letters of the English alphabet (X̄, s).
Three things we can do with random samples
Estimate population parameters. This is called
estimation research.
Estimate the relationship between variables in
the population from their relationship in a
random sample. This is called correlational
research.
Compare the responses of random samples to
different conditions. This is called experimental
research.
Estimating population parameters
Sample statistics are least squares,
unbiased, consistent estimates of their
population parameters.
We’ll get to this in a minute, in detail.
Correlational Research
We observe the relationship among variables
in a random sample. We are unlikely to find strong
relationships purely by chance. When you study a
sample and the relationship between two variables
is strong enough, you can infer that a similar
relationship between the variables will be found
in the population as a whole.
This is called correlational research.
For example, height and weight are co-related.
What is needed for correlational research
• In Chapter 6, you will learn to turn scores on different measures from a sample into scores that can be directly compared to each other.
• In Chapter 7, you will learn to compute a single number that describes the direction and consistency of the relationship between two variables. That number is called the correlation coefficient.
• In Chapter 8, you will learn to predict scores on one variable from scores on another variable when you know (or can estimate) the correlation coefficient.
• In Chapter 8, you will also learn when not to do that, and instead to go back to predicting that everyone will score at the mean of their distribution.
Experimental Research
In Chapters 9 – 11 you will learn about experiments.
In an experiment, we start with samples that can be
assumed to be similar and then treat them differently.
Then we measure response differences among the samples
and make inferences about whether or not similar
differences would occur in response to similar treatment
in the whole population.
For example, we might expose randomly selected groups
of depressed patients to different doses of a new drug to
see which dose produces the best result.
If we got clear differences, we might suggest that all
patients be treated with that dose.
Experimental Research
We apply different treatments to samples and
then measure the response differences. If, and
only if, the differences among samples are large
enough, we can infer that the same differences
would occur in the population.
This is called experimental research.
For example, studying the effect of Vitamin C
on the likelihood of catching a cold.
In this chapter, we will focus on estimating population parameters from sample statistics.
Estimation research
We measure the characteristics of a random
sample and then we infer that they are similar
to the characteristics of the population.
Characteristics are things like the mean and
standard deviation.
Estimation underlies both correlational and
experimental research.
Definition
A least squares estimate is one whose average
squared distance from the number it estimates
is as small as possible.
We will study sample statistics that are least
squares estimates of their population parameters.
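To make the least squares idea concrete, the sketch below checks a closely related, standard property: the mean of a set of scores is the value with the smallest sum of squared distances to those scores. It reuses the scores 6, 8, 4 from this chapter's single-sample example; the helper name sum_sq_dev and the grid of candidate values are illustrative assumptions.

```python
def sum_sq_dev(scores, c):
    """Hypothetical helper: sum of squared distances from each score to a candidate value c."""
    return sum((x - c) ** 2 for x in scores)

scores = [6, 8, 4]              # the single-sample example used in this chapter
mean = sum(scores) / len(scores)

# Try candidate values from 3.0 to 9.0 in steps of 0.5.
candidates = [3 + 0.5 * i for i in range(13)]
best = min(candidates, key=lambda c: sum_sq_dev(scores, c))
print(best, sum_sq_dev(scores, best))  # 6.0 8.0
```

A finer grid leads to the same answer: the sum of squared deviations is smallest when the candidate equals X̄.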
Definition
An unbiased estimate is one around which
deviations sum to zero.
We will study sample statistics that are unbiased
estimates of their population parameters.
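One way to see unbiasedness is to simulate many random samples and average the estimates, reading "deviations sum to zero" as the estimates' errors averaging out across samples. This sketch assumes a normally distributed population with the book example's μ = 72 and σ = 12; normality, the sample size, and the number of repetitions are assumptions made only for the simulation. With one group (k = 1), MSW divides by n - 1.

```python
import random
import statistics

MU, SIGMA = 72.0, 12.0      # assumed normal population (book example's mu and sigma)
N, REPS = 5, 20000          # assumed sample size and number of simulated samples

rng = random.Random(42)
means, var_n_minus_1, var_n = [], [], []
for _ in range(REPS):
    sample = [rng.gauss(MU, SIGMA) for _ in range(N)]
    xbar = sum(sample) / N
    ss = sum((x - xbar) ** 2 for x in sample)
    means.append(xbar)
    var_n_minus_1.append(ss / (N - 1))   # MSW with k = 1 group
    var_n.append(ss / N)                 # divides by n instead of n - k

print(statistics.mean(means))          # expected near 72
print(statistics.mean(var_n_minus_1))  # expected near 144 (sigma squared)
print(statistics.mean(var_n))          # expected near 144 * (N - 1) / N = 115.2
```

Dividing by n - k rather than n is what keeps the average variance estimate centered on σ².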
Definition
A consistent estimator is one where the larger
the number of randomly selected scores underlying
the sample statistic, the closer the statistic will tend
to come to the population parameter.
We will study sample statistics that are consistent
estimates of their population parameters.
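A small simulation can illustrate consistency: as the number of randomly selected scores grows, the sample mean tends to land closer to μ. The sketch assumes a normal population with μ = 72 and σ = 12 and averages the distance |X̄ - μ| over many simulated samples; the sample sizes and the helper name mean_abs_error are illustrative assumptions.

```python
import random

MU, SIGMA = 72.0, 12.0   # assumed normal population
REPS = 2000              # simulated samples per sample size

def mean_abs_error(n, rng):
    """Average distance between the sample mean and MU over REPS samples of size n."""
    total = 0.0
    for _ in range(REPS):
        xbar = sum(rng.gauss(MU, SIGMA) for _ in range(n)) / n
        total += abs(xbar - MU)
    return total / REPS

rng = random.Random(7)
errors = {n: mean_abs_error(n, rng) for n in (1, 10, 100)}
for n, err in errors.items():
    # The typical error should shrink roughly in proportion to 1 / sqrt(n).
    print(n, round(err, 2))
```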
The sample mean
The sample mean is called X-bar and is
represented by X̄.
X̄ is the best estimate of μ, because it is a least
squares, unbiased, consistent estimate.
$$\bar{X} = \frac{\sum X}{n}$$
Estimated variance
The estimate of 2 is called the mean squared
error and is represented by MSW.
It is also a least squares, unbiased, consistent
estimate.
SSW = (X - X)2
MSW = (X - X)2 / (n-k)
Estimated standard deviation
The estimate of σ is called s.
$$s = \sqrt{MS_W}$$
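The three formulas above can be sketched in Python for a single sample, where k = 1 and the divisor is n - 1. The function name estimate_single_sample is hypothetical; the scores are the 6, 8, 4 example used in this chapter.

```python
import math

def estimate_single_sample(scores):
    """Return (mean, SSW, MSW, s) for one group, so k = 1 and df = n - 1."""
    n = len(scores)
    mean = sum(scores) / n                       # X-bar = sum(X) / n
    ssw = sum((x - mean) ** 2 for x in scores)   # SSW = sum of squared deviations
    msw = ssw / (n - 1)                          # MSW = SSW / (n - k) with k = 1
    s = math.sqrt(msw)                           # s = square root of MSW
    return mean, ssw, msw, s

print(estimate_single_sample([6, 8, 4]))  # (6.0, 8.0, 4.0, 2.0)
```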
In English
We estimate the population mean by finding the
mean of the sample.
We estimate the population variance (sigma²)
with MSW by first finding the sum of the squared
differences between our best estimate of mu
(the sample mean) and each score. Then, we
divide the sum of squares by n-k where n is the
number of scores and k is the number of groups
in our sample.
We estimate sigma by taking the square root of
MSW, our best estimate of sigma².
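The same steps generalize to k groups: each score's deviation is taken from its own group's mean, the squared deviations are pooled into SSW, and SSW is divided by n - k. This is a minimal sketch; the function name estimate_pooled is hypothetical, and the data are the three groups from this chapter's worked example.

```python
import math

def estimate_pooled(groups):
    """Return (group means, SSW, MSW, s) for k groups of scores.

    Each score's deviation is taken from its own group's mean,
    and MSW divides SSW by n - k degrees of freedom.
    """
    means = [sum(g) / len(g) for g in groups]
    ssw = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    n = sum(len(g) for g in groups)   # total number of scores
    k = len(groups)                   # number of groups
    msw = ssw / (n - k)
    return means, ssw, msw, math.sqrt(msw)

groups = [[50, 77, 69, 88], [78, 57, 82, 63], [74, 70, 63, 81]]
means, ssw, msw, s = estimate_pooled(groups)
print(means, ssw, round(msw, 2), round(s, 2))  # [71.0, 70.0, 72.0] 1366.0 151.78 12.32
```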
Estimating mu and sigma – single sample
S#   X    X̄      (X - X̄)   (X - X̄)²
A    6    6.00    0.00      0.00
B    8    6.00    2.00      4.00
C    4    6.00   -2.00      4.00
ΣX = 18    N = 3    X̄ = 6.00
Σ(X - X̄) = 0.00
Σ(X - X̄)² = 8.00 = SSW
MSW = SSW/(n-k) = 8.00/2 = 4.00
s = √MSW = 2.00
Group 1 (X̄1 = 71.00)
S#    X    X̄       (X - X̄)   (X - X̄)²
1.1   50   71.00   -21.00    441.00
1.2   77   71.00    +6.00     36.00
1.3   69   71.00    -2.00      4.00
1.4   88   71.00   +17.00    289.00
Σ(X - X̄1) = 0.00    Σ(X - X̄1)² = 770.00

Group 2 (X̄2 = 70.00)
S#    X    X̄       (X - X̄)   (X - X̄)²
2.1   78   70.00    +8.00     64.00
2.2   57   70.00   -13.00    169.00
2.3   82   70.00   +12.00    144.00
2.4   63   70.00    -7.00     49.00
Σ(X - X̄2) = 0.00    Σ(X - X̄2)² = 426.00

Group 3 (X̄3 = 72.00)
S#    X    X̄       (X - X̄)   (X - X̄)²
3.1   74   72.00    +2.00      4.00
3.2   70   72.00    -2.00      4.00
3.3   63   72.00    -9.00     81.00
3.4   81   72.00    +9.00     81.00
Σ(X - X̄3) = 0.00    Σ(X - X̄3)² = 170.00

SSW = 770.00 + 426.00 + 170.00 = 1366.00
MSW = SSW/(n-k) = 1366.00/9 = 151.78
s = √MSW = √151.78 = 12.32
Why n-k?
This has to do with “degrees of freedom.”
As you saw in the last chapter, each time you
add a score to a sample, you pull the
sample statistic toward the population
parameter.
Any score that isn’t free to
vary does not tend to pull the
sample statistic toward the
population parameter.
One deviation in each group is
constrained by the rule that deviations
around the mean must sum to zero. So
one deviation in each group is not free to
vary.
Deviation scores underlie our computation
of SSW, which in turn underlies our
computation of MSW.
n-k is the number of degrees of freedom for MSW
• You use the deviation scores as the basis for estimating sigma² with MSW.
• Scores that are free to vary are called degrees of freedom.
• Since one deviation score in each group is not free to vary, you lose one degree of freedom for each group; with k groups you lose k*1 = k degrees of freedom.
• There are n deviation scores in total, and k are not free to vary. That leaves n-k that are free to vary: n-k degrees of freedom for MSW, your estimate of sigma².
• The precision or "goodness" of an estimate depends on its degrees of freedom. The more df, the closer the estimate tends to get to its population parameter.
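The constraint on deviations can be checked directly: once all but one deviation in a group are known, the last one is forced, because deviations around the group mean sum to zero. This sketch uses Group 1 from the worked example and then counts n - k for all three groups; the helper name last_deviation is hypothetical.

```python
def last_deviation(free_deviations):
    """Given all but one deviation in a group, the last is forced:
    deviations around the group mean must sum to zero."""
    return -sum(free_deviations)

group1 = [50, 77, 69, 88]
mean1 = sum(group1) / len(group1)            # 71.0
devs = [x - mean1 for x in group1]           # [-21.0, 6.0, -2.0, 17.0]
print(last_deviation(devs[:-1]), devs[-1])   # 17.0 17.0

# Degrees of freedom for MSW: n scores minus one constrained deviation per group.
groups = [[50, 77, 69, 88], [78, 57, 82, 63], [74, 70, 63, 81]]
n = sum(len(g) for g in groups)
k = len(groups)
print(n - k)  # 9
```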
More scores that are free to vary = better estimates: the mean as an example.
Each time you add a randomly selected score to your sample,
it is most likely to pull the sample mean closer to mu, the
population mean.
Any particular score may pull it further from mu.
But, on average, as you add more and more scores, the odds
are that you will be getting closer to mu.
Book example
The population is 1320 students taking a test.
μ = 72.00, σ = 12
Unlike estimating the variance (where df = n-k), when estimating the
mean, all the scores are free to vary.
So each score in the sample will tend to make the sample mean a better
estimate of mu.
Let’s randomly sample one student at a time and see what happens.
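[Figure: frequency distribution of test scores, with the mean score of 72 marked; the x-axis runs from 36 to 108 in steps of 12, i.e., from -3 to +3 standard deviations.]
Sample scores, in the order drawn, with the sample mean after each draw:
Score:        102   72    66    76    66    78    69    63
Sample mean:  102   87    80    79    76.4  76.7  75.6  74.0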
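The running means above can be reproduced by dividing each cumulative total by the number of scores drawn so far. This sketch uses the sample scores as listed for the book example; it only recomputes the means and does not redraw the sample.

```python
from itertools import accumulate

scores = [102, 72, 66, 76, 66, 78, 69, 63]   # sample scores from the book example, in draw order
running_totals = list(accumulate(scores))     # cumulative sums after each draw
running_means = [round(total / i, 1) for i, total in enumerate(running_totals, start=1)]
print(running_means)  # [102.0, 87.0, 80.0, 79.0, 76.4, 76.7, 75.6, 74.0]
```

The individual steps are not monotone (the sixth score, 78, moves the mean slightly away from 72), but the overall drift is toward μ = 72.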
Consistent estimators
This tendency to pull the sample mean back to the population
mean is called “regression to the mean”.
We call estimates that improve when you add scores
to the sample consistent estimators.
Recall that the statistics that we will learn are:
consistent,
least squares, and
unbiased.