Transcript: Chi Square

Chi Square (χ²)
Parametric Statistics
• Everything we have done so far assumes
that data are representative of a probability
distribution (normal curve).
• We are making inferences about the
parameters (statistics of the population) of
the distribution.
• That is why this is called parametric
statistics.
Non-Parametric Statistics
• If the data are not assumed to come from a probability distribution, then there is no distribution about which to make inferences.
• The most frequent reason this happens is when data
are not interval.
• To make inferences, some characteristic of the data
must approximate a probability distribution.
• These are called non-parametric statistics.
Variables
• Nominal
– Named groups (mode)
• Ordinal
– Ordered named groups (median)
• Interval (Ratio)
– Continuous scales (mean)
A Problem
• With nominal or ordinal data we can’t compare group
means.
– (there is no meaningful average category or average of low/medium/high)
• But there ought to be some way to know whether the distributions of responses in nominal or ordinal categories could have occurred by chance.
• The most common way to do this is with a χ² (chi square).
Reading Selection by Source
[Bar chart, counts 0–50: Female and Male respondents across the categories None, Book, Magazine, Online; the labels Pre Dist and Post Dist also appear.]
Contingency Tables (Cross Tabs)
Test of Independence

          Book   Magazine   Online
Women      24       36        42
Men        22       19        30

• Are these numbers independent? If they aren't, then it means that one variable influences the other.
Contingency Tables (Cross Tabs)
Test of Independence

          Book   Magazine   Online
Women      24       36        42
Men        22       19        30

• If gender isn't influenced by reading preference, then the proportions for each category of reading preference should be the same.
Contingency Tables (Cross Tabs)
Test of Independence

          Book   Magazine   Online
Women      24       36        42
Men        22       19        30

• If reading preference isn't influenced by gender, then the proportions for each category of gender should be the same.
Contingency Tables (Cross Tabs)
Test of Independence

          Book   Magazine   Online
Women      24       36        42
Men        22       19        30

• If neither is influenced by the other, then the proportions should be the same throughout the model.

• Is the distribution of reading preference different by gender?
• Is the distribution of gender different by reading preference?
• Are these two variables related?
• (Null: the two variables are not related)
Chi Square
Tests of Independence
• Given the Observed Frequencies, there ought to be some way to imagine what the most likely number would be in each cell if the numbers were independent.
• This is, in effect, a kind of super-averaging: using the counts in related cells to predict what should be in each cell.
Contingency Tables
Test of Independence

Observed (Expected = ?)
           Book      Magazine   Online
Women    24 (?)      36 (?)     42 (?)
Men      22 (?)      19 (?)     30 (?)
Chi Square
Tests of Independence
• Given the Observed Frequencies, determine what would be in each cell if the variance in the rows and columns were accounted for (looking at row and column proportions simultaneously).
• This is done by:
(Row total x Column total) / Grand total
• This new set of values is called the Expected Frequencies.
Contingency Tables
Test of Independence

Observed (Expected)
            Book          Magazine       Online
Women    24 (27.12)    36 (32.43)     42 (42.45)
Men      22 (18.88)    19 (22.57)     30 (29.55)
Contingency Tables
Test of Independence

Observed (Expected)
            Book          Magazine       Online
Women    24 (27.12)    36 (32.43)     42 (42.45)    Row Total = 102
Men      22 (18.88)    19 (22.57)     30 (29.55)
         Column Total = 46                          Sample Total = 173

(102 x 46)/173 = 27.12
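The same expected-frequency arithmetic can be sketched in code. This is a minimal illustration, not part of the original slides; it assumes NumPy is available and uses the observed counts from the table above.

```python
import numpy as np

# Observed counts from the reading-preference table
#                     Book  Magazine  Online
observed = np.array([[24,   36,       42],    # Women
                     [22,   19,       30]])   # Men

row_totals = observed.sum(axis=1)    # [102, 71]
col_totals = observed.sum(axis=0)    # [46, 55, 72]
grand_total = observed.sum()         # 173

# Expected frequency for each cell: (row total x column total) / grand total
expected = np.outer(row_totals, col_totals) / grand_total
print(expected.round(2))
# [[27.12 32.43 42.45]
#  [18.88 22.57 29.55]]
```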
Now What?
(Computing a Chi Square)
• The gathered data are the
Observed (Actual) Values.
• Expected Values—a computation of what should
be in each cell based on the existing sample
distribution.
• First the computer builds a model that represents
the expected frequencies for each cell.
• Then the differences between the observed frequencies and the expected frequencies are computed: (O – E)
Now What?
(Computing a Chi Square)
• Because (O – E) might be negative, each difference is squared.
• Since we want to know whether the differences are comparatively big or small, each squared difference is turned into a ratio by dividing it by the expected frequency.
• Now it is time to add all of these up: the sum of the ratios is the chi square.
• Last, given the df, the computed chi square is compared to a distribution of possible chi square values.
(O – E)²

(O – E)² / E

χ² = Σ [(O – E)² / E]
Chi Distribution
[Curve: the probability of getting a given chi square, i.e., the sum of the squared differences each divided by its expected frequency; the rightmost 5% of the area is shaded.]
There is one of these for every possible degrees of freedom:
df = (number of rows - 1) x (number of columns - 1)
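The arithmetic just described can also be sketched in code (not part of the slides; NumPy and SciPy assumed). Summing the per-cell ratios and comparing the result to the chi-square distribution with the right df reproduces the values reported on the following slides.

```python
import numpy as np
from scipy.stats import chi2

observed = np.array([[24, 36, 42],   # Women
                     [22, 19, 30]])  # Men
expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / observed.sum()

# Chi square: sum of (O - E)^2 / E over every cell
chi_square = ((observed - expected) ** 2 / expected).sum()

# df = (number of rows - 1) x (number of columns - 1)
df = (observed.shape[0] - 1) * (observed.shape[1] - 1)

# Probability of a chi square at least this large if the variables are independent
p_value = chi2.sf(chi_square, df)
print(round(chi_square, 2), df, round(p_value, 3))   # 1.85 2 0.397
```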
• Sometimes the difference between the actual values and the expected values is so small that it can be attributed to chance variation.
• If that is true we say that the variables are
independent.
• Sometimes the difference between the actual
values and the expected values is so large that it is
worth talking about why those differences
appeared. The difference is so large it is unlikely
to have happened by chance (p < .05).
Contingency Tables
Test of Independence

Observed (Expected)
            Book          Magazine       Online
Women    24 (27.12)    36 (32.43)     42 (42.45)
Men      22 (18.88)    19 (22.57)     30 (29.55)

chi square = 1.85
Chi Distribution
[Curve for 2 df: the probability of getting a given chi square, i.e., the sum of the squared differences each divided by its expected frequency; the computed value of 1.85 falls well short of the rightmost 5% of the area.]
There is one of these for every possible degrees of freedom:
df = (rows - 1) x (columns - 1)
Contingency Tables
Test of Independence

Observed (Expected)
            Book          Magazine       Online
Women    24 (27.12)    36 (32.43)     42 (42.45)
Men      22 (18.88)    19 (22.57)     30 (29.55)

chi square = 1.85
p = 0.397
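In practice the whole test of independence can be run with one SciPy call; a minimal sketch (not part of the slides) that reproduces the chi square of 1.85 and p of 0.397 shown above:

```python
from scipy.stats import chi2_contingency

observed = [[24, 36, 42],   # Women: Book, Magazine, Online
            [22, 19, 30]]   # Men

chi_square, p, df, expected = chi2_contingency(observed)
print(round(chi_square, 2), df, round(p, 3))   # 1.85 2 0.397
```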
χ² Cautions
• When expected values drop below 5, the
estimator has too much influence on the
statistic.
• In other words, don't do χ² with small samples.
• Avoid over interpreting the results.
Caution: chi square only tells if the total difference was
likely to occur by chance—not individual differences.
You can only say IF the variables are related—not how.
Observed (Expected)
            Book          Magazine       Online
Women    24 (27.12)    36 (32.43)     42 (42.45)
Men      22 (18.88)    19 (22.57)     30 (29.55)

chi square = 1.85
p = 0.397
Using Excel to Compute χ²
The Chi Square Calculation
• You will never see the distribution, only the chi square value and the p value.
• The chi table example is on the webpage
Table 1
Faculty and Student Self-Perception of Technology Competence by Gender

                        Reported Skill Level
                  Minimally    Moderately
                   Skilled      Skilled      Accomplished
Male Faculty          7            56             22
Female Faculty       48           153             79
Male Students         4            15             17
Female Students       2            15              4

χ² (6) = 13.22, p = .04
(Cells are actual counts; the result line reports the degrees of freedom, the chi square value, and the p value.)
Chi Square
(Goodness of Fit)
• Special case when the expected frequencies
are predetermined.
• Are there important differences between
what we are seeing and some assumed norm
(the predetermined values)?
χ² Goodness of Fit
• Counts in categories
• Compares actual counts to norms.
• Tests to see if the differences between the
two are so large they are unlikely to have
occurred randomly.
χ² Goodness of Fit
[Charts: the actual count in each category shown beside the expected count for that category.]

χ² Goodness of Fit
Chi Square = 8.09
p = .0175
In Excel

            Actual   Expected
Ham           16        23
Cheese        34        23
Caprese       19        23

=CHITEST(actual_range,expected_range) returns the p value: 0.01753637
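Outside Excel, the same goodness-of-fit test can be sketched with SciPy (not part of the slides); it reproduces the chi square of 8.09 and p of about .0175 from the sandwich example.

```python
from scipy.stats import chisquare

actual   = [16, 34, 19]   # Ham, Cheese, Caprese
expected = [23, 23, 23]   # the predetermined norm: an even split of the 69 total

result = chisquare(f_obs=actual, f_exp=expected)
print(round(result.statistic, 2), round(result.pvalue, 4))   # 8.09 0.0175
```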
Examples of Goodness of Fit χ²
• Technically, all χ² tests are goodness of fit.
• Uses might be:
– When no difference is predicted in categories.
– Comparison of two iterations of the same group to see if change has occurred.
– Comparison to a known distribution (e.g., z-scores).
Chi Square
• Non-parametric comparisons are weak.
• They serve as a motivation for parametric
analysis.
• Sometimes they are all that is possible.
Regression Analysis
Regression
• Correlation: Strength of association
between two variables. (The amount of
shared variance).
• Linear Regression: Defining a functional
relationship between variables so that one
can be predicted from the other.
Regression
• Linear Regression is done by figuring out
how to put a line through the data where all
of the points are as close to the line as
possible (least squares).
Regression
• What you end up with is a formula for predicting one variable from another, but this isn't perfect because not all of the points will fall on the line.
Regression
• The summed amount by which the actual points vary from the prediction line is called the residual.
• The inverse of the residual is the amount of variance explained in the model.
Regression
• Regression is a model for predicting how
much variance in one variable (dependent)
is explained by another (predictor or
independent).
Regression
• If the residual is low then the variance
explained will be high. The model captures
most of the variance.
Regression
• If the residual is high then the variance
explained will be low. The model doesn’t
capture much of the variance.
Regression—Why Do This
• Linear regression has predictive power.
• The strength of that power is related to the residual, and thus to the amount of the variance explained by the prediction in the model.
• It might be worth finding out how to predict
blood pressure from age. Linear regression
would allow us to build a model to do this
and tell us how much of the variance in the
model would be explained by the
prediction—not very much.
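As a sketch of what such a model looks like in code, a simple linear regression can be fit and its explained variance (r squared) inspected. The numbers below are hypothetical, not data from the slides; SciPy is assumed.

```python
from scipy.stats import linregress

# Hypothetical illustration: age (years) and systolic blood pressure (mmHg)
age = [25, 32, 41, 48, 55, 61, 68, 74]
bp  = [118, 121, 127, 135, 131, 140, 146, 144]

fit = linregress(age, bp)

# The prediction formula: blood pressure = slope * age + intercept
print(f"predicted bp = {fit.slope:.2f} * age + {fit.intercept:.2f}")

# r squared: the proportion of variance in blood pressure explained by age
print(f"variance explained (r squared) = {fit.rvalue ** 2:.2f}")
```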
Excel
• From the Course Site: Content Scores
Multiple Regression
• Think about variance as a way to
understand how confident you can be that
you have a good model for prediction.
• It might be valuable to look at a whole
bunch of variables at the same time and see
how much variance they explain.
Multiple Regression
• Example: what combination of variables
would do a good job of predicting life
expectancy?
• Read the literature to make a good guess.
• Look at a whole bunch of people whose age at death you know.
• Gather data on all the variables that you
think might impact how long someone lives.
Multiple Regression
• Now build a model:
– Look at age at death (dependent variable)
– Look at one of the variables that might predict
death and see how much variance is explained
in the model.
– Now add another variable into the model and
see how much variance is explained now. (In a
way it is like adding the variances together.)
– Keep going.
Multiple Regression
• At some point you can stop and say that if you have all of these data on predictor variables then you can do a good job of predicting the dependent variable. Your confidence about this would be based on the amount of variance the model explains, in other words, how good the model is at predicting the dependent variable.
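A sketch of this add-one-variable-at-a-time idea, with hypothetical data and variable names (not from the slides), using scikit-learn to compare the variance explained (R squared) as predictors are added:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data: one row per person
exercise     = np.array([1, 3, 0, 5, 2, 4, 1, 6, 2, 3])       # hours per week
smoking      = np.array([20, 0, 30, 0, 10, 5, 25, 0, 15, 5])  # cigarettes per day
age_at_death = np.array([68, 81, 61, 86, 74, 80, 66, 88, 71, 79])

# Model 1: a single predictor
X1 = exercise.reshape(-1, 1)
r2_one = LinearRegression().fit(X1, age_at_death).score(X1, age_at_death)

# Model 2: add another variable and see how much variance is explained now
X2 = np.column_stack([exercise, smoking])
r2_two = LinearRegression().fit(X2, age_at_death).score(X2, age_at_death)

print(f"R squared with exercise only: {r2_one:.2f}")
print(f"R squared with exercise and smoking: {r2_two:.2f}")
```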
Multiple Regression
This is not easy to do.
• The calculations aren’t that difficult.
EZAnalyze or even Excel can do this.
• The problem is that there are a bunch of underlying assumptions that have to be true:
– Linear relationships
– Large sample sizes (30 per variable)
– Interval data
– Recoded dummy variables if categorical
– Strength of importance in the model (Beta)
Multiple Regression
This is not easy to do.
• Multiple regression is a powerful tool.
• Don’t expect to be able to figure this out by
yourself the first time.
• Find someone to help.
• Be prepared for it not to do what you
expected.
Back to Linear Regression
• It is easy to do. If you can do a correlation
you can do regression (you can set up a
model to predict one variable from another).
• How would you use this in your classroom
or school?
Interrater Reliability
Why Do This?
• The Right Thing To Do
• Garbage In—Garbage Out
• Fairness, Accuracy, Consistency, and the
Avoidance of Bias (NCATE Standard 2)
Consistency
• Assessments are consistent when they
produce dependable results or results that
would remain constant on repeated trials.
Essentially, in approaching consistency, the
standards are requiring that the assessments
and results be trustworthy.
(Assessing the Assessments, NCATE)
Measurement Design Issues
• A reminder: assessments can be both
consistent and wrong.
• What do you want to know?
• Assessment elements that are connected to
the standards.
• Instrument response categories that define
the range that is possible within an element.
• Response categories that match the analysis
you want to conduct.
Response Categories
• Simply providing a descriptor of the
response categories is not enough
(e.g., beginning, developing, advanced).
• Rubrics
– Descriptors of each criterion at each
response level.
Rubrics
• Rubrics contain a large amount of text.
• Train to develop measurement precision.
– Develop common understanding of terms
• Train for consistent instrument use.
• Train to eliminate rater bias.
Rater Effects
• Confirmatory Bias: Halo/Pitchfork
– Without knowing the individual you anticipate
performance.
• Contrast/Carry Over Effects
– Having seen a previous performance you assume
this observation will be similar.
• Leniency; Severity; Central Tendency Effect
– Rating is done with preconceived notions of how
the group should be rated.
Rater Effects
• Similar-to-Me
– Rating is not done from the rubric but from a
comparison to you.
• First Impression/Cosmetics
– Pretty is better.
• Rater Drift
– You lose interest or you start getting tough.
Interrater Reliability
• Analyzing ordinal data.
– What is the correlation among rankings of
common items? (2 raters: Spearman;
3+ raters: Spearman-Brown correction)
– What is the correlation among raters of common
items? (Intraclass Correlation)
– What percent of raters are ±1 of the “normed”
score? (Percent Agreement)
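For the two-rater case, the Spearman correlation is a one-liner; a minimal sketch with hypothetical ratings (SciPy assumed):

```python
from scipy.stats import spearmanr

# Hypothetical ordinal ratings of the same eight work samples by two raters
rater_a = [3, 2, 4, 1, 3, 4, 2, 1]
rater_b = [3, 1, 4, 2, 3, 4, 2, 1]

rho, p = spearmanr(rater_a, rater_b)
print(f"Spearman rho = {rho:.2f}, p = {p:.3f}")
```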
Percent Agreement
Percent of ratings ± 1 of mode of responses
[Bar chart: Teacher Work Sample Evaluation, 17 raters; percent agreement (0.00–100.00) for Elements El1–El4 before collaborative training, during training, and at independent end of training.]

                   Before Collaborative   During Training   Independent End
                   Training (n = 16)      (n = 16)          of Training (n = 17)
TWS Element I           73.33                 80.00              100.00
TWS Element II          60.00                 93.33              100.00
TWS Element III         66.67                100.00               93.75
TWS Element IV          86.67                 93.33              100.00
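The percent-agreement calculation itself is easy to sketch: the share of ratings within one point of the mode of the responses for an item. The ratings below are hypothetical, not the TWS data shown above.

```python
from statistics import mode

def percent_agreement(ratings):
    # Percent of ratings within +/- 1 of the mode of the responses
    norm = mode(ratings)   # the "normed" score for this item
    within_one = [r for r in ratings if abs(r - norm) <= 1]
    return 100 * len(within_one) / len(ratings)

# Hypothetical ratings of one rubric element by 17 raters
ratings = [3, 3, 4, 2, 3, 3, 4, 3, 2, 3, 5, 3, 4, 3, 3, 2, 1]
print(f"{percent_agreement(ratings):.2f}% within one point of the mode")
```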
Designing Training
• Select element rubrics for training.
• Select work samples that represent different
levels of ability.
• Faculty norming of elements as pilot before
training.
• Data gathering during training.
• Look for qualitative and quantitative
analysis of experience.
Interrater Reliability
• It isn’t difficult to get this to work but it
takes time.
• If you do it you will get better data: data that are higher quality and more defensible.