Quantitative Methods

Partly based on materials by Sherry O’Sullivan
Part 3
Chi-Squared Statistic
Recap on T-Statistic
• It uses the mean and standard error of a population sample
• The data are measured on an “interval” (scale) level
• Mean and standard error are the parameters
• This approach is known as parametric testing
• Another approach is non-parametric testing
Introduction to Chi-Squared
• It does not use the mean and standard error of a population sample
• Each respondent can choose only one category (unlike the scale data used for the t-statistic)
• The expected frequency should be at least 5 in each category for the test to be valid
• If any category has an expected frequency below 5, you need to increase your sample size
– Or merge categories
Example using Chi-Squared
• “Is there a preference amongst the UW student population for a particular web browser?” (Dr C Price’s data)
– They could only indicate one choice
– These are the observed frequencies of responses from the sample
– This is called a ‘contingency table’

                      Firefox  IExplorer  Safari  Chrome  Opera
Observed frequencies       30          6       4       8      2
Was it just chance?
• How confident am I?
– Was the sample representative of all UW
students?
– Was the variation in the measurements just
chance?
• Chi-Squared test for significance
– Several ways to use the test
– The simplest is to test a null hypothesis
• H0: The students show “no preference” for a
particular browser
Chi-Squared: “Goodness of fit”
(No preference)
• H0: The students show “no preference”
for a particular browser
• This leads to Hypothetical or Expected
distribution of frequency
– We would expect an equal number of respondents per category
– We had 50 respondents and 5 categories, so 50 / 5 = 10 per category

Expected frequency table:
                      Firefox  IExplorer  Safari  Chrome  Opera
Expected frequencies       10         10      10      10     10
Stage 1: Formulation of Hypothesis
• H0: There is no preference in the underlying population for the factor suggested.
• H1: There is a preference in the underlying population for the factor suggested.
• The basis of the chi-squared test is to
compare the observed frequencies against
the expected frequencies
Stage 2: Expected Distribution
• As our null hypothesis is “no preference”, we need to work out the expected frequency:
– You would expect each category to have the same number of respondents
– Show this in an “Expected frequency” table
– Each expected frequency must be at least 5 for the test to be valid

                      Firefox  IExplorer  Safari  Chrome  Opera
Expected frequencies       10         10      10      10     10
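As a minimal sketch, the expected frequencies under “no preference” can be worked out in a few lines of Python. The variable names are illustrative, and the validity check assumes the “at least 5” reading of the rule of thumb.

```python
# Observed browser counts from the sample (50 respondents, 5 categories).
observed = {"Firefox": 30, "IExplorer": 6, "Safari": 4, "Chrome": 8, "Opera": 2}

n = sum(observed.values())   # total respondents
k = len(observed)            # number of categories

# Under H0 (no preference) every category gets an equal share of respondents.
expected = {browser: n / k for browser in observed}

# Rule-of-thumb validity check (assumed threshold: at least 5 per category).
valid = all(e >= 5 for e in expected.values())

print(expected)
print(valid)  # True
```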
Stage 3a: Level of confidence
• Choose the level of confidence (often 0.05;
sometimes 0.01)
– 0.05 means that, if H0 is true, there is a 5% chance of seeing a difference this large by chance alone
– We then say we are 95% confident in our conclusion
Stage 3b: Degrees of freedom
• We need to find the degrees of freedom
– This is calculated from the number of categories
– We had 5 categories, so df = 5 - 1 = 4
Stage 3c: Critical value of Chi-Squared
• In order to compare our calculated chi-squared value with the “critical value” in the chi-squared table we need:
– Level of confidence (0.05)
– Degrees of freedom (4)
• Our critical value from the table = 9.49
Chi-Squared Table from
http://ourwayit.com/CA517/LearningActivities.htm
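The critical value can also be computed rather than read from a printed table. The sketch below is not the lecture's method: it is an illustrative pure-Python approach that evaluates the chi-squared upper-tail probability with a standard recurrence and then finds the critical value by bisection. The function names `chi2_upper_tail` and `critical_value` are hypothetical.

```python
import math

def chi2_upper_tail(x, df):
    """P(X >= x) for a chi-squared variable with df degrees of freedom."""
    if x <= 0:
        return 1.0
    h = x / 2.0
    # Closed-form starting point: df = 2 (even case) or df = 1 (odd case).
    if df % 2 == 0:
        a, q = 1.0, math.exp(-h)
    else:
        a, q = 0.5, math.erfc(math.sqrt(h))
    # Add two degrees of freedom per step:
    # Q(a + 1, h) = Q(a, h) + h**a * exp(-h) / Gamma(a + 1)
    while a < df / 2.0:
        q += h ** a * math.exp(-h) / math.gamma(a + 1.0)
        a += 1.0
    return q

def critical_value(alpha, df, lo=0.0, hi=200.0):
    """Find x whose upper-tail probability equals alpha, by bisection."""
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if chi2_upper_tail(mid, df) > alpha:
            lo = mid   # tail still too large: move right
        else:
            hi = mid
    return (lo + hi) / 2.0

categories = 5
df = categories - 1
print(df, round(critical_value(0.05, df), 2))  # 4 9.49
```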
Stage 4: Calculate statistics
• We find the differences between the observed
and the expected values for each category
• We square each difference, and divide the
answer by its expected frequency
• We add all of them up
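
          Firefox  IExplorer  Safari  Chrome  Opera
Observed       30          6       4       8      2
Expected       10         10      10      10     10

\[
\chi^2 = \sum \frac{(F_O - F_E)^2}{F_E}
       = \frac{(30-10)^2}{10} + \frac{(6-10)^2}{10} + \frac{(4-10)^2}{10} + \frac{(8-10)^2}{10} + \frac{(2-10)^2}{10}
       = 52
\]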
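The Stage 4 arithmetic can be sketched in Python as follows. The list names are illustrative; keeping the per-category contributions makes it easy to see which category drives the total.

```python
observed = [30, 6, 4, 8, 2]        # Firefox, IExplorer, Safari, Chrome, Opera
expected = [10, 10, 10, 10, 10]    # equal share under "no preference"

# Square each difference and divide by its expected frequency.
contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]

# Add them all up to get the chi-squared statistic.
chi2 = sum(contributions)

print(contributions)   # [40.0, 1.6, 3.6, 0.4, 6.4]
print(round(chi2, 2))  # 52.0
```

Firefox alone contributes 40 of the 52.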
Stage 5: Decision
• Can we reject the H0 that students show no
preference for a particular browser?
– Our value of 52 is way beyond 9.49. We are (at least)
95% confident the value did not occur by chance
– And probably much more confident than that
• So yes we can safely reject the null hypothesis
• Which browser do they prefer?
– Firefox as it is way above expected frequency of 10
Alternative Method
• Outline: Calculate chi-squared, and use the
table to find the confidence
• In this case, calculated χ² = 52
• Go to the appropriate row of the table, and
look across for the highest value that is
LOWER than the measured value
• The top of that column gives our confidence
that the effect is real
Chi-Squared Table from
http://ourwayit.com/CA517/LearningActivities.htm
• The probability of this result happening by chance is less than 0.001
• We can be at least 99.9% confident of our result
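The table lookup described above can be sketched as a small Python function. The row below holds standard chi-squared critical values for df = 4; the choice of columns is an assumption, and the linked table may tabulate a different set of probabilities. The name `lookup_p_bound` is hypothetical.

```python
# Standard critical values for df = 4, keyed by tail probability.
# Assumed columns; not copied from the linked table.
ROW_DF4 = {0.10: 7.78, 0.05: 9.49, 0.01: 13.28, 0.001: 18.47}

def lookup_p_bound(chi2_value, row):
    """Return the smallest tabulated probability whose critical value lies
    below chi2_value, or None if chi2_value is below every entry."""
    exceeded = [p for p, crit in row.items() if crit < chi2_value]
    return min(exceeded) if exceeded else None

p_bound = lookup_p_bound(52, ROW_DF4)
print(p_bound)                  # 0.001
print(f"{1 - p_bound:.1%}")     # 99.9%
```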
Chi-Squared: “No Difference from a Comparison Population”
• RQ: Are drivers of high performance cars more likely to be involved in accidents?
– Sample of n = 50, plus market research data on the proportion of people driving each category of car
FO = observed
accident
frequency
Ownership (%)
High
Compact Midsize
Performance
20
14
9
Full
size
9
10%
20%
40%
30%
Contingency Table
– Null hypothesis H0: type of car has no effect on
accident frequency
– Once the expected frequencies (under the null
hypothesis) have been calculated, the analysis is the
same as the ‘no preference’ calculation
High
Compact Midsize Full
Performance
size
FO = observed accident
frequency
20
14
Ownership (%)
FE = expected accident
frequency
10%
40%
5 (10% of 50) 20
9
9
30%
15
20%
10
Chi-Squared test for “Independence”.
• What makes computer games fun?
• Review found the following
– Factors (Mastery, Challenge and Fantasy)
– Is there a different opinion depending on
gender?
• Research sample of 50 males and 50 females

        Mastery  Challenge  Fantasy
Male         10         32        8
Female       24          8       18
Observed frequency table
What is the research question?
1. A single sample with individuals measured on 2 variables
– RQ: “Is there a relationship between fun factor and gender?”
– H0: “There is no such relationship”
2. Two separate samples representing 2 populations (male and female)
– RQ: “Do male and female players have different preferences for fun factors?”
– H0: “Male and female players do not have different preferences”
Chi-Squared analysis for
“Independence”.
• Establish the null hypothesis (previous slide)
• Determine the critical value of chi-squared
dependent on the confidence limit (0.05) and
the degrees of freedom.
– df = (Rows – 1)*(Columns – 1) = 1 * 2 = 2 (R=2, C=3)

        Mastery  Challenge  Fantasy
Male         10         32        8
Female       24          8       18

• Look up in chi-squared table
– Critical chi-squared value = 5.99
Chi-Squared Table from
http://ourwayit.com/CA517/LearningActivities.htm
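As a rough sketch, the degrees of freedom can be derived from the shape of the observed table and then used to look up the critical value. The dictionary holds standard chi-squared critical values at the 0.05 level and is an assumption for illustration, not a copy of the linked table; `degrees_of_freedom` is a hypothetical helper.

```python
# Standard chi-squared critical values at the 0.05 level, keyed by df.
CRITICAL_05 = {1: 3.84, 2: 5.99, 3: 7.81, 4: 9.49}

def degrees_of_freedom(table):
    """df = (rows - 1) * (columns - 1) for a table given as a list of rows."""
    rows = len(table)
    columns = len(table[0])
    return (rows - 1) * (columns - 1)

observed = [
    [10, 32, 8],   # Male:   Mastery, Challenge, Fantasy
    [24, 8, 18],   # Female: Mastery, Challenge, Fantasy
]

df = degrees_of_freedom(observed)
print(df, CRITICAL_05[df])  # 2 5.99
```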
Chi-Squared analysis for
“Independence”.
• Calculate the expected frequencies
– Add up each column and divide by the number of groups (in this case 2)
– This is easier if you have an equal number for each gender; if not, use expected = row total × column total / grand total

               Mastery  Challenge  Fantasy  Respondents
Male (FObs)         10         32        8           50
Female (FObs)       24          8       18           50
Cat total           34         40       26
Male (FExp)         17         20       13
Female (FExp)       17         20       13
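The expected frequencies can be sketched in Python using the general rule (row total × column total / grand total), which also covers unequal group sizes. With 50 males and 50 females it reduces to halving each column total. Variable names are illustrative.

```python
observed = [
    [10, 32, 8],   # Male:   Mastery, Challenge, Fantasy
    [24, 8, 18],   # Female: Mastery, Challenge, Fantasy
]

row_totals = [sum(row) for row in observed]          # respondents per gender
col_totals = [sum(col) for col in zip(*observed)]    # category totals
grand_total = sum(row_totals)

# Expected count for each cell under the null hypothesis of no relationship.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

print(col_totals)  # [34, 40, 26]
print(expected)    # [[17.0, 20.0, 13.0], [17.0, 20.0, 13.0]]
```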
Chi-Squared analysis for
“Independence”.
• Calculate the statistics using the chi-squared
formula
– Ensure you include both male and female data
\[
\chi^2 = \frac{(10-17)^2}{17} + \frac{(32-20)^2}{20} + \frac{(8-13)^2}{13}
       + \frac{(24-17)^2}{17} + \frac{(8-20)^2}{20} + \frac{(18-13)^2}{13}
       = 24.01
\]

               Mastery  Challenge  Fantasy
Male (FObs)         10         32        8
Female (FObs)       24          8       18
Male (FExp)         17         20       13
Female (FExp)       17         20       13
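A short Python sketch of this calculation, summing over all six cells (both the male and the female rows). The names are illustrative.

```python
observed = [[10, 32, 8], [24, 8, 18]]     # Male row, Female row
expected = [[17, 20, 13], [17, 20, 13]]

# Sum (observed - expected)^2 / expected over every cell of the table.
chi2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

print(round(chi2, 2))  # 24.01
```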
Stage 5: Decision
• Can we reject the null hypothesis?
– Our value of 24.01 is way beyond 5.99. We are 95%
confident the value did not occur by chance
• Conclusion: We are 95% confident that there is a
relationship between gender and fun factor
• But what else can we get from this?
– Significant fun factor for males = Challenge
– Significant fun factor for females = Mastery and Fantasy
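One simple way to read off which fun factors stand out for each gender is to list the categories where the observed count exceeds the expected count. This is only an illustrative sketch of that comparison, not a formal post-hoc test, and the names are hypothetical.

```python
factors = ["Mastery", "Challenge", "Fantasy"]
observed = {"Male": [10, 32, 8], "Female": [24, 8, 18]}
expected = [17, 20, 13]   # same for both genders (equal group sizes)

# For each gender, keep the factors chosen more often than expected.
above_expected = {
    gender: [f for f, o, e in zip(factors, counts, expected) if o > e]
    for gender, counts in observed.items()
}

print(above_expected)
# {'Male': ['Challenge'], 'Female': ['Mastery', 'Fantasy']}
```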
Alternative Method:
• Outline: Calculate chi-squared, and use the
table to find the confidence
• In this case, calculated χ² = 24.01
• Go to the appropriate row of the table, and
look across for the highest value that is
LOWER than the measured value
• The top of that column gives our confidence
that the effect is real
Chi-Squared Table from
http://ourwayit.com/CA517/LearningActivities.htm
• The probability of this result happening by chance is less than 0.001
• We can be at least 99.9% confident of our result
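For this example the probability can also be computed directly, because with exactly 2 degrees of freedom the chi-squared upper tail has the closed form exp(-x / 2). The sketch below relies on that special case and does not apply to other values of df.

```python
import math

chi2_value = 24.01
df = (2 - 1) * (3 - 1)   # 2 rows (genders), 3 columns (fun factors)

# Valid for df = 2 only: P(X >= x) = exp(-x / 2).
p = math.exp(-chi2_value / 2)

print(df, p < 0.001)  # 2 True
```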
Computers
• A computer can be used to calculate the
expected values – but you have to tell it how
– Use formulae in Excel
• Then the computer will calculate the p value
for you
– p = probability that the observed difference is due
to chance
– There is a nice command in Excel that will do this
End