Preliminary Data Analysis

Introduction to Data Analysis

Why do we analyze data?

  Make sense of data we have collected
  Basic steps in preliminary data analysis
    Editing
    Coding
    Tabulating

Editing of data

  Impose minimal quality standards on the raw data
    Field edit -- preliminary edit, used to detect glaring omissions and inaccuracies (often involves respondent follow-up)
      Completeness
      Legibility
      Comprehensibility
      Consistency
      Uniformity
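
Two of the field-edit checks (completeness and consistency) can be sketched in Python. This is a minimal illustration only: the item names in REQUIRED_ITEMS and the consistency rule are assumptions, not part of the original material.

```python
# Hypothetical item names for one completed instrument; None marks an unanswered item.
REQUIRED_ITEMS = ("respondent_id", "income_level", "num_cars")

def field_edit(record):
    """Return a list of problems found in one completed instrument."""
    problems = []
    # Completeness: every required item must have an answer.
    for item in REQUIRED_ITEMS:
        if record.get(item) is None:
            problems.append(f"missing: {item}")
    # Consistency (assumed rule): a car count cannot be negative.
    cars = record.get("num_cars")
    if cars is not None and cars < 0:
        problems.append("inconsistent: num_cars is negative")
    return problems

print(field_edit({"respondent_id": 2, "income_level": None, "num_cars": -1}))
```

Flagged instruments are the ones that would go back for respondent follow-up.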

Central office edit

  More complete and exacting edit
    Best performed by a number of editors, each looking at one part of the data
    Decisions on how to handle item non-response and other omissions need to be made
      List-wise deletion (drop the case from all analyses) vs. pairwise deletion (drop the case only from the present analysis)
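
The two ways of handling omissions can be contrasted in a short Python sketch. The sample responses and the function names are hypothetical; dropping a case only for the analysis at hand is what is often called pairwise deletion.

```python
# Hypothetical responses; None marks item non-response.
responses = [
    {"id": 1, "income": "lower", "cars": 1},
    {"id": 2, "income": None, "cars": 2},
    {"id": 3, "income": "higher", "cars": None},
    {"id": 4, "income": "higher", "cars": 2},
]

def drop_for_all_analyses(rows):
    """List-wise deletion: keep only cases with no missing item at all."""
    return [r for r in rows if all(v is not None for v in r.values())]

def drop_for_present_analysis(rows, items):
    """Keep every case that answered the items the present analysis needs."""
    return [r for r in rows if all(r[i] is not None for i in items)]

complete_cases = drop_for_all_analyses(responses)
cars_cases = drop_for_present_analysis(responses, ["cars"])
print([r["id"] for r in complete_cases], [r["id"] for r in cars_cases])
```

The second approach keeps more cases per analysis, at the cost of each analysis resting on a slightly different set of respondents.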

Coding -- transforming raw data into symbols (usually numbers) for tabulating, counting, and analyzing

  Must determine categories
    Completely exhaustive
    Mutually exclusive
  Assign numbers to categories
  Make sure to code an ID number for each completed instrument
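
As a rough sketch of coding, the Python below maps raw answers to numeric codes and keeps an ID with every record. The codebook (category labels and their numbers) is an assumption for illustration; 9 is used as a missing-data code so that every answer lands in exactly one category.

```python
# Assumed codebook: labels and numeric codes are illustrative only.
INCOME_CODES = {"lower": 1, "higher": 2}
MISSING = 9  # code reserved for non-response or unusable answers

def code_instrument(instrument_id, income_answer):
    """Turn one raw answer into a coded record that carries its ID number."""
    code = INCOME_CODES.get(income_answer, MISSING)
    return {"id": instrument_id, "income": code}

print(code_instrument(17, "higher"))
print(code_instrument(18, None))
```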

Tabulation -- counting the number of cases that fall into each category

  Initial tabulations should be performed for each item
  One-way tabulations
    Determines degree of item non-response
    Locates errors
    Locates outliers
    Determines the data distribution

Tabulation

  Simple counts
  For example
    75 families in the study own 1 car
    2 families own 3
    Missing data (9)
      1 family did not report
      Not useful for further analysis

  Number of Cars    Number of Families
  1                  75
  2                  23
  3                   2
  9 (missing)         1
  Total             101
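
A one-way tabulation like this one can be sketched in Python with collections.Counter. The list of coded answers is rebuilt from the counts in the table purely for illustration; the variable names are assumptions.

```python
from collections import Counter

# Coded answers rebuilt from the table; 9 is the missing-data code.
num_cars = [1] * 75 + [2] * 23 + [3] * 2 + [9] * 1

counts = Counter(num_cars)
for code in sorted(counts):
    print(code, counts[code])
print("Total", sum(counts.values()))
```

A count under an unexpected code, or an unusually large count under 9, is the kind of error or non-response problem a first tabulation is meant to surface.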

Tabulation

  Compute percentages
  Eliminate non-responses
  Note: report without missing data

  Number of Cars    Percent of Families
  1                  75%
  2                  23%
  3                   2%
  Total             100%
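
The percentage step can be sketched as follows, assuming the counts from the simple tabulation and 9 as the missing-data code. Non-responses are removed before the base is computed, so the percentages rest on 100 valid cases rather than 101.

```python
MISSING = 9
counts = {1: 75, 2: 23, 3: 2, MISSING: 1}

# Eliminate non-responses, then compute percentages on the valid cases only.
valid = {code: n for code, n in counts.items() if code != MISSING}
valid_total = sum(valid.values())
percentages = {code: 100 * n / valid_total for code, n in valid.items()}
print(valid_total, percentages)
```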

Cross Tabulation

  Simultaneous count of two or more items
    Note that the marginal totals are equal to the frequency totals
    Allows the researcher to determine if a relationship exists between two variables
      Used as the final analysis step in the majority of real-world applications
      Investigates the relationship between two ordinal-scaled variables

  Number of Cars    Lower Income    Higher Income    Total
  1                 48              27                75
  2 or More          6              19                25
  Total             54              46               100
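
A cross tabulation is a joint count over pairs of answers. The sketch below rebuilds case-level data from the cell counts on this slide (an assumption made for illustration) and recovers both the cells and the marginal totals.

```python
from collections import Counter

# Case-level (number of cars, income) pairs rebuilt from the cell counts.
cases = ([("1", "lower")] * 48 + [("1", "higher")] * 27
         + [("2 or more", "lower")] * 6 + [("2 or more", "higher")] * 19)

cells = Counter(cases)                            # joint counts per cell
row_totals = Counter(cars for cars, _ in cases)   # marginal totals by number of cars
col_totals = Counter(inc for _, inc in cases)     # marginal totals by income
print(cells, row_totals, col_totals)
```

The marginal totals match the one-way frequencies for each variable, which is a quick check that the table was built correctly.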

Cross Tabulation

  To analyze the data
    Calculate percentages in the direction of the “causal variable”
    Does number of cars “cause” income level?

  Number of Cars    Lower Income    Higher Income    Total
  1                 64%             36%              100%
  2 or More         24%             76%              100%
  Total             54%             46%              100%

Cross Tabulation

  To analyze the data
    Does income level “cause” number of cars?
    This seems to be the case.
    Percentages are computed in the direction of income; thus, the income marginal totals should be 100%

  Number of Cars    Lower Income    Higher Income    Total
  1                 89%             59%              75%
  2 or More         11%             41%              25%
  Total             100%            100%             100%
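
Computing percentages in the direction of income means dividing each cell by its income-group total. A minimal Python sketch, assuming the cell counts from the cross tabulation (the function name is hypothetical):

```python
table = {
    "1":         {"lower": 48, "higher": 27},
    "2 or more": {"lower": 6,  "higher": 19},
}

def percent_within_columns(table):
    """Percentages computed within each column (here, each income group)."""
    col_totals = {}
    for row in table.values():
        for col, n in row.items():
            col_totals[col] = col_totals.get(col, 0) + n
    return {r: {c: round(100 * n / col_totals[c]) for c, n in row.items()}
            for r, row in table.items()}

by_income = percent_within_columns(table)
print(by_income)
```

Each income group's percentages sum to 100, so the two groups can be compared directly even though they differ in size.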

Cross Tabulation allows the development of hypotheses

  Develop by comparing percentages across
    Lower income more likely to have one car (89%) than the higher income group (59%)
    Higher income more likely to have multiple cars (41%) than the lower income group (11%)
    Are results statistically significant?
      To test, chi-square analysis must be employed

Chi-square analysis

  Tests the hypothesis that two or more nominally scaled variables are NOT independent
    Null hypothesis (H0) is that the variables are independent (i.e., no relationship exists)
    Alternative hypothesis (HA) is that a statistical relationship exists among the variables
    Present example
      H0: Income level will have no effect on the number of cars that a family owns
      HA: Income level will affect the number of cars that a family owns

Chi-square analysis

  General approach
    Based on “marginal totals,” compute the expected values per cell
    Compare expected values to actual values to compute the chi-square value (χ²)
    Compare computed χ² to critical χ²
      Table 4 on p. 442 in text

  Number of Cars    Lower Income    Higher Income    Total
  1                                                   75
  2 or More                                           25
  Total             54              46               100

Chi-square analysis

  Compute expected values
    E1 = (75 * 54)/100
    E1 = 40.5
    E2 = (75 * 46)/100
    E2 = 34.5
      Note E1 + E2 = 75
    E3 = ?
    E4 = ?

  Number of Cars    Lower Income    Higher Income    Total
  1                 E1              E2                75
  2 or More         E3              E4                25
  Total             54              46               100
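
The same rule (row total times column total, divided by the grand total) applies to every cell. A short Python sketch using the marginal totals from the table; the dictionary layout is an assumption for illustration.

```python
row_totals = {"1": 75, "2 or more": 25}
col_totals = {"lower": 54, "higher": 46}
n = 100  # grand total

# Expected count per cell under independence: (row total * column total) / n
expected = {(r, c): rt * ct / n
            for r, rt in row_totals.items()
            for c, ct in col_totals.items()}
for cell, value in expected.items():
    print(cell, value)
```

The expected values in each row add up to that row's total, which mirrors the check E1 + E2 = 75.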

Compute χ² value
  $\chi^2 = \sum_i (O_i - E_i)^2 / E_i$
  Computed χ² = 12.08
  df = (rows - 1) x (cols. - 1) = 1 x 1 = 1
  α = .05
  Critical χ² = 3.84
  12.08 > 3.84: Reject the null hypothesis (reject if computed > critical)

  Cell    O_i    E_i     O_i - E_i    (O_i - E_i)^2    (O_i - E_i)^2 / E_i
  E1      48     40.5     7.5         56.25             1.39
  E2      27     34.5    -7.5         56.25             1.63
  E3       6     13.5    -7.5         56.25             4.17
  E4      19     11.5     7.5         56.25             4.89
  Σ (χ²)                                               12.08
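
The arithmetic in this table can be reproduced with a few lines of Python. The observed and expected values come from the table; the critical value 3.84 is the one quoted on the slide for df = 1 and α = .05, and the variable names are assumptions.

```python
observed = [48, 27, 6, 19]
expected = [40.5, 34.5, 13.5, 11.5]

# Sum of (O_i - E_i)^2 / E_i over the four cells
chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = (2 - 1) * (2 - 1)   # (rows - 1) x (cols - 1)
critical = 3.84          # critical value quoted on the slide (df = 1, alpha = .05)

reject_null = chi_square > critical
print(round(chi_square, 2), df, reject_null)
```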

Conclusion

  Income has an influence on number of cars in a family