Research Methods Lecture #9
Research Methods
Chapter 8
Data Analysis
Two Types of Statistics
• Descriptive
– Allows you to describe relationships between
variables
• Inferential
– Allows one to test hypotheses & see if results are
generalizable
Descriptive Statistics
• Often begins with univariate analysis
– Displays the variation of a variable
– Several ways to display variation
• Bar Chart, Frequency Polygon, Histogram, etc.
[Figure: Frequency polygon, Rates of Church Affiliation, U.S., 1776-1995. Vertical axis: Percent of Church Membership (0 to 70); horizontal axis: Year (1776 to 1995).]
– 3 features of the shape of variation are important:
• Central Tendency: The most common value or the value
around which cases tend to center
– a.k.a averages like mean, median, mode
• Variability: the degree to which cases are spread out or
clustered together
• Skewness
– The extent to which cases are clustered more at one or the
other end of a distribution
» Can be absent (no skew), positive, or negative
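[Figure: Negative Skew: Test Too Easy. Vertical axis: Freq.; horizontal axis: Score (0 to 100).]
[Figure: Positive Skew: Test Too Hard. Vertical axis: Freq.; horizontal axis: Score (0 to 100).]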
Frequency Distribution of Voting in 1992 Presidential Election
Value            Frequency    Valid Percent
Voted            1,909        71.5%
Did not vote     762          28.5
Not eligible     183          --
Refused          10           --
Don’t know       38           --
No answer        2            --
Total            2,904        100.0%
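The valid percentages in the table can be reproduced by dividing each valid frequency by the total of valid responses only. The sketch below assumes that "Voted" and "Did not vote" are the only valid categories, which is inferred from the table (they are the only rows with a percentage); the variable names are illustrative.

```python
# Frequencies from the 1992 voting table
frequencies = {
    "Voted": 1909,
    "Did not vote": 762,
    "Not eligible": 183,
    "Refused": 10,
    "Don't know": 38,
    "No answer": 2,
}

# Assumption: only these two categories count as valid responses
valid_values = ["Voted", "Did not vote"]
valid_total = sum(frequencies[v] for v in valid_values)

# Valid percent = frequency / total of valid responses * 100
valid_percent = {
    v: round(100 * frequencies[v] / valid_total, 1) for v in valid_values
}
print(valid_percent)  # {'Voted': 71.5, 'Did not vote': 28.5}
```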
Ungrouped and Grouped Age Distributions
Ungrouped
Age      Percent
18       0.2%
19       1.2
20       1.4
21       1.3
And so on...
Grouped
Age      Percent
18-19    1.4
20-29    19.0
30-39    24.0
40-49    21.5
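Grouping simply adds up the percentages of the ages that fall into each interval. As a small sketch, the 18-19 group can be rebuilt from the two ungrouped values shown above; the other groups cannot be checked because the slide lists only the first few ages.

```python
# Ungrouped percentages shown on the slide for ages 18 and 19
ungrouped = {18: 0.2, 19: 1.2}

# Percent for the grouped interval 18-19 is the sum of its members
pct_18_19 = round(sum(ungrouped[age] for age in (18, 19)), 1)
print(pct_18_19)  # 1.4
```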
Calculating The Mean
X̄ (the mean) = The Sum of Scores / # of Scores
• So if you had the following test scores (5, 10,
15, 10, 5, 10, 5, 15, 15, 10)
• What would be the mean?
• Answer: 10!
(100/10)
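The same calculation can be sketched in a few lines of Python. The variable names are illustrative and not part of the lecture.

```python
# Test scores from the slide
scores = [5, 10, 15, 10, 5, 10, 5, 15, 15, 10]

# Mean = sum of scores / number of scores
mean = sum(scores) / len(scores)
print(sum(scores), len(scores), mean)  # 100 10 10.0
```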
Calculating the Mode
• Mode = The most frequent value in a
distribution
• So if you had the following test scores: (10, 5,
10, 15, 10, 10, 5, 10, 5, 15, 15, 10)
• What would be the mode?
• Answer: 10! (There are more 10’s than any
other number)
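Finding the mode amounts to counting how often each value occurs. A minimal sketch using Python's standard `collections.Counter`:

```python
from collections import Counter

# Test scores from the slide
scores = [10, 5, 10, 15, 10, 10, 5, 10, 5, 15, 15, 10]

# Count occurrences of each value, then take the most frequent one
counts = Counter(scores)
mode, frequency = counts.most_common(1)[0]
print(mode, frequency)  # 10 6
```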
Calculating the Median
• Median = The value in the middle of a
distribution
• Example: (22, 25, 34, 35, 41, 41, 46, 46, 46, 47,
49, 54, 54, 59, 60)
• Several Steps to calculate the Median
– Arrange all observations in order of size, from
smallest to largest
– Determine the number of values in the distribution
(N)
• N in this case = 15
– Plug N into the following formula
• (N+1)/2 = (15+1)/2 = 16/2= 8
– If you get a whole number (in this case you got an
“8”) then count up that number in the distribution
• (22, 25, 34, 35, 41, 41, 46, 46, 46, 47, 49, 54, 54, 59, 60)
• Thus, the median is “46”
• If you don’t get a whole number then you have
to add a step
• Example: 8, 13, 14, 16, 23, 26, 28, 33, 39, 61
• Find the N (in this case, the N is “10”)
• (N + 1)/2 = (10+1)/2 = 5.5.
• Thus, counting up 5.5 gets you to the point
between “23” & “26”
• The extra step….
• (N1 + N2)/2 = (23 + 26)/2 = 49/2 = 24.5
• Thus, the Median in this case is 24.5
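The steps above can be sketched as a small function. The function name `median_by_position` is made up for this illustration; it follows the (N+1)/2 procedure from the slides rather than any particular library routine.

```python
def median_by_position(values):
    """Median using the (N + 1) / 2 position rule from the slides."""
    ordered = sorted(values)          # step 1: arrange from smallest to largest
    n = len(ordered)                  # step 2: determine N
    position = (n + 1) / 2            # step 3: 1-based position of the median
    if position.is_integer():
        # Whole number: count up to that position
        return ordered[int(position) - 1]
    # Not a whole number: average the two values around the position
    lower = ordered[int(position) - 1]
    upper = ordered[int(position)]
    return (lower + upper) / 2


odd_example = [22, 25, 34, 35, 41, 41, 46, 46, 46, 47, 49, 54, 54, 59, 60]
even_example = [8, 13, 14, 16, 23, 26, 28, 33, 39, 61]
print(median_by_position(odd_example))   # 46
print(median_by_position(even_example))  # 24.5
```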
Determine the Mean, Median and Mode
• 2, 2, 2, 2, 2
• 1, 2, 2, 2, 5, 5, 10, 10, 15, 25
• 17, 18, 9, 9, 5
• 7, 7, 14, 3, 11, 27, 498
• 11, 67, 43, 2, 2, 2, 6
Answers
• 2, 2, 2, 2, 2
– Mean = 10/5 = 2
– Median =(5 + 1)/2 = 6/2 = 3 Then: count up 3
spaces to get to “2”
– Mode = 2
• 1,2,2,2,5,5,10,10,15,25
– Mean = 77/10 = 7.7
– Median = (10 + 1)/2 = 11/2 = 5.5 Then: (5 + 5)/2 = 10/2 = 5
– Mode = 2
• 17, 18, 9, 9, 5
– Mean = 58/5 = 11.6
– Median = (5 + 1)/2 = 3 Then: sort the values (5, 9, 9, 17, 18) and count up 3 to get 9
– Mode = 9
• 7, 7, 14, 3, 11, 27, 498
– Mean = 567/7 = 81
– Median = (7 + 1)/2 = 4 Then: sort the values (3, 7, 7, 11, 14, 27, 498) and count up 4 to get 11
– Mode = 7
• 11, 67, 43, 2, 2, 2, 6
– Mean = 133/7 = 19
– Median = (7 + 1)/2 = 4 Then: sort the values (2, 2, 2, 6, 11, 43, 67) and count up 4 to get 6
– Mode = 2
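The five answers can be double-checked with Python's standard `statistics` module. This is only a verification sketch; note that `statistics.median` sorts the data itself, so the unsorted lists can be passed in directly.

```python
import statistics

# The five practice distributions from the slide
practice_sets = [
    [2, 2, 2, 2, 2],
    [1, 2, 2, 2, 5, 5, 10, 10, 15, 25],
    [17, 18, 9, 9, 5],
    [7, 7, 14, 3, 11, 27, 498],
    [11, 67, 43, 2, 2, 2, 6],
]

# (mean, median, mode) for each distribution
results = [
    (statistics.mean(s), statistics.median(s), statistics.mode(s))
    for s in practice_sets
]
for values, (mean, median, mode) in zip(practice_sets, results):
    print(values, "mean:", mean, "median:", median, "mode:", mode)
```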
Suppose You Had the Following
1 person making $45,000
1 person making $15,000
2 People making $10,000
1 Person making $5,700
3 people making $5,000
4 people making $3,700
1 person making $3,000
12 people making $2,000
What did you Get?
• Mean =
– $142,500 / 25 = $5,700
• Median =
– $3,000 (there are 12 incomes above it and 12 below it)
• Mode =
– $2,000 (occurs the most frequently)
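The income example shows how a few large values pull the mean well above the median and mode. A short sketch that expands the grouped figures into 25 individual incomes (the variable names are illustrative):

```python
import statistics

# (number of people, income) pairs from the slide
groups = [
    (1, 45000), (1, 15000), (2, 10000), (1, 5700),
    (3, 5000), (4, 3700), (1, 3000), (12, 2000),
]

# Expand into one income per person
incomes = [income for count, income in groups for _ in range(count)]

mean = statistics.mean(incomes)      # total income / number of people
median = statistics.median(incomes)  # 13th of 25 sorted incomes
mode = statistics.mode(incomes)      # most frequent income
print(len(incomes), sum(incomes), mean, median, mode)
# 25 142500 5700 3000 2000
```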
Mean Vs. Median Vs. Mode
• Generally use the mean for interval or ratio
levels of measurement
– E.g. Fahrenheit temperatures, Age, Income
• Look at shape of distribution first, however
– If there are lots of outliers, the median might be
preferable
• Income if including Bill Gates
• Use the mode for nominal levels of
measurement
– Gender
Measures of Variation
• Central tendency (mean, median, mode)
although valuable, only shows us a small piece
of the picture
– Relying only on central tendency may give us an
incomplete and misleading picture
• Three towns may have the same mean and median
income but be very different in social character
– One may be mostly middle class with a few rich and many poor
– One may have an equal number of rich, middle class, & poor
• Looking at measures of variation can help us
see past the limitations of central tendency
The Four Popular Measures of Variation
1 Range
– Calculated by taking the highest value in a
distribution and subtracting the lowest value, and
then adding 1
– Shows us the range of possible values that may be
encountered
– Weakness: The range can be drastically altered by
just one exceptionally high or low value (known as
an “outlier”).
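The sensitivity of the range to a single outlier can be sketched with one of the practice distributions from earlier in the lecture. The helper name `inclusive_range` is made up here; it uses the "highest minus lowest, plus 1" definition given above.

```python
def inclusive_range(values):
    """Range as defined on the slide: highest - lowest + 1."""
    return max(values) - min(values) + 1


scores = [7, 7, 14, 3, 11, 27, 498]

with_outlier = inclusive_range(scores)
# Drop the one exceptionally high value (498) and recompute
without_outlier = inclusive_range([v for v in scores if v != 498])
print(with_outlier, without_outlier)  # 496 25
```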
2 Interquartile Range
– Avoids the problem created by outliers
– Quartiles are the points in a distribution
corresponding to the first 25%, the first 50%, and
the first 75% of the cases.
• The second quartile (50%) is the median
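A rough sketch of the quartiles and interquartile range, using the 15 values from the median example. Quartile conventions differ between textbooks and software; this example assumes the default ("exclusive") method of Python's `statistics.quantiles`, which may not match the convention used in the course text.

```python
import statistics

# The 15 values from the median example earlier in the lecture
values = [22, 25, 34, 35, 41, 41, 46, 46, 46, 47, 49, 54, 54, 59, 60]

# n=4 splits the distribution into quarters and returns three cut points
q1, q2, q3 = statistics.quantiles(values, n=4)

# Interquartile range = third quartile - first quartile
iqr = q3 - q1
print(q1, q2, q3, iqr)  # 35.0 46.0 54.0 19.0
```

The second quartile (46) matches the median found earlier.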
3 Variance
– The average of the squared deviations from the
mean
Variance (X̄ = 9)
X        X - X̄     (X - X̄)²    X²
3        -6         36          9
4        -5         25          16
6        -3         9           36
12       3          9           144
20       11         121         400
Total               200         605
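The table's columns can be rebuilt step by step. This sketch assumes the variance is the sum of squared deviations divided by N, following the "average of the squared deviations" definition above; a sample variance would divide by N - 1 instead. It also shows why the X² column is useful: the same result comes from the mean of the squares minus the square of the mean.

```python
# Values from the variance table
xs = [3, 4, 6, 12, 20]
n = len(xs)

mean = sum(xs) / n                          # 45 / 5 = 9.0
deviations = [x - mean for x in xs]         # X - mean column
squared_devs = [d ** 2 for d in deviations] # (X - mean)^2 column
sum_squared_devs = sum(squared_devs)        # 200.0

# Assumption: divide by N (average of the squared deviations)
variance = sum_squared_devs / n

# Equivalent shortcut using the X^2 column: mean of squares - square of mean
sum_of_squares = sum(x ** 2 for x in xs)    # 605
variance_shortcut = sum_of_squares / n - mean ** 2
print(mean, sum_squared_devs, sum_of_squares, variance, variance_shortcut)
# 9.0 200.0 605 40.0 40.0
```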
4 Standard Deviation
– Gives an “average distance” between all scores and
the mean
– Calculated by taking the square root of the variance
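The standard deviation is the square root of the variance. Continuing with the five values from the variance table, and again assuming division by N rather than N - 1, a minimal sketch:

```python
import math
import statistics

xs = [3, 4, 6, 12, 20]
mean = sum(xs) / len(xs)

# Variance: average squared deviation from the mean (dividing by N)
variance = sum((x - mean) ** 2 for x in xs) / len(xs)

# Standard deviation: square root of the variance
std_dev = math.sqrt(variance)
print(variance, round(std_dev, 2))  # 40.0 6.32

# Cross-check against the standard library's population standard deviation
assert math.isclose(std_dev, statistics.pstdev(xs))
```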
Crosstabulation: Voting by Family Income
Voting       <$17,500    $17,500-$34,999    $35,000-$59,999    $60,000+
Voted        60%         73%                75%                84%
Did not      40%         27%                25%                16%
Total        100%        100%               100%               100%
(n)          (424)       (550)              (541)              (433)
Crosstabulating Variables
• Crosstabulations reveal 4 aspects of the
association between 2 variables:
– Existence: is there a correlation?
– Strength: How strong does the correlation appear
to be?
– Direction: Positive or negative correlation?
– Pattern: Are changes in the percentage
distribution of the dependent variable fairly
regular (simply increasing or decreasing), or do
they vary?
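As a rough illustration, the "voted" percentages from the voting-by-income crosstabulation can be examined for existence, direction, strength, and pattern. The income labels are reconstructed from the slide's table header, and using the percentage-point spread as a gauge of strength is a simplification chosen for this sketch, not a formal measure of association.

```python
# Percent who voted, by family income (lowest to highest category)
income_groups = ["<$17,500", "$17,500-$34,999", "$35,000-$59,999", "$60,000+"]
pct_voted = [60, 73, 75, 84]

# Existence: do the percentages differ across categories at all?
exists = len(set(pct_voted)) > 1

# Direction and pattern: change from each category to the next
steps = [later - earlier for earlier, later in zip(pct_voted, pct_voted[1:])]
if all(s > 0 for s in steps):
    direction = "positive"
elif all(s < 0 for s in steps):
    direction = "negative"
else:
    direction = "mixed"

# Strength (simplified): spread between highest and lowest percentage
spread = max(pct_voted) - min(pct_voted)
print(exists, direction, steps, spread)  # True positive [13, 2, 9] 24
```

The steps are all positive but uneven (13, 2, and 9 points), so the pattern is consistently increasing rather than perfectly regular.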
Evaluating Association
• Inferential statistics are used to determine the
likelihood that an association exists in the
larger population from which the sample is drawn
• Thus, researchers often calculate probability
levels that determine the probability of chance
– E.g. p<.05 means that the probability that the
association is due to chance is less than 5 out of
100, or 5%
• Generally looking for at least p < .05, but some want p < .01 or
p < .001
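One common way to obtain such a probability level for a crosstabulation is a chi-square test; the lecture does not name a specific test, so this is an assumption made for illustration. The sketch uses the voting-by-income table, with cell counts approximated from the rounded percentages and the column n's, so the statistic is only approximate. It compares the statistic with 7.815, the chi-square critical value for p = .05 with 3 degrees of freedom.

```python
# Voting-by-income crosstab: proportion who voted and n per income category
prop_voted = [0.60, 0.73, 0.75, 0.84]
column_n = [424, 550, 541, 433]

# Approximate observed counts (rows: voted, did not vote).
# Assumption: counts are rebuilt from rounded percentages, so they are inexact.
observed = [
    [p * n for p, n in zip(prop_voted, column_n)],
    [(1 - p) * n for p, n in zip(prop_voted, column_n)],
]

grand_total = sum(column_n)
row_totals = [sum(row) for row in observed]

# Chi-square = sum over cells of (observed - expected)^2 / expected,
# where expected = row total * column total / grand total
chi_square = 0.0
for row, row_total in zip(observed, row_totals):
    for obs, col_total in zip(row, column_n):
        expected = row_total * col_total / grand_total
        chi_square += (obs - expected) ** 2 / expected

# Degrees of freedom = (rows - 1) * (columns - 1) = 1 * 3 = 3
CRITICAL_05_DF3 = 7.815
significant = chi_square > CRITICAL_05_DF3
print(round(chi_square, 1), significant)
```

A statistic above the critical value corresponds to p < .05, meaning an association this large would be unlikely to appear by chance alone.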
Controlling for a Third Variable
• Associations, however, do not necessarily mean
causation
• Use elaboration analysis to determine whether
an association is due to a causal relationship or
to another variable
• Three types: intervening, extraneous, and
specification
Intervening Variables
[Diagram: Income → Perceived Efficacy → Voting]
Extraneous Variables
[Diagram: Education → Income; Education → Voting]
Findings
• The 3 criteria
– Time Order
• Asked the following questions:
– How long have they attended church? Used only those who had attended
for a year or more
– Eight questions about their deviant acts WITHIN THE PAST YEAR!!
– Correlation
• The data indicated a correlation between the two variables (church
attendance and delinquency)
– Spuriousness
• Could another variable be the determining factor for delinquency
instead of church attendance? (Elaboration Analysis)
– Race
– School
– Grade
– Gender
Findings
• The hypothesis was not supported!
• The correlation between church attendance
and delinquency is spurious
– The third variable of gender appears to be an
extraneous variable