Transcript Lecture 8
LIS 570
Summarising and presenting data Univariate analysis continued
Bivariate analysis
Selecting analysis and statistical techniques
Specific research question or hypothesis
Determine number of variables
Type title here
Univariate analysis
Bivariate analysis
Multivariate analysis
Determine level of measurement of variables
Choose univariate method of analysis
Choose relevant descriptive statistics
Choose relevant inferential statistics
De Vaus p133
Methods of analysis (De Vaus, 134)
Univariate
methods
Bivariate
methods
Multivariate
methods
Frequency distributions
Cross tabulations
Conditional tables
Scattergrams
Partial rank order
correlation
Regression
Multiple and partial
correlation
Correlation
Multiple and partial
regression
Comparison of means
Path analysis
Summary
Inferential statistics for univariate analysis
Bivariate analysis
crosstabulation
the character of relationships - strength,
direction, nature
correlation
Inferential statistics - univariate
analysis
Interval estimates - interval variables
estimating how accurate the sample mean is
based on random sampling and probability
theory
Standard error of the mean (Sm)
Standard deviation
Sm =
s
N
Total number in
the sample
Standard Error
Probability theory
for 95% of samples, the population mean will be
within + or - two standard error units of the
sample mean
this range is called the confidence interval
standard error is a function of sample size
to reduce the confidence interval, increase the sample
size
Inference for non-interval
variables
For nominal and ordinal data
Variable must have only two categories
may have to combine categories to achieve this
P = the % in one category of the variable
SB = PQ
Q = the % in the other category of the variable
N
Total number in the sample
Standard error
for binominal distribution
Association
Example: gender and voting
Are gender and party supported associated
(related)?
Are gender and party supported independent
(unrelated)?
Are women more likely to vote Republican?
Are men more likely to vote Democrat?
Association
Association in bivariate data means that
certain values of one variable tend to occur
more often with some values of the second
variable than with other variables
of that variable (Moore p.242)
Cross Tabulation Tables
Designate the X variable and the Y variable
Place the values of X across the table
Draw a column for each X value
Place the values of Y down the table
Draw a row for each Y value
Insert frequencies into each CELL
Compute totals (MARGINALS) for each column
and row
Determining if a Relationship
Exists
Compute percentages for each value of X (down
each column)
Base = marginal for each column
Read the table by comparing values of X for each
value of Y
Read table across each row
Terminology
strong/ weak; positive/ negative; linear/ curvilinear
Cross tabulation tables
Occupation
White collar
Blue collar
Total
Labor
Freq
270
%
27%
Freq %
810 81%
1080
Other
730
73%
190
920
Totals
1000
100%
1000 100%
Read
Table
19%
Calculate
percent
2000
(De Vaus pp 158-160)
Cross tabulation
Use column percentages and compare these
across the table
Where there is a difference this indicates
some association
Describing association
Strong - Weak
Direction
Strength
Positive - Negative
Nature
Linear - Curvilinear
Describing association
Two variables are positively associated when
larger values of one tend to be accompanied
by larger values of the other
The variables are negatively associated when
larger values of one tend to be accompanied
by smaller values of the other
(Moore, p. 254)
Describing association
Scattergram
a graph that can be used to show how two
interval level variables are related to one another
Variable N
Shoe
size
Age
Variable M
Description of Scattergrams
Strength of Relationship
Linearity of Relationship
Strong
Moderate
Low
Linear
Curvilinear
Direction
Positive
Negative
Description of scatterplots
Y
Y
X
Y
X
Strength and direction
Y
X
X
Description of scatterplots
Y
Nature
X
Y
Y
Strength and direction
X
Y
X
X
Correlation
Correlation coefficient
number used to describe the strength and
direction of association between variables
Very strong = .80 through 1
Moderately strong = .60 through .79
Moderate = .50 through .59
Moderately weak = .30 through .49
Very weak to no relationship 0 to .29
-1.00
Perfect Negative
Correlation
0.00
No relationship
1.00
Perfect Positive
Correlation
Correlation Coefficients
Nominal
Phi (Spss Crosstabs)
Cramer’s V (Spss Crosstabs)
Ordinal (linear)
Gamma (Spss Crosstabs)
Nominal and Interval
Eta (Spss Crosstabs)
Correlation: Pearson’s r
(SPSS correlate, bivariate)
Interval and/or ratio variables
Pearson product moment coefficient (r)
two interval normally distributed variables
assumes a linear relationship
Can be any number from
0 to -1 : 0 to 1 (+1)
Sign (+ or -) shows direction
Number shows strength
Linearity cannot be determined from the coefficient
r=
.8913
Summary
Bivariate analysis
crosstabulation
X - columns
Y - rows
calculate percentages for columns
read percentages across the rows to observe association
Correlation and scattergram
describe strength and direction of association