Intro_to_research_7_.. - University of Washington

The Information School of the University of Washington
LIS 570
Session 7.1
Bivariate Data Analysis
Objectives
• Reinforce concept of standard error and the
standard normal distribution (basis of
confidence level and confidence interval)
• Understand different approaches to the
analysis of bivariate data
• Gain confidence in use of SPSS
LIS 570
Univariate Analysis
Mason; p. 2
Agenda
• Review Central Limit Theorem
• Visualization of “confidence interval” and
“confidence level”
• Overview of bivariate analysis approaches
• Exploratory data analysis using SPSS
Shapes of distribution
Symmetrical
• Normal distribution: symmetrical, bell-shaped curve
Asymmetrical
• Positively skewed: tail on the right, cluster towards low end of the variable
• Negatively skewed: tail on the left, cluster towards high end of the variable
Bimodality: a double peak
Central Limit Theorem
The CLT states: regardless of the shape of the
population distribution, as the sample size (N)
becomes very large (approaches infinity) the
sampling distribution of the sample mean (m)
approaches a normal distribution, with a mean
of µ and a standard deviation of σ/(√N).
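A small simulation can make the theorem concrete. The sketch below is an illustration only: it assumes a skewed (exponential) population with µ = 1 and σ = 1, which is not from the slides, and the function names are hypothetical. It draws many samples of size N and compares the spread of the sample means with σ/(√N).

```python
import random
import statistics

random.seed(570)  # fixed seed so the run is repeatable

def draw_sample(n):
    """One sample of size n from an assumed exponential population (mean 1, SD 1)."""
    return [random.expovariate(1.0) for _ in range(n)]

def sample_means(n, repeats=5000):
    """Means of `repeats` independent samples, each of size n."""
    return [statistics.fmean(draw_sample(n)) for _ in range(repeats)]

for n in (1, 4, 25, 100):
    means = sample_means(n)
    print(f"N={n:>3}  mean of sample means={statistics.fmean(means):.3f}  "
          f"SD of sample means={statistics.stdev(means):.3f}  "
          f"sigma/sqrt(N)={1.0 / n ** 0.5:.3f}")
```

As N grows, the SD of the sample means should track σ/(√N) closely, even though the population itself is far from normal.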
Standard Error of the Mean
Standard error of the mean (Sm)
Sm = S/(√N)
S = standard deviation
N = total number in the sample
Standard error is inversely related to square root of sample
size
To reduce standard error, increase sample size
Standard error is directly related to standard deviation
When N = 1, standard error is equal to standard deviation
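The formula and the points above can be checked with a few lines of Python. This is a minimal sketch; the function names and the sample data are hypothetical.

```python
import math
import statistics

def standard_error(s, n):
    """Sm = S / sqrt(N) for a given standard deviation S and sample size N."""
    return s / math.sqrt(n)

def standard_error_of_mean(values):
    """Estimate Sm from raw data, using the sample standard deviation."""
    return standard_error(statistics.stdev(values), len(values))

# Same standard deviation, growing sample size: the standard error shrinks
for n in (1, 100, 400):
    print(f"S=10, N={n:>3}: Sm={standard_error(10, n):.2f}")

# Hypothetical data: hours per week spent in the library by 9 students
hours = [2, 4, 4, 5, 6, 7, 8, 9, 9]
print(f"Sm for the sample: {standard_error_of_mean(hours):.3f}")
```

Note that quadrupling N (100 to 400) only halves the standard error, because the relationship is with the square root of the sample size.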
Inferential statistics - univariate analysis
Interval estimates and interval variables
• Estimation of sample mean accuracy—based on
random sampling and probability theory
Standardize the sample mean to estimate population
mean:
t = (sample mean - population mean) / estimated SE
Population mean = sample mean ± t * (estimated SE)
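The interval estimate above can be sketched in Python as follows. The data are hypothetical, and the critical value t = 2.262 is an assumption taken from a standard t table (95% confidence, N - 1 = 9 degrees of freedom); a different sample size or confidence level needs a different value.

```python
import math
import statistics

def confidence_interval(values, t_critical):
    """Return (low, high) = sample mean -/+ t * estimated SE."""
    mean = statistics.fmean(values)
    estimated_se = statistics.stdev(values) / math.sqrt(len(values))
    margin = t_critical * estimated_se
    return mean - margin, mean + margin

# Hypothetical sample of 10 interval-level scores
scores = [12, 15, 11, 14, 13, 16, 12, 15, 14, 13]

# Assumed critical value: t = 2.262 for 95% confidence with 9 degrees of freedom
low, high = confidence_interval(scores, t_critical=2.262)
print(f"95% CI for the population mean: {low:.2f} to {high:.2f}")
```

Here the confidence level (95%) fixes t, and the confidence interval is the resulting range around the sample mean.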
Exercise—sampling distribution
• Coin tossing
• Probability of heads or tails: 50%
• Each of you is a “sample” for this activity.
• Flip the coin 9 times, count the # of times you get a “head”.
Live demo:
http://www.ruf.rice.edu/~lane/stat_sim/sampling_dist/index.html
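The classroom exercise can also be simulated. The sketch below assumes a class of 30 students (a made-up number) and a fair coin; each simulated student flips 9 times, and the tally shows how the head counts cluster around the middle.

```python
import random
from collections import Counter

random.seed(7)  # fixed seed so the run is repeatable

def heads_in_nine_flips():
    """One 'student': flip a fair coin 9 times and count the heads."""
    return sum(random.random() < 0.5 for _ in range(9))

# Hypothetical class size of 30; each student is one sample
class_counts = [heads_in_nine_flips() for _ in range(30)]
tally = Counter(class_counts)

# Text histogram: one star per student who got that many heads
for heads in range(10):
    print(f"{heads} heads: {'*' * tally[heads]}")
```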
Standard Error
(for nominal & ordinal data)
Variable must have only two categories
(could combine categories to achieve this)
Standard error for binomial distribution:
SB = √(PQ/N)
P = the % in one category of the variable
Q = the % in the other category of the variable
N = total number in the sample
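As a quick worked example of this formula, the sketch below computes the standard error for a two-category variable. The survey figures are hypothetical, and the function name is not from the slides.

```python
import math

def standard_error_binomial(p_percent, n):
    """SB = sqrt(P * Q / N), with P and Q expressed as percentages."""
    q_percent = 100 - p_percent
    return math.sqrt(p_percent * q_percent / n)

# Hypothetical survey: 40% of 600 respondents fall in one category
sb = standard_error_binomial(40, 600)
print(f"SB = {sb:.1f} percentage points")
```

Because P and Q are entered as percentages, the result is also in percentage points.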
Choosing the Statistical Technique*
• Specific research question or hypothesis
• Determine # of variables in question
– Univariate analysis
– Bivariate analysis
– Multivariate analysis
• Determine level of measurement of variables
• Choose univariate method of analysis
• Choose relevant descriptive statistics
• Choose relevant inferential statistics
* Source: De Vaus, D.A. (1991) Surveys in Social Research.
Third edition. North Sydney, Australia: Allen & Unwin Pty
Ltd., p. 133
Methods of analysis (De Vaus, 134)
Univariate methods
• Frequency distributions
Bivariate methods
• Cross tabulations
• Scattergrams
• Regression
• Correlation
• Comparison of means
Association
• Example: gender and voting
– Are gender and party supported associated
(related)?
– Are gender and party supported independent
(unrelated)?
– Are women more likely than men to vote
Republican?
– Are men more likely to vote Democrat?
Association
Association in bivariate data means that
certain values of one variable tend to occur
more often with some values of the second
variable than with other values
of that variable (Moore p. 242)
Cross Tabulation Tables
• Designate the X variable and the Y variable
• Place the values of X across the table
• Draw a column for each X value
• Place the values of Y down the table
• Draw a row for each Y value
• Insert frequencies into each CELL
• Compute totals (MARGINALS) for each column and row
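The steps above can be sketched in Python for readers who want to see them outside SPSS. The six responses are made up for illustration; X is occupation (columns) and Y is party (rows).

```python
from collections import Counter

# Hypothetical raw data: one (X, Y) pair per respondent
responses = [
    ("White collar", "Democrat"), ("White collar", "Republican"),
    ("Blue collar", "Democrat"), ("Blue collar", "Democrat"),
    ("White collar", "Republican"), ("Blue collar", "Republican"),
]

cells = Counter(responses)                    # frequency in each CELL
x_values = sorted({x for x, _ in responses})  # one column per X value
y_values = sorted({y for _, y in responses})  # one row per Y value

# MARGINALS: totals for each column and each row
column_totals = {x: sum(cells[(x, y)] for y in y_values) for x in x_values}
row_totals = {y: sum(cells[(x, y)] for x in x_values) for y in y_values}

print(f"{'':<12}" + "".join(f"{x:>14}" for x in x_values) + f"{'Total':>8}")
for y in y_values:
    print(f"{y:<12}" + "".join(f"{cells[(x, y)]:>14}" for x in x_values)
          + f"{row_totals[y]:>8}")
print(f"{'Totals':<12}" + "".join(f"{column_totals[x]:>14}" for x in x_values)
      + f"{len(responses):>8}")
```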
Determining if a Relationship Exists
• Compute percentages for each value of X (down
each column)
– Base = marginal for each column
• Read the table by comparing values of X for each
value of Y
– Read table across each row
• Terminology
– strong/ weak; positive/ negative; linear/ curvilinear
Cross tabulation tables
Occupation
             White collar      Blue collar       Total
             Freq    %         Freq    %         Freq
Democrat     270     27%       810     81%       1080
Republican   730     73%       190     19%       920
Totals       1000    100%      1000    100%      2000
Calculate percent down each column; read the table across each row.
(De Vaus pp. 158-160)
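The percentages in this table can be reproduced with a short script. The frequencies are the ones from the slide; the dictionary layout and function name are just one possible way to hold them.

```python
# Frequencies from the slide's table: columns are occupation (X), rows are party (Y)
table = {
    "White collar": {"Democrat": 270, "Republican": 730},
    "Blue collar": {"Democrat": 810, "Republican": 190},
}

def column_percentages(freqs):
    """Percentage of each column's marginal that falls in each row."""
    result = {}
    for column, cells in freqs.items():
        base = sum(cells.values())  # column marginal is the base
        result[column] = {row: 100 * count / base for row, count in cells.items()}
    return result

percentages = column_percentages(table)

# Read across each row to compare the columns
for party in ("Democrat", "Republican"):
    row = "  ".join(f"{occ}: {percentages[occ][party]:.0f}%" for occ in table)
    print(f"{party:<11}{row}")

difference = (percentages["Blue collar"]["Democrat"]
              - percentages["White collar"]["Democrat"])
print(f"Difference in Democrat support: {difference:.0f} percentage points")
```

A gap this large between the column percentages (81% against 27%) is what suggests an association between occupation and party.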
Cross tabulation
• Use column percentages and compare these
across the table
• Where there is a difference this indicates
some association
Describing association
Strength: Strong - Weak
Direction: Positive - Negative
Nature: Linear - Curvilinear
Describing association
Two variables are positively associated when
larger values of one tend to be
accompanied by larger values of the other
The variables are negatively associated
when larger values of one tend to be
accompanied by smaller values of the other
(Moore, p. 254)
Describing association
Scattergram or scatterplot
Graph that can be used to show how two interval
level variables are related to one another
[Figure: scattergram relating Variable A (weight) and Variable B (age)]
Description of Scattergrams
– Strength of Relationship
• Strong
• Moderate
• Low
– Linearity of Relationship
• Linear
• Curvilinear
– Direction
• Positive
• Negative
Description of scatterplots
[Figure: scatterplots of Y against X illustrating strength and direction]
Description of scatterplots
[Figure: scatterplots of Y against X illustrating nature, strength and direction]
Correlation
• Correlation coefficient—number used to
describe the strength and direction of
association between variables
• Very strong = .80 through 1
• Moderately strong = .60 through .79
• Moderate = .50 through .59
• Moderately weak = .30 through .49
• Very weak to no relationship = 0 through .29
[Scale: -1.00 = perfect negative correlation; 0.00 = no relationship; 1.00 = perfect positive correlation]
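The strength labels can be turned into a small lookup. This is a sketch only: the function name is hypothetical, and it assumes each cut-off is a lower bound, so a value such as .795 (which falls between the listed ranges) is labeled "moderately strong".

```python
def describe_correlation(r):
    """Label a coefficient using the slide's cut-offs for strength."""
    if not -1 <= r <= 1:
        raise ValueError("a correlation coefficient must lie between -1 and 1")
    size = abs(r)  # strength ignores the sign
    if size >= 0.80:
        strength = "very strong"
    elif size >= 0.60:
        strength = "moderately strong"
    elif size >= 0.50:
        strength = "moderate"
    elif size >= 0.30:
        strength = "moderately weak"
    else:
        strength = "very weak to no relationship"
    if r > 0:
        direction = "positive"
    elif r < 0:
        direction = "negative"
    else:
        direction = "none"
    return strength, direction

print(describe_correlation(0.8913))
print(describe_correlation(-0.45))
```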
Correlation Coefficients
– Nominal
• Phi
• Cramer’s V
– Ordinal (linear)
• Gamma
– Nominal and Interval
• Eta
http://www.nyu.edu/its/socsci/Docs/correlate.html
Correlation: Pearson’s r
– Interval and/or ratio variables
– Pearson product moment coefficient (r)
• two interval variables, normally distributed
• assumes a linear relationship
• Can be any number from -1 to +1
• Sign (+ or -) shows direction
• Number shows strength
• Linearity cannot be determined from the coefficient
e.g.: r = .8913
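To show where such a number comes from, here is a minimal sketch of the product-moment calculation in plain Python. The age and weight values are made up, and the second example is included to illustrate the point that linearity cannot be read from the coefficient.

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment coefficient for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cross / math.sqrt(ss_x * ss_y)

# Hypothetical interval-level data with a near-linear, positive relationship
age = [1, 2, 3, 4, 5]
weight = [10, 12, 15, 16, 19]
r_linear = pearson_r(age, weight)
print(f"age vs weight: r = {r_linear:.4f}")

# A perfect curvilinear relationship (y = x squared) can still give r near 0
xs = [-2, -1, 0, 1, 2]
r_curved = pearson_r(xs, [x * x for x in xs])
print(f"x vs x squared: r = {r_curved:.4f}")
```

The second result is why a scattergram should be inspected alongside the coefficient: r only measures the linear part of a relationship.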
Summary
• Bivariate analysis
• crosstabulation
– X - columns
– Y - rows
• calculate percentages for columns
• read percentages across the rows to observe association
• Correlation and scattergram: describe strength
and direction of association