6QuantiativeDataAnalysis-CentralTendency_Dispersion
Introduction to Quantitative Data
Analysis (continued)
Reading on Quantitative Data Analysis: Baxter and Babbie,
2004, Chapter 11.
Course website:
http://www.sfu.ca/cmns/faculty/marontate_j/260/07-spring/
Audio recordings of Thursday lectures available on-line (for
students registered in the course) at
www.sfu.ca/lectures
Last Day: Beginning of Quantitative Data
Analysis
Introduction to Common Ways of Presenting
Statistics & Importance for Analysis
(descriptive statistics)
Tables
Charts
Graphs
Univariate Statistics
Measures of Central Tendency
Measures of Dispersion
Discrete & Continuous Variables
Continuous
Variable can take infinite (or large) number of values within range
Ex. Age measured by exact date of birth
Discrete
Attributes of variable that are distinct but not necessarily continuous
Ex. Age measured by age groups (Note: techniques exist for making assumptions about discrete variables in order to use techniques developed for continuous variables)
The Lexis Diagram
[Figure: Lexis diagram. Vertical axis: Age (0 to 80). Horizontal axis: Period (1890 to 2010). Life line: cohort born in 1948. Isochron: observation in 1968. Age at year of observation: 20.]
Core Notions in Basic Univariate
Statistics
Ways of describing data about one variable (“uni”=one)
Measures of central tendency
Summarize information about one variable (“averages”)
Measures of dispersion
Variations or “spread”
Mode
most common or frequently occurring
category or value (for all types of data)
Babbie (1995: 378)
Bimodal
When there are two “most common” values that
are almost the same (or the same)
Median
middle point of rank-ordered list of all values
(only for ordinal, interval or ratio data)
Babbie (1995: 378)
Mean (arithmetic mean)
Arithmetic “average” = sum of values divided by number of cases (only for ratio and interval data)
Babbie (1995: 378)
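The three definitions above can be tried out with Python's standard `statistics` module. This is only a minimal sketch: the ages are hypothetical values invented for illustration, not data from the lecture or from Babbie.

```python
from statistics import mean, median, multimode

# Hypothetical ages, invented for illustration only.
ages = [19, 20, 20, 21, 22, 24, 35]

modes = multimode(ages)   # most frequent value(s); returns a list, so bimodal data yields two entries
middle = median(ages)     # middle point of the rank-ordered list
average = mean(ages)      # sum of values divided by number of cases

print("mode(s):", modes)
print("median:", middle)
print("mean:", average)
```

Note that the single high value (35) pulls the mean above the median, while the mode ignores it entirely.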
Two Data Sets with the Same Mean
Another Diagram of Normal Curve
(Showing Ideal Random Sampling
Distribution, Standard Deviation & Z-scores)
Normal Distribution & Measures of
Central Tendency
Symmetric
Also called the “Bell Curve”
Neuman (2000: 319)
Skewed Distributions &
Measures of Central Tendency
Skewed to the left
Skewed to the right
Neuman (2000: 319)
Why Measures of Central Tendency
are not enough to describe
distributions
7 people at bus stop in front of bar aged 25, 26, 27, 30, 33, 34, 35
median= 30, mean= 30
7 people in front of ice-cream parlour aged 5, 10, 20, 30, 40, 50, 55
median= 30, mean= 30
BUT issue of “spread” socially significant
Another Illustration Normal &
Skewed Distributions
Measures of Variation or Dispersion
range: distance between largest and smallest
scores
standard deviation: for comparing distributions
percentiles: % up to and including the number
(from below)
z-scores: for comparing individual scores taking
into account the context of different distributions
Range & Interquartile range
Range: distance between largest and smallest scores
What does a short distance between the scores tell us about the sample?
But problems of “outliers” or extreme values may occur
Interquartile range (IQR)
distance between the 75th percentile and the 25th
percentile
range of the middle 50% (approximately) of the data
Eliminates problem of outliers or extreme values
Example from StatCan website (11 in sample)
Data set: 6, 47, 49, 15, 43, 41, 7, 39, 43, 41, 36
Ordered data set: 6, 7, 15, 36, 39, 41, 41, 43, 43, 47, 49
Median: 41
Upper quartile: 43
Lower quartile: 15
IQR= 43-15 = 28
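The quartiles for this 11-value data set can be checked with a short sketch. It assumes one common convention: each quartile is the median of the lower or upper half of the ordered data, with the overall median left out when the number of cases is odd. Other conventions exist and can give slightly different values. The function name `quartiles_by_halves` is made up for this illustration.

```python
from statistics import median

data = [6, 47, 49, 15, 43, 41, 7, 39, 43, 41, 36]

def quartiles_by_halves(values):
    """Return (lower quartile, median, upper quartile).

    Assumed convention: quartiles are the medians of the lower and upper
    halves; the overall median is excluded from both halves when n is odd.
    """
    ordered = sorted(values)
    half = len(ordered) // 2
    lower_half = ordered[:half]    # the smallest `half` values
    upper_half = ordered[-half:]   # the largest `half` values
    return median(lower_half), median(ordered), median(upper_half)

q1, q2, q3 = quartiles_by_halves(data)
iqr = q3 - q1
print("lower quartile:", q1, "median:", q2, "upper quartile:", q3, "IQR:", iqr)
```

Under this convention the lower half is 6, 7, 15, 36, 39 and the upper half is 41, 43, 43, 47, 49, so the quartiles are 15 and 43 and the IQR is 28.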
Standard Deviation and Variance
Interquartile range eliminates problem of outliers BUT eliminates half the data
Solution? measure variability from the center of
the distribution.
standard deviation & variance measure how far
on average scores deviate or differ from the
mean.
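The idea of "how far on average scores deviate from the mean" can be followed step by step in a short sketch. It reuses the ice-cream parlour ages from the earlier slide and assumes the population form of the formula (dividing by the number of cases); the sample form divides by one less than the number of cases.

```python
from math import sqrt

ages = [5, 10, 20, 30, 40, 50, 55]   # ice-cream parlour ages from the earlier slide

n = len(ages)
mean_age = sum(ages) / n                     # step 1: the mean
deviations = [x - mean_age for x in ages]    # step 2: each score minus the mean
squared = [d ** 2 for d in deviations]       # step 3: square each deviation
variance = sum(squared) / n                  # step 4: average squared deviation (population form)
std_dev = sqrt(variance)                     # step 5: square root gives the standard deviation

print("mean:", mean_age)
print("variance:", round(variance, 1))
print("standard deviation:", round(std_dev, 1))
```

Squaring in step 3 keeps positive and negative deviations from cancelling out; the square root in step 5 returns the result to the original units (years of age).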
Calculation of Standard
Deviation
Neuman (2000: 321)
Standard Deviation Formula
Neuman (2000: 321)
Details on the Calculation of Standard Deviation
Neuman (2000: 321)
Discussion: The Bell Curve & standard deviation
Discussion of Preceding Diagram
“Many biological, psychological and social phenomena occur in the population in the distribution we call the bell curve (Portney & Watkins, 2000).”
Preceding picture: “a symmetrical bell curve, average score [i.e., the mean] in the middle, where the ‘bell’ shape [is] tallest.
Most of the people [i.e., 68% of them, or 34% + 34%] have performance within 1 segment [i.e., a standard deviation] of the average score.”
Interpreting Standard Deviation
amount of variation from mean
Illustration: high & low standard deviation
meaning depends on exact case
Recall: Central Tendency & Dispersion
(description of distributions)
7 people at bus stop in front of bar aged 25, 26, 27, 30, 33, 34, 35
median= 30, mean= 30
Range= 10, standard deviation= 3.8
7 people in front of ice-cream parlour aged 5, 10, 20, 30, 40, 50, 55
median= 30, mean= 30
Range= 50, standard deviation= 17.9
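The two groups can be compared side by side with a small sketch. It assumes the population form of the standard deviation (`statistics.pstdev`); under that assumption the values come out to about 3.8 for the bus stop group and 17.9 for the ice-cream parlour group. The helper name `describe` is made up for this illustration.

```python
from statistics import mean, median, pstdev

groups = {
    "bus stop": [25, 26, 27, 30, 33, 34, 35],
    "ice-cream parlour": [5, 10, 20, 30, 40, 50, 55],
}

def describe(values):
    """Summarize one group: central tendency plus two measures of dispersion."""
    return {
        "median": median(values),
        "mean": mean(values),
        "range": max(values) - min(values),
        "sd": round(pstdev(values), 1),   # population standard deviation (assumption)
    }

summary = {name: describe(values) for name, values in groups.items()}
for name, stats in summary.items():
    print(name, stats)
```

Both groups share the same median and mean, so only the range and the standard deviation reveal how differently the ages are spread.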
Other ways of characterizing
dispersion or spread
Techniques for understanding position of a case (or group of cases) in the context of all cases
Percentiles
Standard Scores
z-scores
Percentile
1st calculate rank, then choose a rank (score) and figure out percentage equal to or less than the rank (score)
Link to more complex definition of percentile
% up to and including the number (from below)
“A percentile rank is typically defined as the proportion of scores in a distribution that a specific score is greater than or equal to. For instance, if you received a score of 95 on a math test and this score was greater than or equal to the scores of 88% of the students taking the test, then your percentile rank would be 88. You would be in the 88th percentile.”
Also used in other ways (for example to eliminate cases)
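The "% up to and including the number" definition can be written as a short function. This is a sketch of that one definition only (more complex definitions of percentile exist); the test scores are hypothetical and the function name `percentile_rank` is made up for this illustration.

```python
def percentile_rank(scores, score):
    """Percentage of scores that are less than or equal to `score`."""
    at_or_below = sum(1 for s in scores if s <= score)
    return 100 * at_or_below / len(scores)

# Hypothetical test scores, invented for illustration only.
test_scores = [55, 62, 70, 70, 78, 81, 88, 90, 95, 99]

rank_95 = percentile_rank(test_scores, 95)   # 9 of the 10 scores are 95 or lower
rank_70 = percentile_rank(test_scores, 70)   # 4 of the 10 scores are 70 or lower

print("percentile rank of 95:", rank_95)
print("percentile rank of 70:", rank_70)
```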
z-scores
For understanding how a score is positioned in the data set
to enable comparisons with other scores from other data sets (comparing individual scores in different distributions)
example of two students from different schools with different GPAs
comparing sample distributions to population. How representative is sample of population under study?
(Link to more complete discussion of use of z-scores to understand sampling distribution)
Calculating Z-Scores
z-score = (score – sample mean)/standard deviation of set
Link to formula
Link to z-score calculator
Calculating Z-Scores (p. 265 textbook)
Using Z-scores to compare two students from different schools: A
Susan with GPA of 3.62 and Jorge with GPA of 3.64
Susan from College A
Susan’s Grade Point Average = 3.62
Mean GPA = 2.62
SD = .50
Susan’s z-score = (3.62 - 2.62)/.50 = 1.00/.50 = 2
Susan’s grade is two standard deviations above the mean at her school
Using Z-scores to compare two students from different schools: B
Jorge from College B
Jorge’s GPA = 3.64
Mean GPA = 3.24
SD = .40
Jorge’s z-score = (3.64 - 3.24)/.40 = .40/.40 = 1
Jorge’s grade is one standard deviation above the mean at his school
Susan’s absolute grade is lower but her position
relative to other students at her school is much
higher than Jorge’s position at his school
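The comparison of the two students can be reproduced with a minimal sketch that applies the z-score formula to the GPA figures given for College A and College B. The function name `z_score` is made up for this illustration.

```python
def z_score(score, mean, sd):
    """Number of standard deviations a score lies above (+) or below (-) the mean."""
    return (score - mean) / sd

susan = z_score(3.62, mean=2.62, sd=0.50)   # College A
jorge = z_score(3.64, mean=3.24, sd=0.40)   # College B

# Rounded for display, since floating-point subtraction is not exact.
print("Susan:", round(susan, 2))
print("Jorge:", round(jorge, 2))
print("Susan ranks higher relative to her school:", susan > jorge)
```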
Another Diagram of Normal Curve
with Standard Deviation & Z-scores
Discussion of Previous Case
Relationship of sampling distribution to
population (use mean of sample to estimate
mean of population)
Recall: Results with two Variables-Bivariate Statistics
Statistical relationships between two variables
Covariation (vary together)
a type of association
Not necessarily causal
Independence (Null hypothesis): no relationship between the two variables
Cases with values in one variable do not have any particular value on the other variable
Sample Mean Notation
Population Mean Notation
Standard Error (recall tutorial task
about average ages in family)
Calculate mean for all possible samples
Divide by number of samples
Measures variability
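The "all possible samples" idea can be sketched for a very small population. The family ages below are hypothetical (they are not the tutorial data), the sample size of 2 is an arbitrary choice, and the sketch assumes samples are drawn without replacement. It treats the standard error as the spread (standard deviation) of the sample means.

```python
from itertools import combinations
from statistics import mean, pstdev

# Hypothetical family ages, invented for illustration only.
family_ages = [8, 12, 40, 44]
sample_size = 2   # arbitrary choice for this sketch

# Mean of every possible sample of the chosen size (drawn without replacement).
sample_means = [mean(sample) for sample in combinations(family_ages, sample_size)]

# Sum of the sample means divided by the number of samples.
mean_of_means = mean(sample_means)

# Spread of the sample means around that value.
standard_error = pstdev(sample_means)

print("sample means:", sample_means)
print("mean of sample means:", mean_of_means, "population mean:", mean(family_ages))
print("standard error:", round(standard_error, 2))
```

The mean of the sample means equals the population mean, which is why a sample mean can be used to estimate it; the standard error indicates how far a single sample mean is likely to fall from that value.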
Recall: Results with two Variables-Bivariate Tables (Cross Tabulations)
Singleton, R., Straits, B. & Straits, M. (1993)
Approaches to social research. Toronto: Oxford
Interpretation issues (Bivariate
Tables)
Calculate percentages within categories of
attributes of independent variable
In example:
Independent variable: gender
Dependent variable: fear of walking alone at night
Women more afraid than men
Other Ways of Presenting Same
Data
Link to other tables
Calculating Expected Outcomes
If variables (gender & fear) not related then distribution
of subgroups of independent variable (male & female)
should be the same in each subgroup as in the group
overall (therefore men and women should express fear in
the same proportions)
Used in techniques for studying relationships (Chi-square)
Descriptive dimension (strength of relationship)
Inferential (probability that the association is due to chance)
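Expected outcomes under the null hypothesis can be computed from the row and column totals of a bivariate table: each expected count is the row total times the column total, divided by the grand total. The counts below are hypothetical and are NOT the figures from the Singleton, Straits & Straits table; they only illustrate the arithmetic.

```python
# Hypothetical observed counts, invented for illustration only.
observed = {
    "men":   {"afraid": 20, "not afraid": 80},
    "women": {"afraid": 60, "not afraid": 40},
}

# Totals for each category of the independent variable (rows).
row_totals = {group: sum(counts.values()) for group, counts in observed.items()}

# Totals for each category of the dependent variable (columns).
col_totals = {}
for counts in observed.values():
    for answer, count in counts.items():
        col_totals[answer] = col_totals.get(answer, 0) + count

grand_total = sum(row_totals.values())

# Expected count if gender and fear were unrelated.
expected = {
    group: {
        answer: row_totals[group] * col_totals[answer] / grand_total
        for answer in col_totals
    }
    for group in observed
}

for group in observed:
    print(group, "observed:", observed[group], "expected:", expected[group])
```

With these made-up counts, 40% of all respondents are afraid, so 40 of the 100 men and 40 of the 100 women would be expected to express fear if the variables were independent. Chi-square is built on the gaps between observed and expected counts.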
Expected outcomes (Null
Hypothesis)
Singleton, R., Straits, B. & Straits, M. (1993)
Approaches to social research. Toronto: Oxford
Next Day
Control variables: Trivariate Tables
Men/Women Drivers
In Say it with Figures, Hans Zeisel presents the following data:
Automobile Accidents by Sex
------------------------------------------
          Per Cent Accident Free
Women     68% (6,950)
Men       56% (7,080)
------------------------------------------
Automobile Accidents by Sex and Distance Driven
----------------------------------------------------------------------------
                          Distance
          Under 10,000 km            Over 10,000 km
          Per Cent Accident Free     Per Cent Accident Free
Women     75% (5,035)                48% (1,915)
Men       75% (2,070)                48% (5,010)
----------------------------------------------------------------------------
Women have fewer accidents than men because women tend
to drive less frequently than do men, and people who drive
less frequently tend to have fewer accidents
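The way the control variable accounts for the original difference can be checked by recombining the subgroup figures. The sketch below assumes the cell layout in which women account for 5,035 drivers under 10,000 km and 1,915 over, and men for 2,070 under and 5,010 over; these subgroup sizes add up to the totals in the first table (6,950 and 7,080). The function name `overall_accident_free` is made up for this illustration.

```python
# (accident-free rate, number of drivers) for each sex and distance category
cells = {
    "women": {"under 10,000 km": (0.75, 5035), "over 10,000 km": (0.48, 1915)},
    "men":   {"under 10,000 km": (0.75, 2070), "over 10,000 km": (0.48, 5010)},
}

def overall_accident_free(groups):
    """Combine distance subgroups into a total count and an overall per cent accident free."""
    drivers = sum(n for _, n in groups.values())
    accident_free = sum(rate * n for rate, n in groups.values())
    return drivers, 100 * accident_free / drivers

women_n, women_pct = overall_accident_free(cells["women"])
men_n, men_pct = overall_accident_free(cells["men"])

print("women:", women_n, "drivers,", round(women_pct), "% accident free")
print("men:", men_n, "drivers,", round(men_pct), "% accident free")
```

Within each distance category the rates are identical for men and women; the overall gap appears only because most of the women fall in the under-10,000 km group and most of the men in the over-10,000 km group.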