Chapter 34 - Routledge


APPROACHES TO QUANTITATIVE DATA ANALYSIS
© LOUIS COHEN, LAWRENCE MANION & KEITH MORRISON
STRUCTURE OF THE CHAPTER
• Scales of data
• Parametric and non-parametric data
• Descriptive and inferential statistics
• Kinds of variables
• Hypotheses
• One-tailed and two-tailed tests
• Distributions
• Statistical significance
• Hypothesis testing
• Effect size
• A note on symbols
FOUR SCALES OF DATA
• NOMINAL
• ORDINAL
• INTERVAL
• RATIO
It is incorrect to apply statistics which can only
be used at a higher scale of data to data at a
lower scale.
PARAMETRIC AND NON-PARAMETRIC STATISTICS
• Parametric statistics: where characteristics
of, or factors in, the population are known;
• Non-parametric statistics: where the
characteristics of, or factors in, the population
are unknown.
DESCRIPTIVE AND INFERENTIAL
STATISTICS
• Descriptive statistics: to summarize features of
the sample or simple responses of the sample
(e.g. frequencies or correlations).
• No attempt is made to infer or predict population
parameters.
• Inferential statistics: to infer or predict population parameters or outcomes from sample measures, e.g. via sampling and statistical techniques.
• Based on probability.
DESCRIPTIVE STATISTICS
• The mode (the score obtained by the greatest
number of people);
• The mean (the average score);
• The median (the score obtained by the middle
person in a ranked group of people, i.e. it has an
equal number of scores above it and below it);
• Minimum and maximum scores;
• The range (the distance between the highest
and the lowest scores);
• The variance (a measure of how far scores are
from the mean: the average of the squared
deviations of individual scores from the mean);
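To make these measures concrete, here is a minimal sketch using Python's standard library; the scores are invented for illustration, and pvariance implements the "average of the squared deviations" definition above.

```python
import statistics

scores = [3, 5, 5, 6, 7, 8, 8, 8, 10]  # hypothetical test scores

print(statistics.mode(scores))       # mode: most frequently obtained score
print(statistics.mean(scores))       # mean: the average score
print(statistics.median(scores))     # median: the middle score when ranked
print(min(scores), max(scores))      # minimum and maximum scores
print(max(scores) - min(scores))     # range: distance between extremes
print(statistics.pvariance(scores))  # variance: mean of squared deviations
```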
SIMPLE STATISTICS
• Frequencies (raw scores and percentages)
– Look for skewness, intensity, distributions and
spread (kurtosis);
• Mode
– For nominal and ordinal data
• Mean
– For interval and ratio data
• Standard deviation
– For interval and ratio data
[Figures: three dot plots of scores on a 1–20 scale, each with mean = 6, illustrating a high, a moderately high and a low standard deviation respectively.]
STANDARD DEVIATION
• The standard deviation is a standardised measure of the dispersal of the scores, i.e. how far away from the mean/average each score is. In its most simplified form it is calculated as:

$$S.D. = \sqrt{\frac{\sum d^2}{N}} \qquad \text{or} \qquad S.D. = \sqrt{\frac{\sum d^2}{N-1}}$$

• d² = the deviation of the score from the mean (average), squared
• Σ = the sum of
• N = the number of cases
• A low standard deviation indicates that the scores
cluster together, whilst a high standard deviation
indicates that the scores are widely dispersed.
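A minimal sketch of both formulas with invented scores: dividing by N gives the first (population) form, and dividing by N − 1 the second (sample) form.

```python
import math

scores = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(scores)
mean = sum(scores) / n
sum_sq_dev = sum((x - mean) ** 2 for x in scores)  # sum of d^2

sd_population = math.sqrt(sum_sq_dev / n)        # sqrt(sum(d^2) / N)
sd_sample = math.sqrt(sum_sq_dev / (n - 1))      # sqrt(sum(d^2) / (N - 1))

print(sd_population, sd_sample)  # 2.0 and about 2.14
```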
DESCRIPTIVE STATISTICS
• The standard deviation (a measure of the
dispersal or range of scores: the square root of
the variance);
• The standard error (the standard deviation of
sample means);
• The skewness (how far the data are
asymmetrical in relation to a ‘normal’ curve of
distribution);
• Kurtosis (how steep or flat the shape of a graph or distribution of data is; a measure of how peaked a distribution is and how steep the slope or spread of data around the peak is).
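For skewness, kurtosis and the standard error, here is a short sketch assuming SciPy is available; scipy.stats provides all three directly.

```python
from scipy.stats import kurtosis, sem, skew

scores = [2, 3, 3, 4, 4, 4, 5, 5, 9]  # invented, slightly right-skewed data

print(skew(scores))      # positive: longer tail to the right of the mean
print(kurtosis(scores))  # excess kurtosis: 0 for a normal distribution
print(sem(scores))       # standard error: estimated SD of sample means
```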
INFERENTIAL STATISTICS
• Can use descriptive statistics.
• Correlations
• Regression
• Multiple regression
• Difference testing
• Factor analysis
• Structural equation modelling
DEPENDENT AND INDEPENDENT
VARIABLES
• An independent variable is an antecedent variable, that which causes, in part or in total, a particular outcome; it is a stimulus that influences a response, a factor which may be modified (e.g. under experimental or other conditions) to affect an outcome.
• A dependent variable is the outcome variable, that which is caused, in total or in part, by the input, antecedent variable. It is the effect, consequence of, or response to, an independent variable.
DEPENDENT AND INDEPENDENT
VARIABLES
• In using statistical tests which require independent and dependent variables, exercise caution in deciding which is the dependent and which the independent variable, as the direction of causality may not be one-way or in the direction assumed.
FIVE KEY INITIAL QUESTIONS
1. What kind (scales) of data are there?
2. Are the data parametric or non-parametric?
3. Are descriptive or inferential statistics
required?
4. Do dependent and independent variables need
to be identified?
5. Are the relationships considered to be linear or
non-linear?
CATEGORICAL, DISCRETE AND
CONTINUOUS VARIABLES
• A categorical variable is a variable which has
categories of values, e.g. the variable ‘sex’ has
two values: male and female.
• A discrete variable has a finite number of
values of the same item, with no intervals or
fractions of the value, e.g. a person cannot have
half an illness or half a mealtime.
• A continuous variable can vary in quantity, e.g.
money in the bank, monthly earnings. There
are equal intervals, and, usually, a true zero,
e.g. it is possible to have no money in the bank.
CATEGORICAL, DISCRETE AND
CONTINUOUS VARIABLES
• Categorical variables match categorical data.
• Continuous variables match interval and ratio
data.
KINDS OF ANALYSIS
• Univariate analysis: looks for differences
amongst cases within one variable.
• Bivariate analysis: looks for a relationship
between two variables.
• Multivariate analysis: looks for relationships among more than two variables.
HYPOTHESES
• Null hypothesis (H0)
• Alternative hypothesis (H1)
• The null hypothesis is the stronger hypothesis, requiring rigorous evidence to reject it.
• One should commence with the null hypothesis, casting the research in that form, and turn to the alternative hypothesis only if the null hypothesis is not supported.
HYPOTHESES
• Direction of hypothesis: states the kind of
difference or relationship between two conditions
or two groups of participants
• One-tailed (directional), e.g.: ‘people who study in
silent surroundings achieve better than those
who study in noisy surroundings’. (‘Better’
indicates the direction.)
• Two-tailed (no direction), e.g.: ‘there is a
difference between people who study in silent
surroundings and those who study in noisy
surroundings’. (There is no indication of which is
the better.)
ONE-TAILED AND TWO-TAILED TESTS
• A one-tailed test makes assumptions about the
population and the direction of the outcome,
e.g. Group A will score more highly than
another on a test.
• A two-tailed test makes no assumptions about
the population and the direction of the
outcome, e.g. there will be a difference in the
test scores.
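A minimal sketch of the two kinds of test, assuming SciPy is available; the groups and scores are invented to echo the silent/noisy example above.

```python
from scipy.stats import ttest_ind

silent = [72, 75, 78, 80, 82, 85]  # scores of those studying in silence
noisy = [65, 68, 70, 72, 74, 76]   # scores of those studying in noise

# Two-tailed: is there a difference in either direction?
t_two, p_two = ttest_ind(silent, noisy, alternative='two-sided')

# One-tailed: do the 'silent' students score *higher*?
t_one, p_one = ttest_ind(silent, noisy, alternative='greater')

print(p_two, p_one)  # here the one-tailed p is half the two-tailed p
```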
THE NORMAL CURVE OF DISTRIBUTION
[Figure: the normal curve of distribution.]
THE NORMAL CURVE OF DISTRIBUTION
• A smooth, perfectly symmetrical, bell-shaped
curve.
• It is symmetrical about the mean and its tails
are assumed to meet the x-axis at infinity.
• Statistical calculations often assume that the
population is distributed normally and then
compare the data collected from the sample to
the population, allowing inferences to be made
about the population.
THE NORMAL CURVE OF DISTRIBUTION
Assumes that:
– 68.3 per cent of people fall within 1 standard
deviation of the mean;
– 27.1 per cent are between 1 standard
deviation and 2 standard deviations away
from the mean;
– 4.3 per cent are between 2 and 3 standard
deviations away from the mean;
– 0.3 per cent are more than 3 standard
deviations away from the mean.
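These percentages can be checked against the standard normal distribution; a sketch assuming SciPy is available:

```python
from scipy.stats import norm

within_1 = norm.cdf(1) - norm.cdf(-1)               # about 0.683
between_1_and_2 = 2 * (norm.cdf(2) - norm.cdf(1))   # about 0.271
between_2_and_3 = 2 * (norm.cdf(3) - norm.cdf(2))   # about 0.043
beyond_3 = 2 * norm.cdf(-3)                         # about 0.003

print(within_1, between_1_and_2, between_2_and_3, beyond_3)
```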
SKEWNESS
The curve is not symmetrical or bell-shaped.
KURTOSIS (STEEPNESS OF THE CURVE)
STATISTICAL SIGNIFICANCE
If the findings hold true 95% of the time, then the statistical significance level (α) = 0.05.
If the findings hold true 99% of the time, then the statistical significance level (α) = 0.01.
If the findings hold true 99.9% of the time, then the statistical significance level (α) = 0.001.
CORRELATION
Shoe size: 1 2 3 4 5
Hat size:  1 2 3 4 5
Perfect positive correlation: +1
CORRELATION
Hand size: 1 2 3 4 5
Foot size: 1 2 3 4 5
Perfect positive correlation: +1
CORRELATION
Hand size: 1 2 3 4 5
Foot size: 2 1 4 3 5
Positive correlation: <+1
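A minimal sketch computing Pearson's r for the two patterns above, assuming SciPy is available:

```python
from scipy.stats import pearsonr

hand = [1, 2, 3, 4, 5]
foot_identical = [1, 2, 3, 4, 5]  # identical ordering
foot_similar = [2, 1, 4, 3, 5]    # similar but not identical ordering

r_perfect, _ = pearsonr(hand, foot_identical)
r_positive, _ = pearsonr(hand, foot_similar)
print(r_perfect, r_positive)  # 1.0 and 0.8: perfect and <+1
```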
PERFECT POSITIVE CORRELATION
[Figure: scatterplot illustrating a perfect positive correlation.]
PERFECT NEGATIVE CORRELATION
[Figure: scatterplot illustrating a perfect negative correlation.]
MIXED CORRELATION
[Figure: scatterplot illustrating a mixed correlation.]
CORRELATIONS
Statistical significance is a function of the coefficient and the sample size:
– the smaller the sample, the larger the coefficient has to be in order to obtain statistical significance;
– the larger the sample, the smaller the coefficient can be in order to obtain statistical significance;
– statistical significance can be attained either by having a large coefficient together with a small sample or by having a small coefficient together with a large sample.
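A sketch of this relationship, using the standard conversion of r to a t statistic, t = r·√((n − 2)/(1 − r²)), with a two-tailed p-value; SciPy is assumed available.

```python
import math
from scipy.stats import t

def p_value_for_r(r, n):
    """Two-tailed p-value for a Pearson coefficient r from a sample of n."""
    t_stat = r * math.sqrt((n - 2) / (1 - r ** 2))
    return 2 * t.sf(abs(t_stat), df=n - 2)

for n in (10, 30, 100):
    print(n, p_value_for_r(0.3, n))
# the same r = 0.3 is not significant at n = 10, but is at n = 100
```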
CORRELATIONS
• Begin with a null hypothesis (e.g. there is no relationship between the size of hands and the size of feet). The task is not to support the null hypothesis but to seek evidence against it; the burden of responsibility lies in rejecting it.
• If the hypothesis is not supported for 95 per cent or
99 per cent or 99.9 per cent of the population, then
there is a statistically significant relationship
between the size of hands and the size of feet at
the 0.05, 0.01 and 0.001 levels of significance
respectively.
• These levels of significance – the 0.05, 0.01 and
0.001 levels – are the levels at which statistical
significance is frequently taken to be demonstrated.
HYPOTHESIS TESTING
• Commence with a null hypothesis
• Set the level of significance (the alpha (α) level) to be used to support or not to support the null hypothesis; the alpha level is determined by the researcher.
• Compute the data.
• Determine whether the null hypothesis is
supported or not supported.
• Avoid Type I and Type II errors.
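A minimal sketch of these steps with invented data, using an independent-samples t-test from SciPy (assumed available):

```python
from scipy.stats import ttest_ind

alpha = 0.05  # significance level chosen by the researcher

group_x = [23, 25, 28, 30, 32, 35]
group_y = [20, 21, 22, 24, 25, 26]

stat, p = ttest_ind(group_x, group_y)

if p < alpha:
    print(f"p = {p:.3f} < {alpha}: null hypothesis not supported")
else:
    print(f"p = {p:.3f} >= {alpha}: null hypothesis retained")
```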
TYPE I AND TYPE II ERRORS
• Null Hypothesis: there is no statistically
significant difference between x and y.
• TYPE I ERROR
– The researcher rejects the null hypothesis when it is in fact true (like convicting an innocent person).
– To reduce the risk: increase the significance level required (e.g. from 0.05 to 0.01).
• TYPE II ERROR
– The researcher accepts the null hypothesis when it is in fact false (like finding a guilty person innocent).
– To reduce the risk: reduce the significance level required (e.g. from 0.01 to 0.05) and/or increase the sample size.
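A simulation sketch of the Type I error rate: both groups below are drawn from the same population, so the null hypothesis is true and every rejection is a Type I error; across many runs the rejection rate should sit near the chosen alpha.

```python
import random
from scipy.stats import ttest_ind

random.seed(0)
alpha, runs, false_rejections = 0.05, 2000, 0

for _ in range(runs):
    # two samples from the SAME population: the null hypothesis is true
    a = [random.gauss(50, 10) for _ in range(30)]
    b = [random.gauss(50, 10) for _ in range(30)]
    if ttest_ind(a, b).pvalue < alpha:
        false_rejections += 1  # a 'significant' result here is a Type I error

print(false_rejections / runs)  # should be close to 0.05
```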
EFFECT SIZE
• Increasingly seen as preferable to statistical
significance.
• A way of quantifying the difference between
two groups. It indicates how big the effect is,
something that statistical significance does not.
• For example, if one group has had an
experimental treatment and the other has not
(the control group), then the effect size is a
measure of the effectiveness of the treatment.
EFFECT SIZE
• It is calculated thus:

$$\text{Effect size} = \frac{\text{mean of experimental group} - \text{mean of control group}}{\text{standard deviation of the control group}}$$

• Statistics for calculating effect size include r², adjusted R², η² (eta squared), ω² (omega squared), Cramer's V, Kendall's W, Cohen's d, Eta, Eta².

$$\text{Effect size (Eta}^2) = \frac{\text{sum of squares between groups}}{\text{total sum of squares}}$$
• Different kinds of statistical treatments use
different effect size calculations.
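A minimal sketch of the first formula with invented data; the denominator is the control group's standard deviation, as defined above.

```python
import statistics

experimental = [58, 62, 65, 67, 70, 74]  # group given the treatment
control = [50, 53, 55, 57, 60, 61]       # group not given the treatment

diff = statistics.mean(experimental) - statistics.mean(control)
effect_size = diff / statistics.stdev(control)

print(effect_size)  # about 2.4: a strong effect on the scale below
```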
EFFECT SIZE
• In using Cohen’s d:
0–0.20    = weak effect
0.21–0.50 = modest effect
0.51–1.00 = moderate effect
>1.00     = strong effect
THE POWER OF A TEST
• An estimate of the ability of the test to separate
the effect size from random variation.
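Power, effect size, alpha and sample size are linked: fixing any three determines the fourth. A sketch assuming statsmodels is available:

```python
from statsmodels.stats.power import TTestIndPower

# Solve for the sample size per group needed to detect a moderate effect.
n_per_group = TTestIndPower().solve_power(
    effect_size=0.5,  # a moderate effect (Cohen's d)
    alpha=0.05,       # significance level
    power=0.8,        # conventional target power
)
print(n_per_group)  # roughly 64 per group
```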