Evidence-Based Education: Finding out what works
Educational Action Research
Todd Twyman
Summer 2011
Week 2
Statistics
“Statistics means never having to say
you’re certain”
Is It By Chance…?
Claim: Aspirin reduces heart attack risk
11,034 in control group received a placebo
11,034 in treatment group received aspirin
189 in control group had a heart attack
104 in treatment group had a heart attack
Is the study’s claim justified?
Is aspirin really the reason for the reduced heart attack rate?
Could the results have happened by
chance?
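A two-proportion z-test is one common way to ask whether the gap between 189 and 104 heart attacks could be due to chance. The sketch below is an illustration under stated assumptions (two independent random groups of 11,034, large enough for the normal approximation), not necessarily the analysis the original study reported. The helper name `two_proportion_z_test` is hypothetical.

```python
import math

# Counts from the aspirin example above
n_control, events_control = 11_034, 189      # placebo group
n_treatment, events_treatment = 11_034, 104  # aspirin group


def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided z-test for a difference between two independent proportions.

    Assumes independent random samples and uses the pooled proportion
    for the standard error (normal approximation)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value: probability that |Z| is at least this large
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p1, p2, z, p_value


p_control, p_treatment, z, p_value = two_proportion_z_test(
    events_control, n_control, events_treatment, n_treatment
)

print(f"Heart attack rate, placebo: {p_control:.2%}")
print(f"Heart attack rate, aspirin: {p_treatment:.2%}")
print(f"Relative risk (aspirin / placebo): {p_treatment / p_control:.2f}")
print(f"z = {z:.2f}, two-sided p = {p_value:.1e}")
```

A very small p-value means a gap this large would be unlikely if aspirin made no difference; on its own it does not rule out other explanations, such as flaws in how the groups were formed.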
Statistics
“The science of decision making in a world
full of uncertainty,” Robi Polikar
Statistics help us:
Understand how likely it is that the
observations collected from a sample also
apply to the entire population.
Judge whether differences observed among
experimental groups reflect “real” differences.
Purpose of Statistics
Make generalized decisions about a
population by analyzing a sample from the
population
Population: the entire set of people/items/etc.
Sample: small subset that is selected from the
population
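To make the population/sample distinction concrete, the sketch below simulates a hypothetical population of 10,000 test scores and draws a simple random sample of 100 from it. All numbers are invented for illustration (an assumed mean of 50 and SD of 10); in real research only the sample is observed, and the population mean is what we are trying to infer.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical population: 10,000 simulated test scores (mean 50, SD 10)
population = [random.gauss(50, 10) for _ in range(10_000)]

# Draw a simple random sample of 100 students from that population
sample = random.sample(population, 100)

population_mean = statistics.mean(population)
sample_mean = statistics.mean(sample)

print(f"Population mean: {population_mean:.2f}")
print(f"Sample mean:     {sample_mean:.2f}")
```

The sample mean lands close to, but usually not exactly on, the population mean; inferential statistics quantify how far off it is likely to be.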
Types of Statistical Analyses
Descriptive Statistics
Measures of Central Tendency: Mean, Median, and Mode
Measures of Variability: Standard Deviation
Types of Statistical Analyses
Inferential Statistics
Relationship between two performances
Correlation
Differences between two or more categories
Nominal data
Percentages or chi square
Difference between two or more means
Analysis of Variance (AOV)
T-test
Types of Data
Categorical (Nominal): classifies data into
categories or groups with specific
characteristics
Ordinal: organizes data in an order,
intervals are not assumed to be equal
Interval (Scale): continuous, quantitative
data that fall along a number line
Descriptive Statistics
Statistic: a measure of a characteristic of
the sample
Descriptive Statistic: Summary and
description of collected data.
Displays:
Histograms, boxplots, scatterplots
Summaries:
Mean, median, mode
Standard deviation
Histogram
[Figure: histograms of test scores and of income illustrating central tendency: the mean as the balancing point, the median dividing the distribution equally, and the mode as the most frequent score]
Central Tendency
Mean: arithmetic average
Median: the score that divides the sample into two equal halves
If the sample has an even number of data
points, average the middle two values
Mode: the most common score in the
sample
[Figure: two panels comparing the Experimental and Control groups' score distributions on a standardised-score axis, with each group's mean and median marked]
Calculating Central Tendency
For the following data, find the mean, median, and
mode:
11 15 16 16 9 16 17 16 13 12
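Mean: (11+15+16+16+9+16+17+16+13+12) / 10 = 141 / 10 = 14.1
Median: sorted, the scores are 9 11 12 13 15 16 16 16 16 17; the middle two values are 15 and 16, so the median is (15 + 16) / 2 = 15.5
Mode: 16 (it appears four times)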
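The same calculation can be checked with Python's standard `statistics` module; a minimal sketch:

```python
import statistics

scores = [11, 15, 16, 16, 9, 16, 17, 16, 13, 12]

mean = statistics.mean(scores)      # arithmetic average
median = statistics.median(scores)  # averages the middle two values when n is even
mode = statistics.mode(scores)      # most common score

print(mean, median, mode)  # 14.1 15.5 16
```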
Measures of Variability
Standard deviation
Measures the extent to which scores in a
distribution deviate from the mean
Always report the standard deviation when you report a mean score, or readers will know nothing about your distribution!
Measures of Variability
The distribution of the data influences your decisions
Example: Find the means
Scores: 50, 50, 50, 50, 50; Mean = 50
Scores: 3, 75, 100, 12, 60; Mean = 50
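Computing the standard deviation of the two score sets makes the point numerically: the means match, but the spreads do not. A minimal sketch using Python's `statistics.stdev`, which applies the sample formula (dividing by n - 1):

```python
import statistics

uniform_scores = [50, 50, 50, 50, 50]
spread_scores = [3, 75, 100, 12, 60]

# statistics.stdev uses the sample formula (dividing by n - 1)
sd_uniform = statistics.stdev(uniform_scores)
sd_spread = statistics.stdev(spread_scores)

print(f"Uniform: mean = {statistics.mean(uniform_scores)}, SD = {sd_uniform:.1f}")
print(f"Spread:  mean = {statistics.mean(spread_scores)}, SD = {sd_spread:.1f}")
```

The first set has an SD of 0 and the second roughly 41.5, which is why reporting the mean alone hides so much.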
Standard Deviation
Scores with a high standard deviation
suggest that your data set is highly
variable:
Your measure may not be very reliable
And/or student scores are highly variable
Standard Deviation
When the scores are pretty
tightly bunched together
and the bell curve is steep,
the SD is small.
When scores are
spread apart and
the bell curve is
relatively flat, the
SD is quite large.
The Normal Curve
In a normal curve, most scores fall somewhere in
the middle. The more extreme the score, the
further from the middle it will be, indicating its
rarity.
The Normal Curve
Characteristics of the Normal Curve
–Test scores cluster
in the middle, fewer
scores are in the
tails
–The curve continues to infinity in both directions, so the right and left tails never touch the baseline (zero).
The Normal Curve
Characteristics of the Normal Curve
–50% of scores fall above the mean.
–50% of scores fall below the mean.
The Normal Curve
Characteristics of the Normal Curve
–34.1% of the population will score between the mean and 1 SD above the mean, and another 34.1% between the mean and 1 SD below it.
–Together, about 68% of scores are within one standard deviation of the mean.
The Normal Curve
Characteristics of the Normal Curve
–13.6% of the population will score between 1 and 2 SDs above the mean, and another 13.6% between 1 and 2 SDs below it.
–Together with the scores within one SD, about 95% of scores are within two standard deviations of the mean.
The Normal Curve
Characteristics of the Normal Curve
–2.1% of the population will score between 2 and 3 SDs above the mean, and another 2.1% between 2 and 3 SDs below it.
–Altogether, about 99.7% of scores are within three standard deviations of the mean.
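The percentages above are rounded areas under the standard normal curve. A quick check with `statistics.NormalDist` (Python 3.8+); the helper `share_between` is a hypothetical name used only for this sketch:

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)  # the standard normal curve


def share_between(lower, upper):
    """Proportion of a normal population scoring between two z-scores (SD units)."""
    return std_normal.cdf(upper) - std_normal.cdf(lower)


for lower, upper in [(0, 1), (1, 2), (2, 3)]:
    print(f"Between {lower} and {upper} SD above the mean: {share_between(lower, upper):.1%}")

for k in (1, 2, 3):
    print(f"Within {k} SD of the mean: {share_between(-k, k):.1%}")
```

The bands come out at about 34.1%, 13.6%, and 2.1%, and the within-1, 2, and 3 SD totals near 68%, 95%, and 99.7%.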
Calculating Mean & SD
$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$$
$$SD = \sqrt{\frac{\sum_{i=1}^{n}(X_i - \bar{X})^2}{n - 1}}$$
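The two formulas translate directly into code. This is a hand-rolled sketch for illustration (in practice `statistics.mean` and `statistics.stdev` do the same job); the function names are our own, and the input reuses the spread-out scores from the variability example.

```python
import math


def mean(xs):
    """X-bar: the sum of the scores divided by n."""
    return sum(xs) / len(xs)


def sample_sd(xs):
    """SD with n - 1 in the denominator (the sample standard deviation)."""
    n = len(xs)
    x_bar = mean(xs)
    squared_deviations = sum((x - x_bar) ** 2 for x in xs)
    return math.sqrt(squared_deviations / (n - 1))


scores = [3, 75, 100, 12, 60]
print(mean(scores), round(sample_sd(scores), 2))
```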
Boxplots
Definition: A way of
displaying the variation in
scores graphically
The median of a set of data
separates the data into
two equal parts and is
always marked on a Box
and Whiskers Plot.
Boxplots
The box indicates the
range of scores
between the 25th and
75th percentiles.
The whiskers indicate
the highest and lowest
scores in the data set.
Boxplots
But what does it mean? What information
about your students’ reading performance
data does this graph give you?
Boxplots
• 1/4 of your students read fewer than 8.5 pages.
• 1/4 of your students read 8.5 - 12 pages.
• 1/4 of your students read 12 - 14 pages.
• 1/4 of your students read more than 14 pages.
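The reading-page data behind this boxplot is not given, so the sketch below borrows the ten scores from the central-tendency exercise as hypothetical input and computes the five numbers a box-and-whiskers plot displays. Software packages use slightly different quartile conventions (this uses `statistics.quantiles` with its default "exclusive" method), and many plotting tools draw whiskers at 1.5 times the interquartile range rather than at the minimum and maximum described on the slide.

```python
import statistics

# Hypothetical input: the ten scores from the central-tendency exercise,
# since the reading-performance data behind the slide's boxplot is not given.
scores = [11, 15, 16, 16, 9, 16, 17, 16, 13, 12]

q1, median, q3 = statistics.quantiles(scores, n=4)  # default method="exclusive"
lowest, highest = min(scores), max(scores)           # whiskers, as defined on the slide

print(f"Whiskers: {lowest} to {highest}")
print(f"Box (25th to 75th percentile): {q1} to {q3}, median {median}")
```

Read the same way as the slide: a quarter of the scores fall below q1, a quarter between q1 and the median, a quarter between the median and q3, and a quarter above q3.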
What decisions can you
make?
When would you calculate the mean,
median, mode during your teaching?
How might you use visual displays to
understand the data?
Inferential Statistics
Draw conclusions or generalizations about a population from the observed properties of a sample.
[Figure: inferential statistics generalize from a Sample to the Population]
Analyzing Categorical Data
Percentages
Always report the raw numbers (n) along with percentages
Example:
55% of the subjects were male (n = 4)
55% of the subjects were male
(n = 12,392)
Analyzing Categorical Data
Chi Square
Used to evaluate whether the proportions of individuals who fall into the categories of a variable are equal to the hypothesized values.
Nominal data (counts)
Ordinal data (fail, pass, etc.)
Chi Square
                   Strong Reader   Weak Reader   Marginal Totals
Pass               14              34            48
Fail               26              6             32
Marginal Totals    40              40            80
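The chi-square statistic compares each observed count with the count expected if pass rates were the same for both reader groups (row total times column total, divided by the grand total). A minimal sketch for the table above, assuming the columns are in the order Strong Reader, Weak Reader (the statistic is the same either way) and applying no continuity correction:

```python
import math

# Observed counts: rows = Pass/Fail, columns = Strong Reader/Weak Reader
observed = [
    [14, 34],  # Pass
    [26, 6],   # Fail
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

chi_square = 0.0
for i, row in enumerate(observed):
    for j, count in enumerate(row):
        # Count expected if passing were unrelated to reader group
        expected = row_totals[i] * col_totals[j] / grand_total
        chi_square += (count - expected) ** 2 / expected

df = (len(observed) - 1) * (len(observed[0]) - 1)

# For df = 1, P(chi-square > x) equals P(|Z| > sqrt(x)) for a standard normal Z
p_value = math.erfc(math.sqrt(chi_square / 2))

print(f"chi-square = {chi_square:.2f}, df = {df}, p = {p_value:.1e}")
```

Here every expected count is 24 (Pass) or 16 (Fail), so the large deviations produce a large chi-square and a very small p-value.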
Analyzing Interval Data
Correlation Coefficients
Descriptive
Significance - different from zero?
T-test
Significance - was effect due to chance?
Analysis of Variance
Significance - was effect due to chance?
Correlations
Explore strength of relationships
Use continuous variables
Can be positive or negative
The size of the coefficient (r) indicates the strength of the correlation
No implication of causation!
Correlation Coefficients
Correlation coefficient (r)
Takes a value between -1 and +1
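Pearson's r can be computed directly from its definition: the covariance of the two variables divided by the product of their standard deviations. A minimal sketch; the `pearson_r` helper and the score lists are hypothetical, chosen so the results are easy to check. On Python 3.10+, `statistics.correlation` does the same job.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r: covariance divided by the product of the standard deviations.

    The 1/(n - 1) factors cancel, so plain sums of deviations are used."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)


# Hypothetical data for five students
hours = [1, 2, 3, 4, 5]
rising = [7, 9, 11, 13, 15]   # rises in lockstep with hours
falling = [15, 13, 11, 9, 7]  # falls in lockstep with hours
noisy = [8, 12, 10, 15, 14]   # rises, but not perfectly

for label, ys in [("rising", rising), ("falling", falling), ("noisy", noisy)]:
    print(f"{label}: r = {pearson_r(hours, ys):+.2f}")
```

The lockstep lists give r = +1 and r = -1; the noisy list gives a positive value below 1. None of these values says anything about causation.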
Correlation Coefficients
         OSAT   ORF    VOCAB
OSAT     1      .59    .84
ORF      .59    1      .12
VOCAB    .84    .12    1
Positive and negative correlation
Creatinine clearance (ClCr): a measure of kidney function
[Figure: scatterplots of ClCr against Age (years) and against Body Weight (kg), one showing a negative and one a positive correlation]
Correlation is present in both cases
Interpreting r
r = +1.0: Perfect positive correlation
r = +0.95: Partial positive correlation
r = 0: No correlation
r = -0.5: Partial negative correlation
r = -1.0: Perfect negative correlation
© Roland Good III
[Figure: scatterplot of Scaled State Reading Score against a Curriculum-Based Measure of Oral Reading Fluency (ORF); correlation = 0.59]
[Figure: scatterplot of Scaled State Reading Score against a Curriculum-Based Measure of Vocabulary (VOCAB); correlation = 0.84]
Your Turn…
Identify a research question from your
teaching experience that would be
appropriately analyzed using correlation.
T-tests & AOV
Test for statistical significance of
difference between means
Assumptions
Random samples (robust if n > 30)
From Normal distribution
Equal standard deviations (robust if n1 ≈ n2)
Independent samples
T-tests & AOV
Could the difference between the
two groups be due to chance?
Are your groups typical?
Statistical significance (p-value) tells you the probability of finding a difference at least this large by chance alone, if there were no real difference between the groups.
p < 0.05 is the threshold used most frequently
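The sketch below computes Student's t statistic (pooled SD, matching the equal-SD assumption above) and the one-way analysis-of-variance F statistic for two small hypothetical groups. Turning t or F into a p-value requires the t or F distribution, which Python's standard library does not provide; a t table or a package such as SciPy (`scipy.stats.ttest_ind`, `scipy.stats.f_oneway`) can handle that step. The helper names are our own.

```python
import math


def mean(xs):
    return sum(xs) / len(xs)


def sum_of_squares(xs):
    """Sum of squared deviations from the group mean."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs)


def pooled_t(a, b):
    """Student's t for two independent samples, assuming equal SDs (pooled variance)."""
    n1, n2 = len(a), len(b)
    pooled_var = (sum_of_squares(a) + sum_of_squares(b)) / (n1 + n2 - 2)
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean(a) - mean(b)) / se, n1 + n2 - 2  # (t, degrees of freedom)


def one_way_f(*groups):
    """One-way AOV F: between-group mean square over within-group mean square."""
    scores = [x for g in groups for x in g]
    k, n = len(groups), len(scores)
    grand_mean = mean(scores)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum(sum_of_squares(g) for g in groups)
    f = (ss_between / (k - 1)) / (ss_within / (n - k))
    return f, (k - 1, n - k)  # (F, (df between, df within))


# Hypothetical test scores for two small groups
control = [71, 72, 73, 74, 75]
experimental = [73, 74, 75, 76, 77]

t, df = pooled_t(experimental, control)
f, f_df = one_way_f(experimental, control)
print(f"t = {t:.2f} with df = {df}")
print(f"F = {f:.2f} with df = {f_df}")
```

With exactly two groups, F equals t squared, which is why the t-test and AOV reach the same conclusion in that case; AOV is needed once there are three or more groups.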
Significance Level
Alpha of .05 is completely arbitrary.
“I am 90% confident that 75% of
researchers do not know what alpha
actually means, so how could a “correct”
cutoff level be chosen?” (Robin High, UO
Computing Center)
The lower the alpha level, the lower the probability of concluding that the intervention is effective when it really isn't (Type I error).
Effect size
The difference between the two
means, expressed as a proportion of
the standard deviation
ES = (Me - Mc) / SD, where Me is the experimental group mean and Mc the control group mean
Assumptions
Standard deviations are similar
Distribution is normal
Reliable measures are used
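A minimal effect-size sketch. The slide writes SD without saying which one; this version assumes the pooled SD of both groups (some researchers instead use the control group's SD, known as Glass's delta). The function name and group scores are hypothetical.

```python
import math


def effect_size(experimental, control):
    """ES = (Me - Mc) / SD, with SD taken as the pooled sample SD (an assumption)."""
    n_e, n_c = len(experimental), len(control)
    m_e = sum(experimental) / n_e
    m_c = sum(control) / n_c
    ss_e = sum((x - m_e) ** 2 for x in experimental)
    ss_c = sum((x - m_c) ** 2 for x in control)
    pooled_sd = math.sqrt((ss_e + ss_c) / (n_e + n_c - 2))
    return (m_e - m_c) / pooled_sd


# Hypothetical scores
control = [71, 72, 73, 74, 75]
experimental = [73, 74, 75, 76, 77]

es = effect_size(experimental, control)
print(f"ES = {es:.2f}")
```

Here the experimental mean sits about 1.26 pooled SDs above the control mean. Because ES is in SD units, it can be compared across studies that used different tests, provided the assumptions above hold.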
Fancy Slides adapted from…
www.cem.dur.ac.uk/ebeuk/research
Effect size
[Figure: standardised student achievement distributions for the Control group (average score of a person taught 'normally') and the Experimental group (average score of a person taught by the experimental method), with group means and medians marked; effect size = 0.5]
Test Your Skills
Do students who chew gum while taking
tests perform better than students who do
not?
What is the relationship between home
visits and parent participation at afterschool functions?
Review
What are the 3 types of data? Define and
give examples of each.
What are the 3 common measures of
central tendency? Define and give
examples of each.
What is a standard deviation? Why is it
important to calculate?
Homework
Read Nothing Up My Sleeve: Unveiling the
Magic of Statistics (I’ll send it to you
ASAP).
Read Glanz, Appendix B