Evidence-Based Education: Finding out what works

Educational Action Research
Todd Twyman
Summer 2011
Week 2
Statistics
“Statistics means never having to say
you’re certain”
Is It By Chance…?
Claim: Aspirin reduces heart attack risk
• 11,034 in the control group received a placebo
• 11,034 in the treatment group received aspirin
• 189 in the control group had a heart attack
• 104 in the treatment group had a heart attack
• Is the study’s claim justified?
• Is aspirin really the reason for the reduced number of heart attacks?
• Could the results have happened by chance?
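One way to approach the "by chance" question is a two-proportion z-test on the counts above. This is only a sketch: the choice of a z-test and the function name `two_proportion_z` are assumptions made for illustration, not the analysis used in the study itself.

```python
import math

def two_proportion_z(events_a, n_a, events_b, n_b):
    """Return (z, two-sided p-value) for the difference between two proportions."""
    p_a = events_a / n_a
    p_b = events_b / n_b
    # Pooled proportion under the assumption that both groups share one true rate.
    pooled = (events_a + events_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Counts from the slide: heart attacks out of group size.
control_rate = 189 / 11034
treatment_rate = 104 / 11034
z, p_value = two_proportion_z(189, 11034, 104, 11034)

print(f"control: {control_rate:.2%}  treatment: {treatment_rate:.2%}")
print(f"z = {z:.2f}, two-sided p = {p_value:.2g}")
```

A very small p-value says a gap this large would rarely arise by chance alone; it does not by itself rule out other explanations such as a flawed study design.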
Statistics


“The science of decision making in a world full of uncertainty,” Robi Polikar
Help us:
• Understand how likely it is that the observations collected from a sample also apply to the entire population.
• Judge whether differences observed among experimental groups reflect “real” differences.

Purpose of Statistics

Make generalized decisions about a population by analyzing a sample from the population
• Population: the entire set of people/items/etc.
• Sample: small subset that is selected from the population

Types of Statistical Analyses

Descriptive Statistics
• Measures of Central Tendency: Mean, Median, and Mode
• Measures of Variability: Standard Deviation
Types of Statistical Analyses

Inferential Statistics
• Relationship between two performances
  – Correlation
• Differences between two or more categories
  – Nominal data
  – Percentages or chi square
• Difference between two or more means
  – Analysis of Variance (AOV)
  – T-test
Types of Data



Categorical (Nominal): classifies data into
categories or groups with specific
characteristics
Ordinal: organizes data in an order,
intervals are not assumed to be equal
Interval (Scale): continuous, quantitative
data that fall along a number line
Descriptive Statistics



Statistic: a measure of a characteristic of
the sample
Descriptive Statistic: Summary and
description of collected data.
Displays: histograms, boxplots, scatterplots
Summaries: mean, median, mode, standard deviation

Histogram
[Figure: histograms of test scores and income.]
Central Tendency
Median: divides the scores equally
Mean: balancing point
Mode: most frequent score
Central Tendency


Mean: arithmetic average
Median: score that divides the sample in
two


If the sample has an even number of data
points, average the middle two values
Mode: the most common score in the
sample
[Figure: two panels, numbered 1 and 2, each showing Experimental and Control distributions on a standardised score axis, with the group mean and group median marked.]
Calculating Central Tendency
For the following data, find the mean, median, and
mode:
11 15 16 16 9 16 17 16 13 12

Mean: (11+15+16+16+9+16+17+16+13+12) / 10 = 141 / 10 = 14.1

Median: 9 11 12 13 15 16 16 16 16 17, so the middle two values give (15 + 16) / 2 = 15.5
Mode: 16
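The same three answers can be checked with Python's standard `statistics` module. This is a minimal sketch using the ten scores from the slide; the variable names are illustrative.

```python
import statistics

scores = [11, 15, 16, 16, 9, 16, 17, 16, 13, 12]

mean = statistics.mean(scores)      # arithmetic average
median = statistics.median(scores)  # averages the middle two values when n is even
mode = statistics.mode(scores)      # most common score

print("mean:", mean)
print("median:", median)
print("mode:", mode)
```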
Measures of Variability

Standard deviation


Measures the extent to which scores in a
distribution deviate from the mean
Always report the standard deviation when you report a mean score, or we will know nothing about the spread of your distribution!
Measures of Variability

Distribution of data influences the decisions

Example: Find the means
• Scores: 50, 50, 50, 50, 50
  Mean = 50
• Scores: 3, 75, 100, 12, 60
  Mean = 50
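The two score sets share a mean but differ sharply in spread, which is what the standard deviation captures. A short sketch, assuming the sample standard deviation (n - 1 in the denominator) is the one intended:

```python
import statistics

same = [50, 50, 50, 50, 50]
spread = [3, 75, 100, 12, 60]

# Both sets have the same mean...
mean_same = statistics.mean(same)
mean_spread = statistics.mean(spread)

# ...but very different sample standard deviations.
sd_same = statistics.stdev(same)
sd_spread = statistics.stdev(spread)

print(f"identical scores: mean = {mean_same}, SD = {sd_same:.1f}")
print(f"varied scores:    mean = {mean_spread}, SD = {sd_spread:.1f}")
```

Reporting only "mean = 50" would make these two classes look the same.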
Standard Deviation

Scores with a high standard deviation
suggest that your data set is highly
variable:

Your measure is not very reliable

And/or student scores are highly variable
Standard Deviation
When the scores are pretty
tightly bunched together
and the bell curve is steep,
the SD is small.
When scores are
spread apart and
the bell curve is
relatively flat, the
SD is quite large.
The Normal Curve
In a normal curve, most scores fall somewhere in
the middle. The more extreme the score, the
further from the middle it will be, indicating its
rarity.
The Normal Curve
Characteristics of the Normal Curve
–Test scores cluster
in the middle, fewer
scores are in the
tails
–Curve can
continue to infinity.
Thus, right and left
tails never touch
baseline (zero).
The Normal Curve
Characteristics of the Normal Curve
–50% of scores fall above the mean.
–50% of scores fall below the mean.
The Normal Curve
Characteristics of the Normal Curve
–34.1% of the population will score between the mean and 1 SD above the mean, and another 34.1% between the mean and 1 SD below it.
–Together, about 68% of scores are within one standard deviation of the mean.
The Normal Curve
Characteristics of the Normal Curve
–13.6% of the population will score between 1 and 2 SDs above the mean, and another 13.6% between 1 and 2 SDs below it.
–Together with the scores within 1 SD, about 95% of scores are within two standard deviations of the mean.
The Normal Curve
Characteristics of the Normal Curve
–2.1% of the population will score between 2 and 3 SDs above the mean, and another 2.1% between 2 and 3 SDs below it.
–Together with the scores within 2 SDs, more than 99% of scores are within three standard deviations of the mean.
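The percentages quoted for the normal curve can be reproduced from the standard normal distribution. A small sketch using `statistics.NormalDist` from the Python standard library; the variable names are illustrative.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal curve: mean 0, SD 1

# Share of the population in each one-SD band on one side of the mean.
band_0_1 = z.cdf(1) - z.cdf(0)
band_1_2 = z.cdf(2) - z.cdf(1)
band_2_3 = z.cdf(3) - z.cdf(2)

# Share within 1, 2, and 3 SDs of the mean (both sides combined).
within_1 = 2 * band_0_1
within_2 = 2 * (band_0_1 + band_1_2)
within_3 = 2 * (band_0_1 + band_1_2 + band_2_3)

for label, share in [("mean to 1 SD", band_0_1), ("1 to 2 SD", band_1_2),
                     ("2 to 3 SD", band_2_3), ("within 1 SD", within_1),
                     ("within 2 SD", within_2), ("within 3 SD", within_3)]:
    print(f"{label}: {share:.1%}")
```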
Calculating Mean & SD
$$\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$$

$$SD = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} \left(X_i - \bar{X}\right)^2}$$
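To make the two formulas concrete, here is a sketch that computes them step by step: the mean as the sum of the scores divided by n, and the SD as the square root of the summed squared deviations divided by n - 1. It reuses the ten scores from the central tendency example; the function names are illustrative.

```python
import math
import statistics

def mean(xs):
    # Sum of all scores divided by the number of scores.
    return sum(xs) / len(xs)

def sample_sd(xs):
    m = mean(xs)
    # Sum of squared deviations from the mean.
    squared_deviations = sum((x - m) ** 2 for x in xs)
    # Divide by n - 1, then take the square root.
    return math.sqrt(squared_deviations / (len(xs) - 1))

scores = [11, 15, 16, 16, 9, 16, 17, 16, 13, 12]
print(f"mean = {mean(scores):.1f}")
print(f"SD   = {sample_sd(scores):.2f}")
# Cross-check against the standard library's sample standard deviation.
print(f"statistics.stdev = {statistics.stdev(scores):.2f}")
```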
Boxplots
Definition: A way of
displaying the variation in
scores graphically
The median of a set of data
separates the data into
two equal parts and is
always marked on a Box
and Whiskers Plot.
Boxplots
The box indicates the
range of scores
between the 25th and
75th percentiles.
The whiskers indicate
the highest and lowest
scores in the data set.
Boxplots
But what does it mean? What information
about your students’ reading performance
data does this graph give you?
Boxplots
• 1/4 of your students read fewer than 8.5 pages.
• 1/4 of your students read 8.5 - 12 pages.
• 1/4 of your students read 12 - 14 pages.
• 1/4 of your students read more than 14 pages.
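The quarter boundaries a boxplot draws can be computed directly. The sketch below uses made-up page counts (the data behind the slide's graph is not given), so its numbers will not match the 8.5, 12, and 14 shown above. Note also that `statistics.quantiles` uses one of several common quartile conventions, so other tools may give slightly different cut points.

```python
import statistics

# Hypothetical pages-read counts for 12 students (invented for illustration).
pages = [5, 7, 8, 9, 10, 11, 12, 13, 14, 14, 15, 18]

# Three cut points split the data into four quarters.
q1, median, q3 = statistics.quantiles(pages, n=4)

print(f"lowest score (lower whisker): {min(pages)}")
print(f"25th percentile (box bottom): {q1}")
print(f"median (line in box):         {median}")
print(f"75th percentile (box top):    {q3}")
print(f"highest score (upper whisker): {max(pages)}")
```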
What decisions can you
make?

When would you calculate the mean,
median, mode during your teaching?

How might you use visual displays to
understand the data?
Inferential Statistics

Draw conclusions or generalizations from
a sample with observed properties about
the population itself.
[Figure: inferential statistics generalize from the Sample to the Population.]
Analyzing Categorical Data
Percentages
• Always report raw data with percentages
• Example:
  – 55% of the subjects were male (n = 4)
  – 55% of the subjects were male (n = 12,392)
Analyzing Categorical Data
Chi Square
• Used to evaluate whether the proportions of individuals who fall into categories of a variable are equal to the hypothesized values.
• Nominal data (counts)
• Ordinal data (fail, pass, etc.)
Chi Square
                  Strong Reader   Weak Reader   Marginal Totals
Pass                    14             34              48
Fail                    26              6              32
Marginal Totals         40             40              80
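The chi-square statistic for this pass/fail table can be worked out by comparing each observed count with the count expected if reader group and outcome were unrelated. This is a sketch of a test of independence without a continuity correction, which is an assumption; the slide does not say which variant is intended. The counts are entered in the order they appear on the slide.

```python
import math

# Observed counts: rows are Pass / Fail, columns are the two reader groups.
observed = [[14, 34],
            [26, 6]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
total = sum(row_totals)

chi_square = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        # Expected count if outcome and reader group were independent.
        expected = row_totals[i] * col_totals[j] / total
        chi_square += (obs - expected) ** 2 / expected

# A 2x2 table has 1 degree of freedom; for df = 1 the p-value
# can be written with the complementary error function.
p_value = math.erfc(math.sqrt(chi_square / 2))

print(f"chi-square = {chi_square:.2f}, df = 1, p = {p_value:.2g}")
```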
Analyzing Interval Data

Correlation Coefficients
• Descriptive
• Significance - different from zero?
T-test
• Significance - was effect due to chance?
Analysis of Variance
• Significance - was effect due to chance?
Correlations





Explore strength of relationships
Use continuous variables
Can be positive or negative
The absolute size of the coefficient (r) indicates the strength of the correlation
No implication of causation!
Correlation Coefficients
Correlation coefficient (r)
Takes a value between -1 and +1
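A sketch of how r is computed may help show why it always lands between -1 and +1. The function below implements the Pearson correlation coefficient, which is assumed to be the r meant here; the fluency and reading scores are invented for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How much the two variables move together.
    co_deviation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # How much each variable varies on its own.
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return co_deviation / math.sqrt(ss_x * ss_y)

# Hypothetical scores for five students (made up for this example).
fluency = [40, 60, 80, 100, 120]
reading = [195, 210, 205, 225, 230]

r = pearson_r(fluency, reading)
print(f"r = {r:.2f}")
```

A high r here would still say nothing about whether fluency causes better reading scores.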
Correlation Coefficients
         OSAT   ORF    VOCAB
OSAT     1      .59    .84
ORF      .59    1      .12
VOCAB    .84    .12    1
Positive and negative
correlation
Creatinine clearance (ClCr): a measure of kidney function
[Figure: two scatterplots of ClCr, one against Body Weight (Kg) and one against Age (Years); one panel is labeled Positive and the other Negative.]
Correlation is present in both cases
Interpreting r
r = +1.0: Perfect positive correlation
r = +0.95: Partial positive correlation
r = 0: No correlation
r = -0.5: Partial negative correlation
r = -1.0: Perfect negative correlation
© Roland Good III
[Figure: scatterplot, Correlation = 0.59. x-axis: Curriculum Based Measure of Oral Reading Fluency (ORF); y-axis: Scaled State Reading Score.]
[Figure: scatterplot, Correlation = 0.84. x-axis: Curriculum Based Measure of Vocabulary (VOCAB); y-axis: Scaled State Reading Score.]
Your Turn…

Identify a research question from your
teaching experience that would be
appropriately analyzed using correlation.
T-tests & AOV
Test for statistical significance of difference between means
• Assumptions
  – Random samples (robust if n > 30)
  – From Normal distribution
  – Equal std deviations (robust if n1 ≈ n2)
  – Independent samples
T-tests & AOV

• Could the difference between the two groups be due to chance?
• Are your groups typical?
• Statistical significance (p-value) tells you the probability that a difference at least this large would have been found by pure chance if there were no real difference.
• p < 0.05 is used most frequently
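As a rough illustration of what a t-test computes, the sketch below calculates the t statistic for two independent groups, assuming equal standard deviations (the pooled-variance form). The scores are invented, and the function name `pooled_t` is illustrative. Turning t into an exact p-value needs the t distribution, which Python's standard library does not provide, so the sketch compares t against a critical value from a printed t table instead.

```python
import math
import statistics

def pooled_t(a, b):
    """Return (t, degrees of freedom) for two independent samples,
    assuming both groups have similar standard deviations."""
    n_a, n_b = len(a), len(b)
    df = n_a + n_b - 2
    # Weighted average of the two sample variances.
    pooled_var = ((n_a - 1) * statistics.variance(a)
                  + (n_b - 1) * statistics.variance(b)) / df
    # Standard error of the difference between the two means.
    se = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (statistics.mean(a) - statistics.mean(b)) / se
    return t, df

# Hypothetical test scores (made up for this example).
experimental = [78, 85, 90, 72, 88, 95, 81, 84]
control = [70, 75, 82, 68, 79, 85, 73, 77]

t, df = pooled_t(experimental, control)
# 2.145 is the two-tailed critical value for p < 0.05 at df = 14.
print(f"t = {t:.2f}, df = {df}, exceeds 2.145: {abs(t) > 2.145}")
```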
Significance Level
• Alpha of .05 is completely arbitrary.
• “I am 90% confident that 75% of researchers do not know what alpha actually means, so how could a ‘correct’ cutoff level be chosen?” (Robin High, UO Computing Center)
• The lower the alpha level, the lower the probability of concluding the intervention is effective when it really isn’t (Type I error).
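A small simulation can show what alpha controls. In the sketch below, both groups are drawn from the same population, so any "significant" result is a Type I error by construction. The setup (population mean 50, SD 10, groups of 30, a z-test with known SD) is entirely made up for illustration.

```python
import math
import random
import statistics

def false_positive_rate(alpha=0.05, trials=2000, n=30, seed=1):
    """Share of no-difference experiments that still come out 'significant'."""
    rng = random.Random(seed)
    # Standard error of the difference in means when the SD is known to be 10.
    se = 10 * math.sqrt(2 / n)
    false_positives = 0
    for _ in range(trials):
        # Both groups come from the same population: there is no real effect.
        a = [rng.gauss(50, 10) for _ in range(n)]
        b = [rng.gauss(50, 10) for _ in range(n)]
        z = (statistics.fmean(a) - statistics.fmean(b)) / se
        p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value
        if p < alpha:
            false_positives += 1
    return false_positives / trials

print("alpha = 0.05:", false_positive_rate(alpha=0.05))
print("alpha = 0.01:", false_positive_rate(alpha=0.01))
```

The false positive rate should land near whichever alpha is chosen, which is why a lower alpha means fewer Type I errors.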
Effect size
• The difference between the two means, expressed as a proportion of the standard deviation
• ES = (Me - Mc) / SD, where Me is the experimental group mean and Mc is the control group mean
• Assumptions
  – Standard deviations are similar
  – Distribution is normal
  – Reliable measures are used
Fancy slides adapted from…
www.cem.dur.ac.uk/ebeuk/research
Effect size
[Figure: two overlapping distributions of Student Achievement (standardised), marking the average score of a person taught ‘normally’ and the average score of a person taught by the experimental method.]
[Figure: Experimental and Control distributions on a standardised score axis, with the group mean and group median marked. Effect size = 0.5]
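The effect size formula ES = (Me - Mc) / SD can be sketched in a few lines. The slides do not say which SD to use; the version below pools the two groups' standard deviations, which is a common choice but an assumption here. The scores and the function name `effect_size` are invented for illustration.

```python
import math
import statistics

def effect_size(experimental, control):
    """(experimental mean - control mean) / pooled standard deviation."""
    n_e, n_c = len(experimental), len(control)
    # Pooled variance: weighted average of the two sample variances.
    pooled_var = ((n_e - 1) * statistics.variance(experimental)
                  + (n_c - 1) * statistics.variance(control)) / (n_e + n_c - 2)
    difference = statistics.mean(experimental) - statistics.mean(control)
    return difference / math.sqrt(pooled_var)

# Hypothetical test scores (made up for this example).
experimental = [78, 85, 90, 72, 88, 95, 81, 84]
control = [70, 75, 82, 68, 79, 85, 73, 77]

es = effect_size(experimental, control)
print(f"effect size = {es:.2f} standard deviations")
```

Unlike a p-value, the effect size says how large the difference is, not just whether it is likely to be due to chance.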
Test Your Skills


Do students who chew gum while taking
tests perform better than students who do
not?
What is the relationship between home
visits and parent participation at after-school functions?
Review



What are the 3 types of data? Define and
give examples of each.
What are the 3 common measures of
central tendency? Define and give
examples of each.
What is a standard deviation? Why is it
important to calculate?
Homework


Read Nothing Up My Sleeve: Unveiling the
Magic of Statistics (I’ll send it to you
ASAP).
Read Glanz, Appendix B
