Inferential Statistics - Music Education Resources

Inferential Statistics
Class 5
For Monday
• Submit first draft of your literature review
(Chapter 2).
– It should read like one document rather than a
series of abstracts.
– Use transition sentences.
– Group studies into a logical order.
– You should include at least ten research-based
articles/dissertations.
– Include a Reference list in perfect APA format.
Review: Effect of Intensive Instruction on Elementary
Students’ Memory for Culturally Unfamiliar Music
(2013)
• Previous researchers have found that both adults and children demonstrate better
memory for novel music from their own music culture than from an unfamiliar
music culture. It was the purpose of this study to determine whether this
“enculturation effect” could be mediated through an extended intensive
instructional unit in another culture’s music. Fifth-grade students in four intact
general music classrooms (two each at two elementary schools in a large U.S. city)
took part in an 8-week curriculum exclusively concentrated on Turkish music. Two
additional fifth-grade classes at the same schools served as controls and did not
receive the Turkish curriculum. Prior to and following the 8-week unit, all classes
completed a music memory test that included Western and Turkish music
examples. Comparison of pretest and posttest scores revealed that all participants
(N = 110) were significantly more successful overall on the second test
administration. Consistent with previous findings, participants were significantly
less successful remembering items from the unfamiliar music culture, a result that
was consistent across test administrations and between instruction and control
groups. It appears that the effect of enculturation on music memory is well
established early in life and resistant to modification even through extended
instructional approaches.
Identify or State:
• Independent Variable
• Dependent Variable
• Treatment Group
• Control Group
• Diagram experimental design (O & X)
• Write a hypothesis & null hypothesis
• Paraphrase findings
• Implications for the classroom? Did the authors
reject or not reject the null hypothesis?
Internal Validity (Usefulness/Meaningfulness) Control of Extraneous Variables: Time Bound Factors
• What happens within the experiment
– History – What happens b/w pretest and
posttest (private lessons, change in practice
routine)
– Maturation – Is the change a result of the treatment or a
natural result of repetition and
improvement over time?
– Mortality – Loss of participants may cause
imbalance b/w groups
Internal Validity – Sampling &
Measurement Factors
• Testing – pretest affects posttest. Ceiling and floor effects
(eliminate outliers?)
• Instrumentation – changes in measurement or observers
(judges at contest from one site to the next)
• Statistical regression – students who score extremely
high (ceiling) or low (floor) on pretest may regress to the
mean on posttest
• Selection – participants do not represent normal
population (also affects external validity)
• Interactions – influence of a combination of the above
factors
Internal Validity
• John Henry Effect
– Control group performs beyond usual
level because they perceive they are in
competition with the experimental
group
External Validity – Generalizability
• Population Validity
– Extent to which the sample is representative of the population to which
the researcher wishes to generalize the results.
• Ecological
– Study conditions and setting are representative of the
setting in which the researcher would like to apply the
findings
• Replication
– Results can be reproduced (problem w/ Mozart effect)
• Detailed description of the sample needed in study
– Important regardless of sampling method
– ‘Next best thing’ if not a large, random sample – often the
case in music ed. research
– Consider demographic questions in descriptive research
Other Threats to External Validity
• Effect or interaction of testing (testing will not occur in natural
setting)
• Sample does not reflect population
– Discuss in research report
• Reactive effects of sample
– Hawthorne Effect
• Effects due simply to subjects’ knowledge of being in a study
– Teacher or Researcher interactions different than in
population
• Subconsciously encouraging or discouraging a group
• Research setting does not reflect typical settings (ecological
validity)
– A university lab school
Types of Data
• Nominal/Categorical = numbers as labels
– Male/female
– Sop/Alto/tenor/bass
• Ordinal = ranks
– Contest ratings
• Interval = equal distance b/w each number
– Contest scores (1-100)
– Lack of meaningful zero (0 on test = no knowledge?, 0 temperature =
arbitrary) or meaningful ratios (2x as smart?)
• Ratio =
– Equal interval data
– True zero possible (0 decibels, 0 money)
– Ratios can be calculated in a meaningful way [2x as loud, ½ money,
height, weight, depth (a lake can dry up) (?), etc.]
Inferential Statistics
• Statistic = number describing a variable
• Descriptive statistics = describe the sample
• Inferential statistics = used when making
inferences about a population based on the
sample
• Stat. used based on type of data and other
assumptions
• Stats used to compare and find differences
Two Types of Inferential Stats
• Parametric
– Interval & ratio data
– Normal or near
normal curve
(distribution)
– Equal variances
(Levene’s test)
– Sample reflects pop.
(randomized)
– Most powerful
• Non-Parametric
– Nominal & Ordinal
data
– Not normal
distribution
(skewness or
kurtosis)
– Unequal variances
– Less powerful
– More conservative
Statistical Significance
• Probability of getting a result at least this extreme by chance alone
(i.e., if the null hypothesis were true and the treatment had no effect)
– Expressed as p
– p < .1 = less than 10% (1/10) probability…
– p < .05 = less than 5% (1/20) probability…
– p < .01 = less than 1% (1/100) probability…
– p < .001 = less than .1% (1/1000) probability…
• Computer software reports actual p
• alpha level = probability level to be accepted as
significant set b/f study begins
• Statistical significance does not equal practical
significance
Statistical Power
• Likelihood that a particular test of statistical
significance will lead to the rejection of a false null
hypothesis
– Parametric tests more powerful than nonparametric. (Par.
more likely to discover differences b/w groups. Choice
depends on type of data.)
• The larger the sample size, the more likely you will be
to find statistically significant effects.
• The less stringent your criteria (e.g., .05 vs. .01 vs.
.001), the easier it is to find statistical significance
Review-Type I and Type II Error
• Type I Error is erroneously claiming statistical significance or
rejecting the null hypothesis when in fact, it’s true (claiming
success when experiment failed to produce results)
– Possible w. incorrect statistical test
– Or when conducting multiple tests on the same data (e.g., comparing 2
groups on multiple variables such as achievement test parts). [Solution: lower
alpha level]
• Type II Error is when a researcher fails to reject the null
hypothesis when it is in fact false
– The smaller the sample size, the more difficult it is to detect statistical
significance
– In this case, a researcher could be missing an important finding
because of study design
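The multiple-testing problem noted above can be made concrete with a little arithmetic. The following is a minimal sketch, assuming the tests are independent of one another (an assumption the slides do not state); the function name is illustrative only.

```python
def familywise_error_rate(alpha: float, n_tests: int) -> float:
    """Chance of at least one Type I error across n independent tests,
    each run at the same alpha level (independence is an assumption)."""
    return 1 - (1 - alpha) ** n_tests


# Show how the risk of a false positive grows as more tests are run at alpha = .05.
for n_tests in (1, 3, 5, 10):
    rate = familywise_error_rate(0.05, n_tests)
    print(f"{n_tests:>2} tests at alpha = .05 -> {rate:.3f}")
```

With five tests the chance of at least one false positive is already above 20%, which is why a lower alpha level is suggested.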
Statistical Tests
http://pspp.awardspace.com/ (Windows)
http://bmi.cchmc.org/resources/software/pspp (Mac)
http://vassarstats.net/
Parametric Assumptions
• Interval Data
• Normality - Scores are normally distributed in each
group
• Homogeneity of Variance - The amount of variability
in scores is similar between each group (Levene’s test)
• If assumptions are not met – Use non-parametric
statistics
– Ordinal or nominal data
– Likert scales (esp. short scales)
– Small sample size
One- vs. Two-Tailed Tests
• If a hypothesis is directional in nature it is one-tailed
– The chunking method will be more effective than the whole song method
• If a hypothesis is not directional in nature it is two-tailed
– There will be a difference in effectiveness between the chunking method
and the whole song method
• Two-tailed tests are most commonly used since specific hypotheses are
rare in music education research.
• If the study is designed knowing that results can only go one direction (e.g.,
beginning violin), a one-tailed test is OK. If treatment can only lead to
positive results (improvement), use a one-tailed test. If treatment could result
in positive or negative results, use a two-tailed test.
• One-tailed tests are more powerful. If your experiment led to improvement but
a two-tailed test only comes close to significance, try a one-tailed test. (Specify
which you used in your study.)
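The power difference between one- and two-tailed tests can be seen in a short sketch. It assumes a z statistic and the standard normal curve rather than the t distribution that statistical software would use, and the value 1.80 is a hypothetical test statistic, not a result from any study.

```python
from statistics import NormalDist


def p_values_from_z(z: float) -> tuple[float, float]:
    """Return (one_tailed, two_tailed) p-values for a z statistic.

    The one-tailed value assumes the hypothesis predicted a positive difference.
    """
    standard_normal = NormalDist()
    one_tailed = 1 - standard_normal.cdf(z)
    two_tailed = 2 * (1 - standard_normal.cdf(abs(z)))
    return one_tailed, two_tailed


# Hypothetical test statistic in the predicted (positive) direction.
one_tailed, two_tailed = p_values_from_z(1.80)
print(f"one-tailed p = {one_tailed:.3f}, two-tailed p = {two_tailed:.3f}")
```

For a result in the predicted direction the one-tailed p is half the two-tailed p, so the same statistic can fall below .05 one-tailed while staying above it two-tailed.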
Independent Samples t-test
• Used to determine whether differences between two
independent group means are statistically significant
• n ≤ 30 for each group, though many researchers
have used the t test with larger groups.
• Groups do not have to be even. Only concerned with
overall group differences w/o considering pairs
– [A robust statistical technique is one that performs well even if its assumptions
are somewhat violated by the true model from which the data were
generated. Unequal variances = alternative t test or better Mann-Whitney U]
• Application: Explore Data
– Compare reading tests of inst & non-inst. students
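As a rough illustration of what the independent samples t-test computes, here is a minimal sketch using the pooled-variance (Student's) formula. The reading scores are hypothetical, the function name is illustrative, and the resulting t would still need to be checked against a t table or run through software such as PSPP to get p.

```python
from math import sqrt
from statistics import mean, variance


def independent_t(group_a, group_b):
    """Student's t for two independent groups (pooled variance).

    Returns (t, degrees_of_freedom). Assumes roughly equal variances.
    """
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    pooled_variance = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / df
    standard_error = sqrt(pooled_variance * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / standard_error, df


# Hypothetical reading scores; the groups do not have to be the same size.
instrumental = [82, 75, 90, 68, 88, 79]
non_instrumental = [70, 65, 80, 72, 60]

t, df = independent_t(instrumental, non_instrumental)
print(f"t({df}) = {t:.2f}")
```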
Correlated (paired, dependent) Samples t-test
• Used to determine whether differences between two means
taken from the same group, or from two groups with
matched pairs, are statistically significant
– e.g., pre-test achievement scores for the whole song group
vs. post-test achievement scores for the whole song group
• Group sizes must be equal (paired)
• n ≤ 30 for each group
• Application: Compare Reading & Math test scores of
Instrumental Students
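The paired version works on the difference within each pair rather than on the two group means separately. A minimal sketch, with hypothetical pretest and posttest scores and an illustrative function name:

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(first, second):
    """Correlated (paired) samples t; returns (t, degrees_of_freedom)."""
    if len(first) != len(second):
        raise ValueError("Paired samples must have the same number of scores.")
    differences = [b - a for a, b in zip(first, second)]
    n_pairs = len(differences)
    standard_error = stdev(differences) / sqrt(n_pairs)
    return mean(differences) / standard_error, n_pairs - 1


# Hypothetical scores for the same six students, before and after instruction.
pretest = [12, 15, 11, 14, 10, 13]
posttest = [14, 18, 12, 17, 11, 16]

t, df = paired_t(pretest, posttest)
print(f"t({df}) = {t:.2f}")
```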
Compare 2 means
• Need sample of at least 10
• Work like Independent and dependent t tests
• Independent
– Mann Whitney U
• Application: Data set #3. Is there a sig. diff. b/w Final ratings at
Site 1 vs. site 2?
• Pairs or dependent samples
– Wilcoxon signed ranks
• Application: Data set #2. Is there a sig. difference b/w rating of
judges 1 & 2?
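To show what the Mann-Whitney U statistic counts, here is a minimal sketch that compares every score in one group with every score in the other, counting ties as half. The ratings are hypothetical (not Data set #3), and the smaller U would still be checked against a critical-value table or software to judge significance.

```python
def mann_whitney_u(sample_a, sample_b):
    """Return (U_a, U_b) for two independent samples of ordinal scores.

    U_a counts the pairs in which the score from sample_a is higher;
    tied pairs count as 0.5. U_a + U_b equals len(a) * len(b).
    """
    u_a = sum(
        1.0 if a > b else 0.5 if a == b else 0.0
        for a in sample_a
        for b in sample_b
    )
    u_b = len(sample_a) * len(sample_b) - u_a
    return u_a, u_b


# Hypothetical final ratings for ten ensembles at each site.
site_1 = [1, 1, 2, 2, 2, 1, 3, 2, 1, 2]
site_2 = [2, 3, 3, 2, 4, 3, 2, 3, 1, 3]

u_1, u_2 = mann_whitney_u(site_1, site_2)
print(f"U = {min(u_1, u_2)}")
```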
ANOVA
• Analyze means of 2+ groups
• Homogeneity of variance
• Independent or correlated (paired) groups
• More rigorous than t-test (b/w group & w/i group
variance). Often used today instead of t test.
• F statistic
• One-Way = 1 independent variable
• Two-Way/Three-Way = 2-3 independent variables
(one active & one or two an attribute)
One-Way ANOVA
• Calculate a One-Way ANOVA for data-set 1 – All non-instrumental tests
• Post Hoc tests
– Used to find differences b/w groups using one test. You
could compare all pairs w/ individual t tests or ANOVA, but
this leads to problems w/ multiple comparisons on same data
– Tukey – Equal Sample Sizes (though can be used for unequal sample
sizes as well)
– Scheffé – Unequal Sample Sizes (though can be used for equal sample
sizes as well)
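The F statistic in a one-way ANOVA is the ratio of between-group variance to within-group variance. A minimal sketch of that calculation for independent groups follows; the three score lists are hypothetical (not data-set 1) and the function name is illustrative.

```python
from statistics import mean


def one_way_anova_f(*groups):
    """Return (F, df_between, df_within) for two or more independent groups."""
    all_scores = [score for group in groups for score in group]
    grand_mean = mean(all_scores)

    # Between-group variability: how far each group mean sits from the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group variability: how far each score sits from its own group mean.
    ss_within = sum((score - mean(g)) ** 2 for g in groups for score in g)

    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


# Hypothetical test scores for three groups.
f_ratio, df_between, df_within = one_way_anova_f([4, 5, 6], [6, 7, 8], [8, 9, 10])
print(f"F({df_between}, {df_within}) = {f_ratio:.2f}")
```

A significant F only says that some difference exists among the groups; the post hoc tests listed above locate it.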
Non-Parametric ANOVAs
• Friedman – Related (correlated) Samples
– Application: Data Set #2 – Sig. dif. b/w judges?
• Kruskal-Wallis – Independent Samples
• No post hoc equivalent to Tukey or Scheffé.
Must do a series of Mann-Whitney U or
Wilcoxon tests for each pair of groups
• Bonferroni Correction
– Used to adjust α (p) for multiple comparisons
– .05 / number of comparisons
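The Bonferroni adjustment is simple division. A minimal sketch, assuming three hypothetical judges compared pair by pair; the names are placeholders, not taken from Data Set #2.

```python
from itertools import combinations


def bonferroni_alpha(alpha: float, n_comparisons: int) -> float:
    """Per-comparison alpha after a Bonferroni correction."""
    return alpha / n_comparisons


# Hypothetical groups; every pair gets its own follow-up test.
groups = ["Judge 1", "Judge 2", "Judge 3"]
pairs = list(combinations(groups, 2))

adjusted_alpha = bonferroni_alpha(0.05, len(pairs))
print(f"{len(pairs)} comparisons -> test each at alpha = {adjusted_alpha:.4f}")
```

Each pairwise Mann-Whitney U or Wilcoxon result would then be judged against the adjusted alpha rather than .05.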
2 Way Factorial Designs (2 independent variables [often
one manipulated, one attribute])
2X2 (2 levels of both variables)

                          METHOD
Language Classification   Kodaly            Traditional
Bilingual                 Bilingual 1       Bilingual 2
Non-Bilingual             Non-Bilingual 1   Non-Bilingual 2
Interpreting Results of 2x2 ANOVA
• (columns) Kodaly was more effective than
Traditional methods for both bilingual and
non-bilingual students
• (rows) Bilingual students scored significantly
higher than non-bilingual students, regardless
of teaching method
• Could be a significant interaction between
language and teaching method
– If there was significant interaction, we would need
to do post hoc Tukey or Scheffé to determine
where the differences lie.
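The column, row, and interaction comparisons can be read directly from the four cell means. The sketch below uses hypothetical cell means and assumes equal cell sizes; it only shows which numbers are being compared, not whether any difference is statistically significant.

```python
# Hypothetical cell means (equal cell sizes assumed), keyed by (language, method).
cell_means = {
    ("Bilingual", "Kodaly"): 85.0,
    ("Bilingual", "Traditional"): 75.0,
    ("Non-Bilingual", "Kodaly"): 78.0,
    ("Non-Bilingual", "Traditional"): 70.0,
}


def marginal_mean(level: str, position: int) -> float:
    """Average the cell means whose key has `level` at `position`
    (0 = language classification, 1 = method)."""
    values = [m for key, m in cell_means.items() if key[position] == level]
    return sum(values) / len(values)


# Main effects: compare column (method) means and row (language) means.
method_effect = marginal_mean("Kodaly", 1) - marginal_mean("Traditional", 1)
language_effect = marginal_mean("Bilingual", 0) - marginal_mean("Non-Bilingual", 0)

# Interaction: is the Kodaly advantage the same size in both language groups?
gain_bilingual = (
    cell_means[("Bilingual", "Kodaly")] - cell_means[("Bilingual", "Traditional")]
)
gain_non_bilingual = (
    cell_means[("Non-Bilingual", "Kodaly")]
    - cell_means[("Non-Bilingual", "Traditional")]
)
interaction = gain_bilingual - gain_non_bilingual

print(f"method: {method_effect}, language: {language_effect}, interaction: {interaction}")
```

An interaction value near zero means the method effect is about the same for both language groups.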
Three Way (2x2x2) ANOVA

                 Girls (A1)                                        Boys (A2)
Starting Grade   Lessons (C1)              No Lessons (C2)          Lessons (C1)              No Lessons (C2)
4 (B1)           Performance Achievement   Performance Achievement  Performance Achievement   Performance Achievement
5 (B2)           Performance Achievement   Performance Achievement  Performance Achievement   Performance Achievement
ANCOVA – Analysis of Covariance
• Statistical control for unequal groups
• Adjusts posttest means based on pretest
means.
• [example]
http://faculty.vassar.edu/lowry/VassarStats.html
• [The homogeneity of regression assumption is met if within each of the groups there is a linear
correlation between the dependent variable and the covariate and the correlations are similar b/w
groups]
Effect Size (Cohen’s d)
http://www.uccs.edu/~faculty/lbecker/es.htm
• d = (Mean of Experimental group – Mean of Control group) / average SD
• The average percentile standing of the average treated (or experimental)
participant relative to the average untreated (or control) participant.
• Use table to find where someone at the 50th percentile of the
experimental group would rank in the control group
• Good for showing practical significance
– When test is non-significant
– When both groups got significantly better (really effective vs. really
really effective!)
• Calculate effect size:
– Treatment group: M=24.6; SD=10.7
– Control Group: M=10.8; SD=7.77
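The exercise above can be worked through in a few lines. This is a minimal sketch that follows the slide's formula (difference in means divided by the simple average of the two SDs); the percentile step assumes normally distributed scores and stands in for the lookup table, and the function name is illustrative.

```python
from statistics import NormalDist


def cohens_d(mean_treatment, mean_control, sd_treatment, sd_control):
    """Effect size using the simple average of the two standard deviations."""
    average_sd = (sd_treatment + sd_control) / 2
    return (mean_treatment - mean_control) / average_sd


# Values from the slide: treatment M = 24.6, SD = 10.7; control M = 10.8, SD = 7.77.
d = cohens_d(24.6, 10.8, 10.7, 7.77)

# Percentile standing of the average treated participant within the control
# group, assuming normally distributed scores (replaces the lookup table).
percentile = NormalDist().cdf(d) * 100

print(f"d = {d:.2f}; about the {percentile:.0f}rd percentile of the control group")
```

A d of roughly 1.5 is well beyond the conventional threshold for a large effect (0.8).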
Chi-Squared
• Measure statistical significance b/w frequency
counts (nominal/categorical data)
• http://www.quantpsy.org/chisq/chisq.htm
• Test for independence: Compare 2 or more
proportions
• Goodness of Fit: compare what you have with what is
expected
– Proportions of contest ratings (I, II, III or I & non Is)
– Agree vs. Disagree
• Weak statistical test
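A test for independence on a 2x2 table of frequency counts can be sketched as follows. The counts are hypothetical (e.g., I vs. non-I ratings at two sites), no continuity correction is applied, and the p-value shortcut used here is valid only for 1 degree of freedom, which is what a 2x2 table has.

```python
from math import erfc, sqrt


def chi_square_2x2(table):
    """Return (chi_square, p) for a 2x2 table of observed counts.

    No continuity correction. The p-value formula holds only for df = 1.
    """
    row_totals = [sum(row) for row in table]
    column_totals = [sum(column) for column in zip(*table)]
    grand_total = sum(row_totals)

    chi_square = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if rows and columns were unrelated.
            expected = row_totals[i] * column_totals[j] / grand_total
            chi_square += (observed - expected) ** 2 / expected

    p = erfc(sqrt(chi_square / 2))  # upper tail of chi-square with df = 1
    return chi_square, p


# Hypothetical counts: rows = Site 1, Site 2; columns = rating I, non-I.
observed_counts = [[30, 20], [20, 30]]

chi_square, p = chi_square_2x2(observed_counts)
print(f"chi-square(1) = {chi_square:.2f}, p = {p:.4f}")
```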