Once We've Measured It, How Do We Know We're Right?
EDU 8603
Day 6
What do the following numbers mean?
85 92 45 90 95 68 97 75 88 85
Educational Measurement
Measurement: assignment of numbers to
differentiate values of a variable
Purpose of measurement for research
Provide a standard format for recording observations,
performances, or other responses of subjects and
summarizing results
GOOD RESEARCH MUST HAVE SOUND
MEASUREMENT!!
Descriptive Statistics
Statistics: procedures that summarize and analyze
quantitative data
Descriptive statistics
Statistical procedures that summarize a set of numbers in
terms of central tendency or variation
Foundational for inferential statistics
Important for understanding what the data tells the
researcher
Measures of central tendency
Mean (µ)
Median
Mode
Thought Question
Consider the following scores on a test
Marco 90
Chantelle 88
Chi Bo 92
Adriane 85
Jay 45
Donnie 85
Linda 75
Remi 68
Christy 99
Marcus 97
Which measure of central tendency would Adriane use
when telling her parents about her performance?
Thought Question
If Jay scored an 85 instead of a 45, what changes?
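One way to explore the two thought questions is to compute all three measures for the scores listed above. The following is a minimal sketch using Python's standard statistics module; the names summarize, original, and revised are illustrative, not part of the course materials.

```python
from statistics import mean, median, mode

# Scores from the slide, in the order listed.
scores = {
    "Marco": 90, "Chantelle": 88, "Chi Bo": 92, "Adriane": 85, "Jay": 45,
    "Donnie": 85, "Linda": 75, "Remi": 68, "Christy": 99, "Marcus": 97,
}

def summarize(values):
    """Return the mean, median, and mode of a collection of scores."""
    values = list(values)
    return {"mean": mean(values), "median": median(values), "mode": mode(values)}

original = summarize(scores.values())

# Rescoring from the second thought question: Jay earns 85 instead of 45.
revised_scores = {**scores, "Jay": 85}
revised = summarize(revised_scores.values())

print("original:", original)
print("revised: ", revised)
```

With these scores the mean rises from 82.4 to 86.4 when Jay's score changes, while the median (86.5) and the mode (85) stay the same.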
Descriptive Statistics
Frequency distributions (see Figure 6.2)
Normal - scores symmetrically distributed around the middle
Positively skewed - large number of low scores and a
small number of high scores; the mean is pulled toward the
high (positive) end
Negatively skewed - large number of high scores and a
small number of low scores; the mean is pulled toward the
low (negative) end
Normal Distribution
An Extreme Example
Consider the salaries of 10 people
Group A – All are teachers.
Salaries: $45,000
$50,000
$50,000
$55,000
$45,000
$50,000
$55,000
$45,000
$50,000
$55,000
An Extreme Example
Consider the salaries of 10 people
Group B – All are teachers; 1 won the lottery.
Salaries: $45,000
$45,000
$50,000
$50,000
$50,000
$55,000
$6,300,000
$45,000
$50,000
$55,000
An Extreme Example
What happens to the mean and median in these 2
examples? Do they change?
What happens to the normal distribution?
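The two salary groups can be compared directly. This is a small sketch using the figures from the slides; the variable names group_a and group_b are illustrative.

```python
from statistics import mean, median

# Salaries copied from the Group A and Group B slides.
group_a = [45_000, 50_000, 50_000, 55_000, 45_000,
           50_000, 55_000, 45_000, 50_000, 55_000]
group_b = [45_000, 45_000, 50_000, 50_000, 50_000,
           55_000, 6_300_000, 45_000, 50_000, 55_000]

for name, salaries in (("Group A", group_a), ("Group B", group_b)):
    print(f"{name}: mean = ${mean(salaries):,.0f}, "
          f"median = ${median(salaries):,.0f}")
```

Group A has a mean and median of $50,000. In Group B the single lottery winner pulls the mean up to $674,500, while the median remains $50,000.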
Positive Skew
Negative Skew
Descriptive Statistics
Variability
How different are the scores?
Types
Range: the difference between the highest and lowest scores
Standard deviation
Roughly, the average distance of the scores from the mean
The relationship to the normal distribution
±1 SD ≈ 68% of all scores in a normal distribution
±2 SD ≈ 95% of all scores in a normal distribution
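The range, the standard deviation, and the 68%/95% figures can all be checked numerically. The sketch below reuses the ten test scores from the earlier thought question and assumes they are treated as a whole population (hence pstdev rather than stdev). The helper names share_within and normal_share_within are illustrative.

```python
from statistics import NormalDist, mean, pstdev

scores = [90, 88, 92, 85, 45, 85, 75, 68, 99, 97]

score_range = max(scores) - min(scores)
sd = pstdev(scores)  # population SD; statistics.stdev gives the sample estimate

def share_within(values, k):
    """Fraction of values lying within k standard deviations of their mean."""
    m, s = mean(values), pstdev(values)
    return sum(abs(v - m) <= k * s for v in values) / len(values)

def normal_share_within(k):
    """Theoretical fraction of a normal distribution within ±k SD of the mean."""
    z = NormalDist()  # standard normal: mean 0, SD 1
    return z.cdf(k) - z.cdf(-k)

print(f"range = {score_range}, SD = {sd:.2f}")
for k in (1, 2):
    print(f"within ±{k} SD: normal curve {normal_share_within(k):.1%}, "
          f"these scores {share_within(scores, k):.0%}")
```

For these scores the range is 54 and the SD is about 15.31. Eight of the ten scores fall within ±1 SD and nine within ±2 SD, which differs from the 68% and 95% of a normal curve because this small set is skewed by Jay's 45.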
Variability
Standard Deviation
Variability
Why does variability matter?
Descriptive Statistics
Relationship
How two sets of scores relate to one another
Correlation (positive)
Low .10 - .39
Moderate .40 - .69
High ≥ .70
Example of Correlation
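As an illustration, the Pearson correlation coefficient can be computed by hand and labeled with the bands listed above. The data here (hours studied and quiz scores) are hypothetical and not from the course. Applying the bands to the absolute value of r, and calling values below .10 "negligible", are also assumptions made for this sketch.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of the same length, at least 2 values")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def strength(r):
    """Label a correlation using the slide's bands (applied to |r| by assumption)."""
    size = abs(r)
    if size >= 0.70:
        return "high"
    if size >= 0.40:
        return "moderate"
    if size >= 0.10:
        return "low"
    return "negligible"  # label for values below .10 is an assumption

# Hypothetical data for illustration only.
hours_studied = [1, 2, 3, 4, 5]
quiz_scores = [52, 60, 57, 70, 75]

r = pearson_r(hours_studied, quiz_scores)
print(f"r = {r:.2f} ({strength(r)})")
```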
Validity and Reliability
What’s all the fuss about?
Validity/Reliability and Trustworthiness
Why do we need validity and reliability in
quantitative studies and “trustworthiness” in
qualitative studies?
We can’t trust the
results if we can’t
trust the methods!
Thought Question
On the ACT and SAT assessments, there is a definitive
script that test administrators are required to follow
exactly. What measurement issue are the test makers
addressing?
Reliability of Measurement
Reliability - The extent to which measures are free
from error
Error is measured by consistency
Reliability of Measurement
Sources of error
Test construction and administration
Ambiguous questions, confusing directions,
changes in scoring, interrupted testing, etc.
Subject’s characteristics
Test anxiety, lack of motivation, fatigue,
guessing, etc.
Reliability of Measurement
Reliability
Measurement
0.00 indicates no reliability or consistency
1.00 indicates total reliability or consistency
< .60 = weak reliability
> .80 = sufficient reliability
Reliability of Measurement
Types of reliability evidence
Stability (i.e. test-retest)
Testing the same subject using the same test on two
occasions
Limitation - carryover effects from the first to second
administration of the test
Equivalence (i.e. parallel form)
Testing the same subject with two parallel (i.e. equal)
forms of the same test taken at the same time
Limitation - difficulty in creating parallel forms
Reliability of Measurement
Equivalence and stability
Testing the same subject with two forms of the same test
taken at different times
Limitation - difficulty in creating parallel forms
Reliability of Measurement
Internal consistency
Testing the same subject with one test and “artificially”
splitting the test into two halves
Limitation - must have a minimum of ten (10) questions
Often see “Cronbach’s alpha” for reliability coefficient
(e.g., learning styles)
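For reference, Cronbach's alpha can be computed from item-level scores with the standard formula alpha = k/(k-1) × (1 − sum of item variances / variance of total scores), where k is the number of items. The sketch below is illustrative: the responses are hypothetical, and only four items are used to keep the example short, fewer than the ten-question minimum noted above.

```python
from statistics import variance

def cronbach_alpha(item_scores):
    """Cronbach's alpha.

    item_scores: one list per item, each holding every respondent's score
    on that item, with respondents in the same order across items.
    """
    k = len(item_scores)
    if k < 2:
        raise ValueError("need at least two items")
    # Total score per respondent, summed across items.
    totals = [sum(responses) for responses in zip(*item_scores)]
    item_variance_sum = sum(variance(item) for item in item_scores)
    return (k / (k - 1)) * (1 - item_variance_sum / variance(totals))

# Hypothetical responses: 4 items, 5 respondents, scored 1-5.
items = [
    [3, 4, 2, 5, 1],
    [3, 5, 2, 4, 1],
    [2, 4, 3, 5, 1],
    [3, 4, 2, 5, 2],
]

alpha = cronbach_alpha(items)
print(f"Cronbach's alpha = {alpha:.2f}")
```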
Reliability of Measurement
Agreement/ Inter-rater reliability
Observational measures
Multiple observers coding similarly
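A simple way to quantify how similarly two observers code is percent agreement: the share of observations to which both assign the same code. The sketch below uses hypothetical classroom observation codes. It does not correct for agreement that would occur by chance.

```python
def percent_agreement(codes_a, codes_b):
    """Share of observations on which two observers assigned the same code."""
    if len(codes_a) != len(codes_b) or not codes_a:
        raise ValueError("need two non-empty code lists of the same length")
    matches = sum(a == b for a, b in zip(codes_a, codes_b))
    return matches / len(codes_a)

# Hypothetical codes from two observers watching the same eight intervals.
observer_1 = ["on-task", "off-task", "on-task", "on-task",
              "off-task", "on-task", "on-task", "off-task"]
observer_2 = ["on-task", "off-task", "on-task", "off-task",
              "off-task", "on-task", "off-task", "off-task"]

agreement = percent_agreement(observer_1, observer_2)
print(f"agreement = {agreement:.0%}")
```

Here the observers agree on six of eight intervals, or 75%.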
Reliability of Measurement
Enhancing reliability
Standardized administration procedures (e.g.
directions, conditions, etc.)
Appropriate reading level
Reasonable length of the testing period
Counterbalancing the order of testing if several tests are
being given
Validity of Measurement
Validity: the extent to which inferences are appropriate,
meaningful, and useful
Current example – content tests and teacher licensure
Validity of Measurement
For research results to have any value, validity
of the measurement of a variable must exist
Use of established and “new” instruments and the
implications for establishing validity
Importance of establishing validity prior to data
collection (e.g. pilot tests)
Validity
Content
Predictive (criterion-related)
Concurrent
Construct
Thought Question
Critics of standardized tests like the SAT claim that
they discriminate against particular groups of students
(especially minorities) and do not represent a broad
enough domain of knowledge to adequately assess a
student’s academic potential. What issue of validity is
operating in these arguments?
Thought Question
Other arguments against the SAT state that the tests do
not adequately estimate an individual’s ability to succeed
in college. What issue of validity is operating here?
Reader’s Digest version…
Reliability
The extent to which scores are free from error
Error is measured by consistency
Validity
The extent to which inferences are appropriate,
meaningful, and useful
“Does the instrument measure what it is supposed to
measure?”
Reliability & Validity of Measurement
What is the relationship of reliability to validity?
If a watch consistently gives the time at 1:10 when
actually it is 1:00, it is ____ but not ____.
______ is a necessary but not sufficient condition for
_______.
To be _____ , an instrument must be ______, but a ____
instrument is not necessarily _____.
Midterm
3 parts
Multiple Choice (50%) – terms and application
Short Answer (25%) – application
Essay (25%) – evaluate a research article. This part is
take-home.
Take Home Portion of Exam
Schlosser Article
Based on topics we have discussed in class and you have
read about, critique the article based on the following:
Introduction and research problem, including the
researcher’s background and involvement
Review of literature/ theoretical framework
Methods of data collection (including participants) and
data analysis
Results and conclusions including issues of
trustworthiness. Be sure to address whether we should
trust the claims that the authors have made and why we
should or should not trust the claims.