Once We've Measured It, How Do We Know We're Right?



EDU 8603
Day 6
 What do the following numbers mean?
85 92 45 90 95 68 97 75 88 85
Educational Measurement
 Measurement: assignment of numbers to
differentiate values of a variable
 Purpose of measurement for research
 Provide a standard format for recording observations,
performances, or other responses of subjects and
summarizing results
 GOOD RESEARCH MUST HAVE SOUND
MEASUREMENT!!
Descriptive Statistics
 Statistics: procedures that summarize and analyze quantitative data
 Descriptive statistics
  Statistical procedures that summarize a set of numbers in terms of central tendency or variation
  Foundational for inferential statistics
 Important for understanding what the data tells the researcher
Measures of central tendency
 Mean (µ)
 Median
 Mode
Thought Question
 Consider the following scores on a test
Marco 90
Chantelle 88
Chi Bo 92
Adriane 85
Jay 45
Donnie 85
Linda 75
Remi 68
Christy 99
Marcus 97
 Which measure of central tendency would Adriane use
when telling her parents about her performance?
Thought Question
If Jay scored an 85 instead of a 45, what changes?
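A quick way to check both thought questions is to compute the three measures directly. The sketch below (assuming Python and its standard statistics module) uses the scores from the table above, first with Jay's 45 and then with an 85 in its place.

```python
# Central tendency for the class scores listed above (Jay = 45).
from statistics import mean, median, mode

scores = [90, 88, 92, 85, 45, 85, 75, 68, 99, 97]

print(mean(scores))    # 82.4 -> the low outlier (45) drags the mean down
print(median(scores))  # 86.5 -> the middle of the sorted scores
print(mode(scores))    # 85   -> the most frequent score

# If Jay had scored 85 instead of 45, only the mean moves noticeably:
scores_revised = [90, 88, 92, 85, 85, 85, 75, 68, 99, 97]
print(mean(scores_revised))    # 86.4 -> the mean rises by 4 points
print(median(scores_revised))  # 86.5 -> the median is unchanged
print(mode(scores_revised))    # 85   -> the mode is unchanged
```

Only the mean shifts when a single extreme score changes; the median and mode stay where they were.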
Descriptive Statistics
 Frequency distributions (see Figure 6.2)
 Normal - scores distributed symmetrically around the middle
 Positively skewed - a large number of low scores and a small number of high scores; the mean is pulled toward the high (positive) end
 Negatively skewed - a large number of high scores and a small number of low scores; the mean is pulled toward the low (negative) end
Normal Distribution
An Extreme Example
 Consider the salaries of 10 people
 Group A – All are teachers.
Salaries: $45,000
$50,000
$50,000
$55,000
$45,000
$50,000
$55,000
$45,000
$50,000
$55,000
An Extreme Example
 Consider the salaries of 10 people
 Group B – All are teachers; 1 won the lottery.
Salaries: $45,000
$45,000
$50,000
$50,000
$50,000
$55,000
$6,300,000
$45,000
$50,000
$55,000
An Extreme Example
 What happens to the mean and the median in these two examples? Do they change?
 What happens to the normal distribution?
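Before answering, here is a minimal sketch (Python standard library, using the salary figures from the two groups above) that makes the effect of the one extreme value concrete.

```python
# What one extreme value does to the mean versus the median.
from statistics import mean, median

group_a = [45_000, 50_000, 50_000, 55_000, 45_000,
           50_000, 55_000, 45_000, 50_000, 55_000]
group_b = [45_000, 45_000, 50_000, 50_000, 50_000,
           55_000, 6_300_000, 45_000, 50_000, 55_000]

print(mean(group_a), median(group_a))  # mean = 50,000   median = 50,000
print(mean(group_b), median(group_b))  # mean = 674,500  median = 50,000
```

The single $6,300,000 salary pulls the mean far above every typical salary in the group, while the median stays at $50,000. That is exactly the behavior behind a positively skewed distribution.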
Positive Skew
Negative Skew
Descriptive Statistics
 Variability
 How different are the scores?
 Types
  Range: the difference between the highest and lowest scores
  Standard deviation
   The average distance of the scores from the mean
   The relationship to the normal distribution: ±1 SD ≈ 68% of all scores in a distribution; ±2 SD ≈ 95%
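As an illustration, the sketch below (Python standard library) computes the range and the standard deviation for the class test scores used earlier. Treating the ten scores as the whole class and using the population standard deviation (pstdev) is an assumption made here for simplicity.

```python
# Range and standard deviation for the class test scores.
from statistics import mean, pstdev

scores = [90, 88, 92, 85, 45, 85, 75, 68, 99, 97]

score_range = max(scores) - min(scores)  # 99 - 45 = 54
sd = pstdev(scores)                      # population SD, roughly 15.3

print(score_range, round(mean(scores), 1), round(sd, 1))

# For a roughly normal distribution, about 68% of scores fall within
# mean ± 1 SD and about 95% within mean ± 2 SD. This small set of ten
# scores with one low outlier is unlikely to be normal, but the SD still
# describes how spread out the scores are around the mean of 82.4.
```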
Variability
Standard Deviation
Variability
 Why does variability matter?
Descriptive Statistics
 Relationship
 How two sets of scores relate to one another
 Correlation (positive)
 Low .10 - .39
 Moderate .40 - .69
 High > .70
Example of Correlation
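The figure from this slide is not reproduced here, but a small sketch can stand in for it. The data below (hours studied and quiz scores for eight students) are hypothetical, and the pearson_r helper is written out only for illustration.

```python
# A Pearson correlation computed from scratch on made-up data.
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    dx = [xi - mean_x for xi in x]
    dy = [yi - mean_y for yi in y]
    return sum(a * b for a, b in zip(dx, dy)) / sqrt(
        sum(a * a for a in dx) * sum(b * b for b in dy)
    )

hours  = [1, 2, 3, 4, 5, 6, 7, 8]
scores = [55, 60, 62, 70, 71, 80, 82, 90]

print(round(pearson_r(hours, scores), 2))  # about .99 for these made-up data
```

By the scale on the previous slide, a coefficient near .99 would count as a high positive correlation.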
Validity and Reliability
What’s all the fuss about?
Validity/Reliability and Trustworthiness
 Why do we need validity and reliability in
quantitative studies and “trustworthiness” in
qualitative studies?
We can’t trust the
results if we can’t
trust the methods!
Thought Question
 On the ACT and SAT assessments, there is a definitive
script that test administrators are required to follow
exactly. What measurement issue are the test makers
addressing?
Reliability of Measurement
 Reliability - The extent to which measures are free
from error
 Error is measured by consistency
Reliability of Measurement
 Sources of error
  Test construction and administration
   Ambiguous questions, confusing directions, changes in scoring, interrupted testing, etc.
  Subject’s characteristics
   Test anxiety, lack of motivation, fatigue, guessing, etc.
Reliability of Measurement
 Reliability
  Measurement
   0.00 indicates no reliability or consistency
   1.00 indicates total reliability or consistency
   < .60 = weak reliability
   > .80 = sufficient reliability
Reliability of Measurement
 Types of reliability evidence
  Stability (i.e. test-retest)
   Testing the same subject using the same test on two occasions (see the sketch below)
   Limitation - carryover effects from the first to the second administration of the test
  Equivalence (i.e. parallel forms)
   Testing the same subject with two parallel (i.e. equal) forms of the same test taken at the same time
   Limitation - difficulty in creating parallel forms
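A sketch of the stability (test-retest) idea, quantified as the correlation between two administrations of the same test. The two sets of scores below are hypothetical, and statistics.correlation requires Python 3.10 or later.

```python
# Test-retest (stability) reliability as a correlation between two administrations.
from statistics import correlation  # Pearson's r; available in Python 3.10+

first_administration  = [80, 72, 95, 60, 88, 70]
second_administration = [82, 70, 97, 65, 85, 72]

r = correlation(first_administration, second_administration)
print(round(r, 2))  # about .97 -> above the .80 "sufficient reliability" rule of thumb
```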
Reliability of Measurement
 Equivalence and stability
 Testing the same subject with two forms of the same test
taken at different times
 Limitation - difficulty in creating parallel forms
Reliability of Measurement
 Internal consistency
  Testing the same subject with one test and “artificially” splitting the test into two halves
  Limitation - must have a minimum of ten (10) questions
  Often see “Cronbach’s alpha” reported as the reliability coefficient (e.g., learning styles instruments)
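A minimal sketch of how an internal-consistency coefficient such as Cronbach's alpha can be computed, assuming Python and a made-up matrix of item responses; the calculation follows the standard alpha formula based on item variances and total-score variance.

```python
# Cronbach's alpha from an item-response matrix (rows = respondents, columns = items).
from statistics import variance

responses = [          # hypothetical 1-5 ratings on a five-item scale
    [4, 4, 3, 4, 5],
    [2, 3, 2, 2, 3],
    [5, 5, 4, 5, 5],
    [3, 3, 3, 2, 3],
    [4, 3, 4, 4, 4],
]

k = len(responses[0])                                   # number of items
item_vars = [variance(col) for col in zip(*responses)]  # variance of each item
total_var = variance([sum(row) for row in responses])   # variance of total scores

alpha = (k / (k - 1)) * (1 - sum(item_vars) / total_var)
print(round(alpha, 2))  # about .95 for these made-up data -> consistent items
```

With a real instrument the item matrix would come from actual respondents, and most statistics packages report alpha directly.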
Reliability of Measurement
 Agreement / inter-rater reliability
 Observational measures
 Multiple observers coding similarly
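A minimal sketch of simple percent agreement between two observers, using hypothetical codes. Chance-corrected indices such as Cohen's kappa are often reported instead, but raw agreement shows the basic idea.

```python
# Percent agreement: the share of observations two observers coded the same way.
rater_1 = ["on", "on", "off", "on", "off", "on", "on", "off", "on", "on"]
rater_2 = ["on", "on", "off", "on", "on", "on", "on", "off", "off", "on"]

matches = sum(a == b for a, b in zip(rater_1, rater_2))
percent_agreement = matches / len(rater_1)
print(percent_agreement)  # 0.8 -> the raters agreed on 8 of 10 observations
```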
Reliability of Measurement
 Enhancing reliability
 Standardized administration procedures (e.g.
directions, conditions, etc.)
 Appropriate reading level
 Reasonable length of the testing period
 Counterbalancing the order of testing if several tests are
being given
Validity of Measurement
 Validity: the extent to which inferences are appropriate,
meaningful, and useful
 Current example – content tests and teacher licensure
Validity of Measurement
 For research results to have any value, validity
of the measurement of a variable must exist
 Use of established and “new” instruments and the
implications for establishing validity
 Importance of establishing validity prior to data
collection (e.g. pilot tests)
Validity
 Content
 Predictive (criterion-related)
 Concurrent
 Construct
Thought Question
 Criticisms of standardized tests like the SAT claim that
they discriminate against particular groups of students
(especially minorities) and do not represent a broad
enough domain of knowledge to adequately assess a
student’s academic potential. What issue of validity is
operating in these arguments?
Thought Question
 Other arguments against the SAT state that the tests do
not adequately estimate an individual’s ability to succeed
in college. What issue of validity is operating here?
Reader’s Digest version…
 Reliability
 The extent to which scores are free from error
 Error is measured by consistency
 Validity
 The extent to which inferences are appropriate,
meaningful, and useful
 “Does the instrument measure what it is supposed to measure?”
Reliability & Validity of Measurement
 What is the relationship of reliability to validity?
 If a watch consistently gives the time at 1:10 when
actually it is 1:00, it is ____ but not ____.
 ______ is necessary but not sufficient condition for
_______.
 To be _____ , an instrument must be ______, but a ____
instrument is not necessarily _____.
Midterm
 3 parts
 Multiple Choice (50%) – terms and application
 Short Answer (25%) – application
 Essay (25%) – evaluate a research article. This part is
take home.
Take Home Portion of Exam
Schlosser Article
Based on the topics we have discussed in class and that you have read about, critique the article on the following points:
 Introduction and research problem, including the
researcher’s background and involvement
 Review of literature/ theoretical framework
 Methods of data collection (including participants) and
data analysis
 Results and conclusions, including issues of trustworthiness. Be sure to address whether we should trust the claims the authors have made and why or why not.