Day 3 Slides - School of Information

LIS 397.1
Introduction to Research in Library and
Information Science
Summer, 2003
Day 3
R. G. Bias | School of Information | SZB 562BB | Phone: 512 471 7046 | [email protected]
Standard Deviation
σ = SQRT(Σ(X - µ)²/N)
(Does that give you a headache?)
• USA Today has come out with a new
survey - apparently, three out of every
four people make up 75% of the
population.
– David Letterman (1947 - )
• Statistics: The only science that enables
different experts using the same figures
to draw different conclusions.
– Evan Esar (1899 - 1995), US humorist
How many of you . . .
• . . . told your roommate/friend/significant
other some interesting thing about
statistics this weekend?
• . . . found yourself thinking about
statistics, when you NEVER would have
guessed you ever would?
Last class . . .
• We learned about frequency
distributions.
• I asserted that a frequency distribution,
and/or a histogram (a graphical
representation of a frequency
distribution), was a good way to
summarize a collection of data.
• There’s another, even shorter-hand way.
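As a quick refresher on last class, here is a minimal sketch of a frequency distribution and a text histogram. The scores are invented for illustration, and `text_histogram` is a hypothetical helper, not something from the course materials.

```python
from collections import Counter

# Hypothetical scores, made up for this illustration.
scores = [3, 5, 4, 5, 2, 5, 4, 3, 4, 5]

# Frequency distribution: how many times each score occurs.
freq = Counter(scores)


def text_histogram(freq):
    """Return one line per score (ascending), with one '*' per occurrence."""
    return [f"{score:>2} | {'*' * count}" for score, count in sorted(freq.items())]


for line in text_histogram(freq):
    print(line)
```

Each row is one score and the length of the bar is its frequency, which is all a histogram is.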
Measures of Central Tendency
• Mode
– Most frequent score (or scores – a
distribution can have multiple modes)
• Median
– “Middle score”
– 50th percentile
• Mean - µ (“mu”)
– “Arithmetic average”
– ΣX/N
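A minimal sketch of the three measures using Python's standard `statistics` module (`multimode` assumes Python 3.8 or later). The scores are made up for illustration.

```python
import statistics

# Hypothetical scores, made up for this illustration.
scores = [2, 3, 3, 4, 5, 5, 6, 12]

# Mode: the most frequent score(s). multimode returns every tied mode.
modes = statistics.multimode(scores)

# Median: the middle score (here, halfway between the 4th and 5th scores).
median = statistics.median(scores)

# Mean: the arithmetic average, i.e. sum of X divided by N.
mean = sum(scores) / len(scores)

print("modes:", modes)
print("median:", median)
print("mean:", mean)
```

Note that this distribution has two modes, and that the three "averages" need not agree with one another.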
Let’s calculate some “averages”
• From old data.
A quiz about averages
1 – If one score in a distribution changes, will the mode
change?
__Yes __No __Maybe
2 – How about the median?
__Yes __No __Maybe
3 – How about the mean?
__Yes __No __Maybe
4 – True or false: In a normal distribution (bell curve), the
mode, median, and mean are all the same? __True
__False
More quiz
5 – (This one is tricky.) If the mode=mean=median, then the
distribution is necessarily a bell curve?
__True __False
6 – I have a distribution of 10 scores. There was an error, and really the
highest score is 5 points HIGHER than previously thought.
a) What does this do to the mode?
__ Increases it __Decreases it __Nothing __Can’t tell
b) What does this do to the median?
__ Increases it __Decreases it __Nothing __Can’t tell
c) What does this do to the mean?
__ Increases it __Decreases it __Nothing __Can’t tell
7 – Which of the following must be an actual score from the
distribution?
a) Mean
b) Median
c) Mode
d) None of the above
OK, so which do we use?
• Means allow further arithmetic/statistical manipulation. But . . .
• It depends on:
– The type of scale of your data
• Can’t use means with nominal or ordinal scale data
• With nominal data, must use mode
– The distribution of your data
• Tend to use medians with distributions bounded at one
end but not the other (e.g., salary). (Look at our “Number
of MLB games” distribution.)
– The question you want to answer
• “Most popular score” vs. “middle score” vs. “middle of the
see-saw”
• “Statistics can tell us which measures are technically
correct. It cannot tell us which are ‘meaningful’” (Tal,
2001, p. 52).
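To see why medians tend to be preferred for distributions bounded at one end, here is a small sketch with invented salary figures (in thousands of dollars). The numbers are assumptions chosen only to show the effect of one very large value.

```python
import statistics

# Hypothetical salaries in thousands of dollars; one value is far above the rest.
salaries = [30, 32, 35, 38, 40, 45, 250]

mean_salary = sum(salaries) / len(salaries)
median_salary = statistics.median(salaries)

print("mean:", round(mean_salary, 2))
print("median:", median_salary)
```

The single large salary pulls the mean well above what most people in this made-up group earn, while the median stays at the middle score.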
Have sidled up to SHAPES of
distributions
• Symmetrical
• Skewed – positive and negative
• Flat
Why . . .
• . . . isn’t a “measure of central tendency”
all we need to characterize a distribution
of scores/numbers/data/stuff?
• “The price for using measures of central
tendency is loss of information” (Tal,
2001, p. 49).
Note . . .
• We started with a bunch of specific scores.
• We put them in order.
• We drew their distribution.
• Now we can report their central tendency.
• So, we’ve moved AWAY from specifics, to a
summary. But with Central Tendency, alone,
we’ve ignored the specifics altogether.
– Note MANY distributions could have a particular
central tendency!
• If we went back to ALL the specifics, we’d be
back at square one.
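A small sketch of the point that many distributions can share one central tendency. Both data sets are invented, and `summarize` is a hypothetical helper.

```python
import statistics


def summarize(xs):
    """Return (mean, median, range) for a list of scores."""
    return statistics.mean(xs), statistics.median(xs), max(xs) - min(xs)


# Two hypothetical distributions with the same center but different spread.
tight = [5, 5, 5, 5, 5]
spread = [1, 3, 5, 7, 9]

print("tight: ", summarize(tight))
print("spread:", summarize(spread))
```

The mean and median are identical for the two lists, so central tendency alone cannot tell them apart. Only the third number, a measure of spread, does.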
Measures of Dispersion
• Range
• Semi-interquartile range
• Standard deviation
– σ (sigma)
Range
• Like the mode . . .
– Easy to calculate
– Potentially misleading
– Doesn’t take EVERY score into account.
• What we need to do is calculate one
number that will capture HOW spread
out our numbers are from that Central
Tendency.
– “Standard Deviation”
New distribution
• Number of semesters all 2003 UT
ISchool graduates spent in graduate
school.
• (I made this up.)
• Measures of central tendency.
• Go with mean.
• So, how much do the actual scores
deviate from the mean?
So . . .
• Add up all the deviations and we should
have a feel for how dispersed, how
spread, how deviant, our distribution is.
• Σ(X - µ)
Damn!
• OK, so mathematicians at this point do
one of two things.
• Take the absolute value or square ‘em.
• We square ‘em. Σ(X - µ)²
• Then take the average of the squared
deviations. Σ(X - µ)²/N
• But this number is so BIG!
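The reason for the "Damn!" is that deviations from the mean always cancel out: Σ(X - µ) is zero for any distribution. A minimal sketch, using invented semester counts (the class data set is not reproduced here):

```python
# Hypothetical "semesters in graduate school" scores, made up for illustration.
scores = [4, 4, 5, 6, 6, 7, 10]

n = len(scores)
mu = sum(scores) / n  # mean = 6.0

# Raw deviations from the mean: positives and negatives cancel.
deviations = [x - mu for x in scores]
sum_of_deviations = sum(deviations)

# Squaring removes the signs, so the deviations can no longer cancel.
squared_deviations = [d ** 2 for d in deviations]
mean_squared_deviation = sum(squared_deviations) / n

print("sum of deviations:", sum_of_deviations)
print("sum of squared deviations:", sum(squared_deviations))
print("mean squared deviation:", mean_squared_deviation)
```

The mean squared deviation is in squared units (semesters squared here), which is one way to read "this number is so BIG".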
OK . . .
• . . . take the square root (to make up for
squaring the deviations earlier).
• σ = SQRT(Σ(X - µ)²/N)
• Now this doesn’t give you a headache,
right?
• I said “right”?
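The formula can be sketched step by step in a few lines. The scores are invented, `population_sd` is a hypothetical helper name, and the check against `statistics.pstdev` (the standard library's population standard deviation, which also divides by N) is only there as a sanity test.

```python
import math
import statistics


def population_sd(xs):
    """sigma = SQRT(sum((X - mu)^2) / N), dividing by N as on the slide."""
    n = len(xs)
    mu = sum(xs) / n
    return math.sqrt(sum((x - mu) ** 2 for x in xs) / n)


# Hypothetical "semesters in graduate school" scores, made up for illustration.
scores = [4, 4, 5, 6, 6, 7, 10]

sigma = population_sd(scores)
print("sigma:", round(sigma, 3))
print("matches statistics.pstdev:", math.isclose(sigma, statistics.pstdev(scores)))
```

Taking the square root puts σ back in the original units (semesters), so it can be read as a typical distance from the mean.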
Hmmm . . .
Central tendency | Dispersion
Mode | Range
Median | ?????
Mean | Standard Deviation
We need . . .
• A measure of spread that is NOT
sensitive to every little score, just as
median is not.
• SIQR: Semi-interquartile range.
• (Q3 – Q1)/2
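A minimal sketch of the SIQR, assuming Python 3.8 or later for `statistics.quantiles`. Textbooks differ on exactly how quartiles are located; the "inclusive" method used here is one common convention and is an assumption, not necessarily the one used in class. The scores are invented.

```python
import statistics


def siqr(xs):
    """Semi-interquartile range: (Q3 - Q1) / 2."""
    q1, _median, q3 = statistics.quantiles(xs, n=4, method="inclusive")
    return (q3 - q1) / 2


# Hypothetical scores, and the same scores with one extreme value.
scores = [1, 2, 3, 4, 5, 6, 7, 8, 9]
with_outlier = [1, 2, 3, 4, 5, 6, 7, 8, 90]

print("SIQR:", siqr(scores), "range:", max(scores) - min(scores))
print("SIQR:", siqr(with_outlier), "range:", max(with_outlier) - min(with_outlier))
```

In this example the extreme score changes the range a great deal but leaves the SIQR alone, just as an extreme score leaves the median alone.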
To summarize
• Mode, Range
– Easy to calculate.
– May be misleading.
• Median, SIQR
– Capture the center.
– Not influenced by extreme scores.
• Mean (µ), SD (σ)
– Take every score into account.
– Allow later manipulations.
Graphs
• Graphs/tables/charts, when done well, do a
good job of depicting all the data.
• But they cannot be manipulated
mathematically.
• Plus it can be ROUGH when you have
LOTS of data.
• Let’s look at your examples.
Some rules . . .
• . . . For building graphs/tables/charts:
– Label axes.
– Divide up the axes evenly.
– Indicate when there’s a break in the rhythm!
– Keep the “aspect ratio” reasonable.
– Histogram, bar chart, line graph, pie chart,
stacked bar chart, which when?
– Keep the user in mind.
Who wants to guess . . .
• . . . What I think is the most important
sentence in S, Z, & Z (2003), Chapter 2?
p. 19
• Penultimate paragraph, first sentence:
• “If differences in the dependent variable
are to be interpreted unambiguously as a
result of the different independent
variable conditions, proper control
techniques must be used.”
• http://highered.mcgrawhill.com/sites/0072494468/student_view0/statistics_primer.html
• Click on Statistics Primer.
Homework
- Keep reading.
See you tomorrow.