Day 2 Powerpoint Slides

INF 397C
Introduction to Research in Information
Studies
Fall, 2005
Day 2
R. G. Bias | School of Information | SZB 562BB | Phone: 512 471 7046 | [email protected]
Standard Deviation
σ = SQRT(Σ(X - µ)²/N)
(Does that give you a headache?)
• USA Today has come out with a new
survey - apparently, three out of every
four people make up 75% of the
population.
– David Letterman
• Statistics: The only science that enables
different experts using the same figures
to draw different conclusions.
– Evan Esar (1899 - 1995), US humorist
Scales (last week)
Nominal             Ordinal               Interval             Ratio
Name                =                     =                    =
Mutually exclusive  =                     =                    =
                    Ordered               =                    =
                                          Equal interval       =
                                                               + abs. 0
Gender, Yes/No      Class rank, ratings   Days of wk., temp.   Inches, dollars
Critical Skepticism
• Remember the Rabbit Pie example from
last week?
• The “critical consumer” of statistics
asked, “What do you mean by ‘50/50’?”
Remember . . .
• I do NOT want you to become cynical.
• Not all “media bias” (nor bad research) is
intentional.
• Just be sensible, critical, skeptical.
• As you “consume” statistics, ask some
questions . . .
Ask yourself. . .
• Who says so? (A Zest commercial is unlikely to tell
you that Irish Spring is best.)
• How does he/she know? (That Zest is “the best
soap for you.”)
• What’s missing? (One year, 33% of female grad
students at Johns Hopkins married faculty.)
• Did somebody change the subject? (“Camrys
are bigger than Accords.” “Accords are bigger than
Camrys.”)
• Does it make sense? (“Study in NYC: Working
woman with family needed $40.13/week for adequate
support.”)
What were . . .
• . . . some claims you all heard this week?
Last week . . .
• We learned about frequency
distributions.
• I asserted that a frequency distribution,
and/or a histogram (a graphical
representation of a frequency
distribution), was a good way to
summarize a collection of data.
• And I asserted there’s another, even
shorter-hand way.
Measures of Central Tendency
• Mode
– Most frequent score (or scores – a
distribution can have multiple modes)
• Median
– “Middle score”
– 50th percentile
• Mean - µ (“mu”)
– “Arithmetic average”
– ΣX/N
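A minimal sketch of the three measures, using Python's standard `statistics` module. The scores are hypothetical, chosen only so that the set has two modes; they are not class data.

```python
from statistics import mean, median, multimode

# Hypothetical scores, chosen only for illustration (not class data).
scores = [1, 2, 2, 3, 4, 4, 9]

modes = multimode(scores)   # every most-frequent score; there can be more than one
middle = median(scores)     # the "middle score" (50th percentile)
average = mean(scores)      # the arithmetic average, ΣX / N

print(modes, middle, average)
```

Here 2 and 4 tie as modes, the median is the fourth of the seven ordered scores, and the mean is 25/7.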
OK, so which do we use?
• Means allow further arithmetic/statistical manipulation. But . . .
• It depends on:
– The type of scale of your data
• Can’t use means with nominal or ordinal scale data
• With nominal data, must use mode
– The distribution of your data
• Tend to use medians with distributions bounded at one
end but not the other (e.g., salary). (Look at our “Number
of MLB games” distribution.)
– The question you want to answer
• “Most popular score” vs. “middle score” vs. “middle of the
see-saw”
• “Statistics can tell us which measures are technically
correct. It cannot tell us which are ‘meaningful’” (Tal,
2001, p. 52).
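To see why medians tend to be preferred for distributions bounded at one end, here is a small sketch with made-up salary figures (an assumption for illustration, not real data). One extreme salary drags the mean well above what most people in the set earn, while the median barely notices.

```python
from statistics import mean, median

# Hypothetical annual salaries in dollars; the last value is an extreme score.
salaries = [30_000, 32_000, 35_000, 38_000, 40_000, 400_000]

salary_mean = mean(salaries)      # pulled upward by the one large value
salary_median = median(salaries)  # halfway between the two middle salaries

print(salary_mean, salary_median)
```

Both numbers are technically correct summaries; which one is meaningful depends on the question being asked.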
Scales (which measure of CT?)
Nominal (mode)      Ordinal (mode, median)  Interval (any)       Ratio (any)
Name                =                       =                    =
Mutually exclusive  =                       =                    =
                    Ordered                 =                    =
                                            Equal interval       =
                                                                 + abs. 0
Gender, Yes/No      Class rank, ratings     Days of wk., temp.   Inches, dollars
Mean – “see saw” (from Tal, 2001)
Have sidled up to SHAPES of
distributions
• Symmetrical
• Skewed – positive and negative
• Flat
“Pulling up the mean”
Why . . .
• . . . isn’t a “measure of central tendency”
all we need to characterize a distribution
of scores/numbers/data/stuff?
• “The price for using measures of central
tendency is loss of information” (Tal,
2001, p. 49).
Note . . .
• We started with a bunch of specific scores.
• We put them in order.
• We drew their distribution.
• Now we can report their central tendency.
• So, we’ve moved AWAY from specifics, to a
summary. But with Central Tendency, alone,
we’ve ignored the specifics altogether.
– Note MANY distributions could have a particular
central tendency!
• If we went back to ALL the specifics, we’d be
back at square one.
Measures of Dispersion
• Range
• Semi-interquartile range
• Standard deviation
– σ (sigma)
Range
• Highest score minus the lowest score.
• Like the mode . . .
– Easy to calculate
– Potentially misleading
– Doesn’t take EVERY score into account.
• What we need to do is calculate one number
that will capture HOW spread out our numbers
are from that measure of Central Tendency.
– ‘Cause MANY different distributions of scores can
have the same central tendency!
– “Standard Deviation”
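A quick sketch of the point that different distributions can share a central tendency. The two sets of scores are hypothetical, and `score_range` is a helper name introduced here for illustration.

```python
from statistics import mean

def score_range(scores):
    """Highest score minus the lowest score."""
    return max(scores) - min(scores)

# Two hypothetical sets of scores with the same mean but different spread.
tight = [4, 5, 5, 5, 6]
wide = [1, 2, 5, 8, 9]

print(mean(tight), score_range(tight))
print(mean(wide), score_range(wide))
```

Both sets have a mean of 5, yet the ranges are 2 and 8. Note that the range looks only at the two most extreme scores and ignores everything in between.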
Back to our data – MLB games
• Let’s take just the men in this class
• xls spreadsheet.
• Measures of central tendency.
• Go with mean.
• So, how much do the actual scores
deviate from the mean?
So . . .
• Add up all the deviations and we should
have a feel for how dispersed, how
spread, how deviant, our distribution is.
• Let’s calculate the Standard Deviation.
• As always, start inside the parentheses.
• Σ(X - µ)
Damn!
• OK, let’s try it on a
smaller set of
numbers.
X
2
3
5
6
Damn! (cont’d.)
• OK, let’s try it on a
smaller set of
numbers.
X        X - µ
2        -2
3        -1
5         1
6         2
Σ = 16   Σ = 0
µ = 4    Hmm.
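The zero is not a quirk of these four numbers. A short derivation, using the slide's symbols plus an index i added here for clarity, suggests that the deviations from the mean cancel for any set of scores:

```latex
\sum_{i=1}^{N} (X_i - \mu)
  = \sum_{i=1}^{N} X_i - N\mu
  = N\mu - N\mu
  = 0,
\qquad \text{since } \mu = \frac{1}{N}\sum_{i=1}^{N} X_i .
```

So the raw sum of deviations can never measure spread; the negative deviations always offset the positive ones exactly.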
OK . . .
• . . . so mathematicians at this point do
one of two things.
• Take the absolute value or square ‘em.
• We square ‘em. Σ(X - µ)²
Standard Deviation (cont’d.)
• Then take the average of the squared
deviations. Σ(X - µ)²/N
• But this number is so BIG!
OK . . .
• . . . take the square root (to make up for
squaring the deviations earlier).
• σ = SQRT(Σ(X - µ)²/N)
• Now this doesn’t give you a headache,
right?
• I said, “right?”
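The steps can be walked through on the small data set from the earlier slide (2, 3, 5, 6). This is a sketch of the population formula with N in the denominator, as written on the slides; the variable names are chosen here for illustration.

```python
from math import sqrt
from statistics import pstdev

scores = [2, 3, 5, 6]   # the small data set from the earlier slide
n = len(scores)

mu = sum(scores) / n                                   # 16 / 4 = 4.0
squared_deviations = [(x - mu) ** 2 for x in scores]   # [4.0, 1.0, 1.0, 4.0]
variance = sum(squared_deviations) / n                 # 10 / 4 = 2.5
sigma = sqrt(variance)                                 # about 1.58

# statistics.pstdev also divides by N, so it should agree with sigma.
print(mu, variance, sigma, pstdev(scores))
```

Squaring removes the signs that made the deviations cancel, and the square root brings the result back to the original units of the scores.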
Hmmm . . .
Mode      Range
Median    ?????
Mean      Standard Deviation
We need . . .
• A measure of spread that is NOT
sensitive to every little score, just as
median is not.
• SIQR: Semi-interquartile range.
• (Q3 – Q1)/2
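A rough sketch of the SIQR next to the standard deviation, using hypothetical scores. Textbooks and software differ on exactly how quartiles are located; this version assumes the "inclusive" method of Python's `statistics.quantiles`, so other tools may give slightly different values for Q1 and Q3.

```python
from statistics import pstdev, quantiles

def siqr(scores):
    """Semi-interquartile range: (Q3 - Q1) / 2."""
    q1, _median, q3 = quantiles(scores, n=4, method="inclusive")
    return (q3 - q1) / 2

# Hypothetical scores; the second set swaps the top score for an extreme one.
typical = [1, 2, 3, 4, 5, 6, 7, 8, 9]
with_outlier = [1, 2, 3, 4, 5, 6, 7, 8, 90]

print(siqr(typical), pstdev(typical))
print(siqr(with_outlier), pstdev(with_outlier))
```

With these numbers the SIQR is the same for both sets, while the standard deviation grows sharply once the extreme score is present, which is the insensitivity the slide asks for.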
To summarize
Mode       Range    -Easy to calculate.
                    -May be misleading.
Median     SIQR     -Capture the center.
                    -Not influenced by extreme scores.
Mean (µ)   SD (σ)   -Take every score into account.
                    -Allow later manipulations.
Practice Problems
Who wants to guess . . .
• . . . What I think is the most important
sentence in S, Z, & Z (2003), Chapter 2?
p. 19
• Penultimate paragraph, first sentence:
• “If differences in the dependent variable
are to be interpreted unambiguously as a
result of the different independent
variable conditions, proper control
techniques must be used.”
• http://highered.mcgrawhill.com/sites/0072494468/student_view0/statistics_primer.html
• Click on Statistics Primer.
Homework
• LOTS of reading. See syllabus.
• Send a table/graph/chart that you’ve
read this past week. Send email by
noon, Friday, 9/16/2005.
See you next week.