Day 2 Class Slides - School of Information

INF 397C
Introduction to Research in Library and
Information Science
Spring, 2005
Day 2
R. G. Bias | School of Information | SZB 562BB | Phone: 512 471 7046 | [email protected]

Standard Deviation
σ = SQRT(Σ(X - µ)²/N)
(Does that give you a
headache?)
• USA Today has come out with a new
survey - apparently, three out of every
four people make up 75% of the
population.
– David Letterman
• Statistics: The only science that enables
different experts using the same figures
to draw different conclusions.
– Evan Esar (1899 - 1995), US humorist

How to talk about a set of #s?

Name         M/F  B'day   Fing. lgth  MLB gms  Q
Alex J.      M    9-Nov   5           2        4
Ben B.       M    19-Dec  7           0        3
Brazos P.    M    5-Sep   8           6        4
Derek N.     M    5-Aug   8           12       4
Hans H.      M    24-Jan  7.4         0        4
Jay Y.       M    2-Jul   7.5         3        4
Mike Z.      M    10-Feb  7.3         0        5
Randolph B.  M    16-Jan  7.1         43       5
Terry V.     M    10-Oct  7           4        5
Will M.      M    31-Oct  7.7         50       4

Name         M/F  B'day   Fing. lgth  MLB gms  Q
Hans H.      M    24-Jan  7.4         0        4
Mike Z.      M    10-Feb  7.3         0        5
Ben B.       M    19-Dec  7           0        3
Alex J.      M    9-Nov   5           2        4
Jay Y.       M    2-Jul   7.5         3        4
Terry V.     M    10-Oct  7           4        5
Brazos P.    M    5-Sep   8           6        4
Derek N.     M    5-Aug   8           12       4
Randolph B.  M    16-Jan  7.1         43       5
Will M.      M    31-Oct  7.7         50       4
Histograms
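The slide's picture did not survive in this transcript. As a rough stand-in, here is a minimal sketch that builds a frequency distribution of the ten finger lengths from the table and prints it as a text histogram. The one-centimetre bin width and the function names are assumptions made for illustration, not something the slides specify.

```python
from collections import Counter

# Finger lengths (cm) for the ten men in the class table.
finger_lengths_cm = [5, 7, 8, 8, 7.4, 7.5, 7.3, 7.1, 7, 7.7]


def frequency_distribution(scores, bin_width=1):
    """Count how many scores fall into each bin, keyed by the bin's lower edge."""
    return Counter(int(score // bin_width) * bin_width for score in scores)


def text_histogram(counts):
    """Render one row of '#' marks per bin, lowest bin first."""
    rows = []
    for lower_edge in sorted(counts):
        rows.append(f"{lower_edge:>3} | {'#' * counts[lower_edge]}")
    return "\n".join(rows)


counts = frequency_distribution(finger_lengths_cm)
print(text_histogram(counts))
```

Most of the scores pile up in the 7 cm bin, which is the kind of shape a histogram makes visible at a glance.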

Percentiles/Deciles
• The cumulative percentage for any given
score is the “percentile” for that score.
• The decile is one-tenth of the percentile
(usually rounded to the nearest whole
number).
• So, in our finger example, 7.7 cm was
the 80th percentile, or the 8th decile.
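To check the finger example, here is a small sketch that computes the cumulative percentage for a score, using the ten finger lengths from the table. The function names are illustrative, and "at or below" is assumed as the meaning of "cumulative"; some textbooks define percentiles slightly differently.

```python
# Finger lengths (cm) for the ten men in the class table.
finger_lengths_cm = [5, 7, 8, 8, 7.4, 7.5, 7.3, 7.1, 7, 7.7]


def cumulative_percentage(scores, score):
    """Percentage of scores that are at or below the given score."""
    at_or_below = sum(1 for s in scores if s <= score)
    return 100 * at_or_below / len(scores)


def decile(percentile):
    """One-tenth of the percentile, rounded to the nearest whole number."""
    return round(percentile / 10)


percentile = cumulative_percentage(finger_lengths_cm, 7.7)
print(percentile, decile(percentile))
```

Eight of the ten scores are 7.7 cm or shorter, which gives the 80th percentile and the 8th decile quoted on the slide.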

Scales
• The data we collect can be represented
on one of FOUR types of scales:
– Nominal
– Ordinal
– Interval
– Ratio
• “Scale” in the sense that an individual
score is placed at some point along a
continuum.

Nominal Scale
• Describe something by giving it a name.
(Name – Nominal. Get it?)
• Mutually exclusive categories.
• For example:
– Gender: 1 = Female, 2 = Male
– Marital status: 1 = single, 2 = married,
3 = divorced, 4 = widowed
– Make of car: 1 = Ford, 2 = Chevy . . .
• The numbers are just names.

Ordinal Scale
• An ordered set of objects.
• But no implication about the relative
SIZE of the steps.
• Example:
– The 50 states in order of population:
• 1 = California
• 2 = Texas
• 3 = New York
• . . . 50 = Wyoming

Interval Scale
• Ordered, like an ordinal scale.
• Plus there are equal intervals between each
pair of scores.
• With Interval data, we can calculate means
(averages).
• However, the zero point is arbitrary.
• Examples:
– Temperature in Fahrenheit or Centigrade.
– IQ scores

Ratio Scale
• Interval scale, plus an absolute zero.
• Sample:
– Distance, weight, height, time (but not years
– e.g., the year 2002 isn’t “twice” 1001).

Scales (cont’d.)
It’s possible to measure the same attribute on
different scales. Say, for instance, your
midterm test. I could:
• Give you a “1” if you don’t finish, and a “2” if
you finish.
• “1” for highest grade in class, “2” for second
highest grade, . . . .
• “1” for first quarter of the class, “2” for second
quarter of the class, . . .
• Raw test score (100, 99, . . . .).
– (NOTE: A score of 100 doesn’t mean the person
“knows” twice as much as a person who scores 50,
he/she just gets twice the score.)

Scales (cont’d.)

Nominal             Ordinal              Interval             Ratio
Name                =                    =                    =
Mutually exclusive  =                    =                    =
                    Ordered              =                    =
                                         Equal interval       =
                                                              + abs. 0
Gender, Yes/No      Class rank, ratings  Days of wk., temp.   Inches, dollars

Critical Skepticism
• Remember the Rabbit Pie example from
last week?
• The “critical consumer” of statistics
asked, “What do you mean by ‘50/50’?”
• Let’s look at some other situations and
claims.

Company is hurting.
• We’d like to ask you to take a 50% cut in
pay.
• But if you do, we’ll give you a 60% raise
next month. OK?
• Problem: Base rate.
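The base-rate problem is easy to see with numbers. This is a minimal sketch, assuming a hypothetical starting pay of 1000 (the slide gives no figure); the function name is illustrative.

```python
def apply_percent_change(amount, percent):
    """Return the amount after a percentage change (a negative percent is a cut)."""
    return amount * (100 + percent) / 100


original_pay = 1000  # hypothetical figure, chosen only to make the arithmetic easy

after_cut = apply_percent_change(original_pay, -50)   # 50% cut of the original pay
after_raise = apply_percent_change(after_cut, 60)     # 60% raise of the REDUCED pay
net_change_percent = (after_raise - original_pay) * 100 / original_pay

print(after_cut, after_raise, net_change_percent)
```

The 60% raise is taken on the reduced base, so the pay ends up at 800, which is 20% below where it started.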

Sale!
• “Save 100%”
• I doubt it.

Probabilities
• “It’s safer to drive in the fog than in the
sunshine.” (Kinda like “Most accidents occur
within 25 miles of home.” Doesn’t mean it gets
safer once you get to San Marcos.)
• Navy literature around WWI:
– “The death rate in the Navy during the Spanish-American War was 9/1000. For civilians in NYC
during the same period it was 16/1000. So . . . Join
the Navy. It’s safer.”

Are all results reported?
• “In an independent study [ooh, magic
words], people who used Doakes
toothpaste had 23% fewer cavities.”
• How many studies showed MORE
cavities for Doakes users?

Sampling problems
• “Average salary of 1999 UT grads –
$41,000.”
• How did they find this? I’ll bet it was
average salary of THOSE WHO
RESPONDED to a survey.
• Who’s inclined to respond?

Correlation ≠ Causation
• Around the turn of the century, there
were relatively MANY deaths from
tuberculosis in Arizona.
• What’s up with that?

Remember . . .
• I do NOT want you to become cynical.
• Not all “media bias” is intentional.
• Just be sensible, critical, skeptical.
• As you “consume” statistics, ask some
questions . . .

Ask yourself. . .
• Who says so? (A Zest commercial is unlikely to tell
you that Irish Spring is best.)
• How does he/she know? (That Zest is “the best
soap for you.”)
• What’s missing? (One year, 33% of female grad
students at Johns Hopkins married faculty.)
• Did somebody change the subject? (“Camrys
are bigger than Accords.” “Accords are bigger than
Camrys.”)
• Does it make sense? (“Study in NYC: Working
woman with family needed $40.13/week for adequate
support.”)

Quote on front of Huff book:
• “It ain’t so much the things we don’t
know that get us in trouble. It’s the
things we know that ain’t so.”
– Artemus Ward, US author
• Being a critical consumer of statistics will
keep you from knowing things that ain’t
so.

Claims
• “Better chance of being struck by
lightning than being bitten by a shark.”
• Tom Brokaw – Tranquilizers.
• What are some claims you all
heard/read?

Break

Before the break . . .
• We learned about frequency
distributions.
• I asserted that a frequency distribution,
and/or a histogram (a graphical
representation of a frequency
distribution), was a good way to
summarize a collection of data.
• There’s another, even shorter-hand way.

Measures of Central Tendency
• Mode
– Most frequent score (or scores – a
distribution can have multiple modes)
• Median
– “Middle score”
– 50th percentile
• Mean - µ (“mu”)
– “Arithmetic average”
– ΣX/N
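The three measures can be computed directly from the "MLB gms" column of the class table. This is a sketch using Python's standard `statistics` module; the variable names are illustrative.

```python
import statistics

# "MLB gms" column for the ten men in the class table.
mlb_games = [2, 0, 6, 12, 0, 3, 0, 43, 4, 50]

# Mode: the most frequent score(s); multimode returns every tied mode.
modes = statistics.multimode(mlb_games)

# Median: the middle score (the average of the two middle scores when N is even).
median = statistics.median(mlb_games)

# Mean: the arithmetic average, ΣX/N.
mean = sum(mlb_games) / len(mlb_games)

print(modes, median, mean)
```

For these ten scores the mode is 0, the median is 3.5, and the mean is 12, so the three "averages" tell quite different stories about the same data.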

Let’s calculate some “averages”
• From old data.

A quiz about averages
1 – If one score in a distribution changes, will the mode
change?
__Yes __No __Maybe
2 – How about the median?
__Yes __No __Maybe
3 – How about the mean?
__Yes __No __Maybe
4 – True or false: In a normal distribution (bell curve), the
mode, median, and mean are all the same? __True
__False

More quiz
5 – (This one is tricky.) If the mode=mean=median, then the distribution is
necessarily a bell curve?
__True __False
6 – I have a distribution of 10 scores. There was an error, and really the
highest score is 5 points HIGHER than previously thought.
a) What does this do to the mode?
__ Increases it __Decreases it __Nothing __Can’t tell
b) What does this do to the median?
__ Increases it __Decreases it __Nothing __Can’t tell
c) What does this do to the mean?
__ Increases it __Decreases it __Nothing __Can’t tell
7 – Which of the following must be an actual score from the distribution?
a) Mean
b) Median
c) Mode
d) None of the above

OK, so which do we use?
• Means allow further arithmetic/statistical manipulation. But . . .
• It depends on:
– The type of scale of your data
• Can’t use means with nominal or ordinal scale data
• With nominal data, must use mode
– The distribution of your data
• Tend to use medians with distributions bounded at one
end but not the other (e.g., salary). (Look at our “Number
of MLB games” distribution.)
– The question you want to answer
• “Most popular score” vs. “middle score” vs. “middle of the
see-saw”
• “Statistics can tell us which measures are technically
correct. It cannot tell us which are ‘meaningful’” (Tal,
2001, p. 52).
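To see why the median is often preferred for distributions bounded at one end, here is a small sketch using the "Number of MLB games" scores from the class table. The five-point correction to the highest score is a made-up scenario for illustration, not class data.

```python
import statistics

# "MLB gms" column for the ten men in the class table.
mlb_games = [2, 0, 6, 12, 0, 3, 0, 43, 4, 50]

# Hypothetical correction: the highest score turns out to be 5 points higher.
corrected = sorted(mlb_games)
corrected[-1] += 5

mean_before = sum(mlb_games) / len(mlb_games)
mean_after = sum(corrected) / len(corrected)
median_before = statistics.median(mlb_games)
median_after = statistics.median(corrected)

print(mean_before, mean_after)      # the mean follows the extreme score
print(median_before, median_after)  # the median does not move
```

Two large scores already pull the mean (12) far above the median (3.5), and nudging the largest score moves the mean again while the median stays put.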

Have sidled up to SHAPES of distributions
• Symmetrical
• Skewed – positive and negative
• Flat

Why . . .
• . . . isn’t a “measure of central tendency”
all we need to characterize a distribution
of scores/numbers/data/stuff?
• “The price for using measures of central
tendency is loss of information” (Tal,
2001, p. 49).

Note . . .
• We started with a bunch of specific scores.
• We put them in order.
• We drew their distribution.
• Now we can report their central tendency.
• So, we’ve moved AWAY from specifics, to a
summary. But with Central Tendency, alone,
we’ve ignored the specifics altogether.
– Note MANY distributions could have a particular
central tendency!
• If we went back to ALL the specifics, we’d be
back at square one.

Measures of Dispersion
• Range
• Semi-interquartile range
• Standard deviation
– σ (sigma)

Range
• Like the mode . . .
– Easy to calculate
– Potentially misleading
– Doesn’t take EVERY score into account.
• What we need to do is calculate one
number that will capture HOW spread
out our numbers are from that Central
Tendency.
– “Standard Deviation”

Back to our data – MLB games
• Let’s take just the men in this class,
since N = 10, and it’ll be easy to do the
math.
• xls spreadsheet.
• Measures of central tendency.
• Go with mean.
• So, how much do the actual scores
deviate from the mean?

So . . .
• Add up all the deviations and we should
have a feel for how dispersed, how
spread, how deviant, our distribution is.
• Let’s calculate the Standard Deviation.
• σ = SQRT(Σ(X - µ)²/N)
• Σ(X - µ)
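Here is what happens when the raw deviations Σ(X - µ) are simply added up for the MLB games scores. This is a minimal sketch; the variable names are illustrative.

```python
# "MLB gms" column for the ten men in the class table.
mlb_games = [2, 0, 6, 12, 0, 3, 0, 43, 4, 50]

mu = sum(mlb_games) / len(mlb_games)        # the mean, ΣX/N
deviations = [x - mu for x in mlb_games]    # each score's distance from the mean, X - µ

print(deviations)
print(sum(deviations))  # negative and positive deviations cancel out
```

The deviations below the mean exactly offset the ones above it, so the plain sum is zero and says nothing about spread.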

Damn!
• OK, so mathematicians at this point do
one of two things.
• Take the absolute value or square ‘em.
• We square ‘em. Σ(X - µ)²
• Then take the average of the squared
deviations. Σ(X - µ)²/N
• But this number is so BIG!

OK . . .
• . . . take the square root (to make up for
squaring the deviations earlier).
• σ = SQRT(Σ(X - µ)²/N)
• Now this doesn’t give you a headache,
right?
• I said “right”?
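The whole recipe, step by step, for the MLB games scores. This sketch divides by N, as the formula on the slide does (the population standard deviation); many tools default to dividing by N - 1 instead, so results can differ slightly. Variable names are illustrative.

```python
import math

# "MLB gms" column for the ten men in the class table.
mlb_games = [2, 0, 6, 12, 0, 3, 0, 43, 4, 50]

n = len(mlb_games)
mu = sum(mlb_games) / n                                  # mean, ΣX/N

squared_deviations = [(x - mu) ** 2 for x in mlb_games]  # (X - µ)² for each score
variance = sum(squared_deviations) / n                   # Σ(X - µ)²/N, the "BIG" number
sigma = math.sqrt(variance)                              # square root undoes the squaring

print(sum(squared_deviations), variance, round(sigma, 2))
```

The squared deviations sum to 3118, their average is 311.8, and the square root brings that back down to about 17.66 games.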

Hmmm . . .

Central tendency   Dispersion
Mode               Range
Median             ?????
Mean               Standard Deviation

We need . . .
• A measure of spread that is NOT
sensitive to every little score, just as
median is not.
• SIQR: Semi-interquartile range.
• (Q3 – Q1)/2
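A sketch of the range and the SIQR for the MLB games scores. Textbooks and software use several slightly different rules for locating Q1 and Q3; this example assumes the "inclusive" method of Python's `statistics.quantiles`, so other methods may give a somewhat different value.

```python
import statistics

# "MLB gms" column for the ten men in the class table.
mlb_games = [2, 0, 6, 12, 0, 3, 0, 43, 4, 50]

# Range: highest score minus lowest score; it only looks at the two extremes.
score_range = max(mlb_games) - min(mlb_games)

# Quartiles: the three cut points that split the sorted scores into four parts.
q1, q2, q3 = statistics.quantiles(mlb_games, n=4, method="inclusive")

# Semi-interquartile range: half the distance between Q3 and Q1.
siqr = (q3 - q1) / 2

print(score_range, q1, q3, siqr)
```

Under this quartile rule the SIQR is 5, far smaller than the range of 50, because the two very large scores barely affect Q1 and Q3.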

To summarize

Central tendency   Dispersion   Notes
Mode               Range        - Easy to calculate.
                                - May be misleading.
Median             SIQR         - Capture the center.
                                - Not influenced by extreme scores.
Mean (µ)           SD (σ)       - Take every score into account.
                                - Allow later manipulations.

Graphs
• Graphs/tables/charts do a good job
(when done well) of depicting all the data.
• But they cannot be manipulated
mathematically.
• Plus it can be ROUGH when you have
LOTS of data.
• Let’s look at your examples of claims.

Some rules . . .
• . . . For building graphs/tables/charts:
– Label axes.
– Divide up the axes evenly.
– Indicate when there’s a break in the rhythm!
– Keep the “aspect ratio” reasonable.
– Histogram, bar chart, line graph, pie chart,
stacked bar chart, which when?
– Keep the user in mind.

Who wants to guess . . .
• . . . What I think is the most important
sentence in S, Z, & Z (2003), Chapter 2?

p. 19
• Penultimate paragraph, first sentence:
• “If differences in the dependent variable
are to be interpreted unambiguously as a
result of the different independent
variable conditions, proper control
techniques must be used.”
• http://highered.mcgrawhill.com/sites/0072494468/student_view0/statistics_primer.html
• Click on Statistics Primer.

Homework
• LOTS of reading. See syllabus.
• Send a table/graph/chart that you’ve
read this past week. Send email by
noon, Friday, 2/4/2005.
See you next week.