Statistical Reasoning

Statistical Reasoning
For Communication Majors
Mean
► This is a common statistic, and it’s simple.
► When we refer to the “average,” this is usually what we mean.
► Add the values and divide by the number of values you have.
Mean Example
► A weekly newspaper has seven employees. What’s the mean salary? Here are their salaries:
► Editor -- $37,000
► Assistant Editor -- $32,000
► Reporter -- $28,000
► Ad Sales Manager -- $38,000
► Ad Sales Agent -- $31,000
► 2 Circulation People -- $22,000 each
Mean Example
► Calculation: 37,000 + 32,000 + 28,000 + 38,000 + 31,000 + 22,000 + 22,000 = 210,000. Then divide by 7: 210,000 / 7 = 30,000. The mean salary is $30,000.
► NOTE: Mean can be deceptive if there is a wide
spread in the numbers. For example, if the editor
and ad sales manager made $60,000 each, the
sales agent made $40,000, and each of the other
workers made $12,500, the mean would be the
same, but the picture of the average salary at the
newspaper would be much different.
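If you want to check the arithmetic yourself, here is a minimal Python sketch of the mean calculation. The salary figures come from the example; the variable names are just illustrative choices.

```python
from statistics import mean

# Salaries from the weekly newspaper example (seven employees).
salaries = [37_000, 32_000, 28_000, 38_000, 31_000, 22_000, 22_000]

# The wide-spread scenario from the NOTE: two people at $60,000,
# one at $40,000, and the other four at $12,500 each.
wide_spread = [60_000, 60_000, 40_000, 12_500, 12_500, 12_500, 12_500]

# Mean by hand: add the values, divide by how many there are.
mean_salary = sum(salaries) / len(salaries)

# The standard library does the same thing.
mean_wide = mean(wide_spread)

print(mean_salary)  # same mean ...
print(mean_wide)    # ... for two very different newsrooms
```

Both lists produce the same mean even though the pay structure is very different, which is the point of the NOTE.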
Median
► The median means the middle.
► It is the value in the dead center of the list of values when they are lined up from largest to smallest.
► It represents the typical person or group. For example, if we say “the average household” or “the average worker,” then what we are looking for is usually the median, as in “ordinary” or “typical.” We aren’t really talking about the arithmetic “average,” or mean.
Median Example
► Consider the newsroom salaries used in the previous example lined up from largest to smallest: 38,000, 37,000, 32,000, 31,000, 28,000, 22,000, 22,000.
► The salary in the middle, the “median,” is $31,000.
► If the list has an even number of values, the halfway point lies between two numbers. Split the difference: add the two middle values and divide by 2.
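Here is a short Python sketch of finding the middle value, including the even-count case. The function name `middle_value` and the six-person list are illustrative assumptions, not part of the original example.

```python
from statistics import median

# Newsroom salaries from the example.
salaries = [38_000, 37_000, 32_000, 31_000, 28_000, 22_000, 22_000]


def middle_value(values):
    """Return the median: the middle value of the sorted list."""
    ordered = sorted(values, reverse=True)  # largest to smallest
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd count: one value sits in the dead center.
        return ordered[mid]
    # Even count: split the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2


median_salary = middle_value(salaries)

# Hypothetical even-count case: drop one $22,000 salary to leave six values.
six_salaries = salaries[:-1]
median_of_six = middle_value(six_salaries)

print(median_salary, median_of_six)
```

The standard library's `statistics.median` gives the same answers, so in practice you would not need to write the function yourself.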
Percent Change
► If the city increased parking fines from $10 to $15, by what percentage did the fines increase?
► This is simple, too. Subtract the old value from the new value (15 - 10 = 5), then divide by the old value (5 / 10 = 0.5). Multiply the result by 100 (0.5 x 100 = 50 percent) and that’s the percent change.
► In short: 15 - 10 = 5; 5 / 10 = 0.5; 0.5 x 100 = 50 percent.
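The same three steps can be sketched as a small Python function. The name `percent_change` is an illustrative choice, and the guard against an old value of zero is an added assumption (you can't divide by zero).

```python
def percent_change(old, new):
    """Percent change from old to new: (new - old) / old * 100."""
    if old == 0:
        raise ValueError("percent change is undefined when the old value is 0")
    return (new - old) / old * 100


# Parking fines went from $10 to $15.
fine_change = percent_change(10, 15)

# Property tax example: $8,000 to $10,000.
tax_change = percent_change(8_000, 10_000)

print(fine_change, tax_change)
```

Always divide by the old value, not the new one. Dividing by the new value is the most common way this calculation goes wrong.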
Tax Example
► If the typical property tax bill rose by $2,000 a year, from $8,000 to $10,000 (we’re using the median as the “average” here), what is the percent change?
► New value = $10,000
► Old value = $8,000
► 10,000 - 8,000 = 2,000
► 2,000 / 8,000 = 0.25
► 0.25 x 100 = 25 percent
► So the percent change is +25 percent
Per capita, Rates and Comparisons
► Per capita refers to the rate per person. It helps make comparisons among large groups, like cities.
► To get per capita, simply divide the number of incidents by the number of people.
► A Southern city with a population of 450,000 experienced 16 murders during 2009. What is the city’s murder rate per 100,000 population?
► 450,000 / 100,000 = 4.5; 16 / 4.5 = 3.56, or about 3.6 per 100,000.
Per capita example
► If a city has a population of 600,000 and experiences 12 murders a year, the per capita murder rate would be 12 divided by 600,000.
► To avoid tiny decimals, divide 600,000 by 100,000 and report the rate as a number per 100,000 population.
► 600,000 / 100,000 = 6; 12 / 6 = 2, so the murder rate is 2 per 100,000 people.
► You can also find the percent change of the per capita rate over time to discover the trend in the murder rate.
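A minimal Python sketch of the per-100,000 calculation, using the two cities from these slides. The function name `rate_per_100k` is an illustrative assumption.

```python
def rate_per_100k(incidents, population):
    """Incidents per 100,000 people."""
    # Multiply first, then divide, to keep the arithmetic tidy.
    return incidents * 100_000 / population


# Southern city: 16 murders, population 450,000.
southern_rate = rate_per_100k(16, 450_000)

# Example city: 12 murders, population 600,000.
example_rate = rate_per_100k(12, 600_000)

print(round(southern_rate, 1), round(example_rate, 1))
```

Dividing the population by 100,000 first and then dividing the incidents by that result, as the slides do, gives the same number.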
Comparison Example
► Suppose you want to know how dangerous the city is compared to other cities. Our example city has a population of 600,000 with 12 murders. A nearby city has a population of 26,000 and 4 murders. Which is more dangerous? Find the per capita murder rate of each to know.
► The rate for City 1 is 2 per 100,000. City 2 had 4 murders among 26,000 people; converted to the same scale, that is 4 / 0.26 = 15.38, or about 15.4 per 100,000. City 2 is more dangerous because it has 15.4 murders per 100,000 people compared to City 1’s 2 murders per 100,000.
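The comparison can be sketched in Python by putting both cities on the same per-100,000 scale before comparing them. The dictionary layout and variable names are illustrative assumptions.

```python
# (murders, population) for each city in the example.
cities = {
    "City 1": (12, 600_000),
    "City 2": (4, 26_000),
}

# Convert each city to murders per 100,000 people.
rates = {
    name: murders * 100_000 / population
    for name, (murders, population) in cities.items()
}

# The city with the highest rate, not the highest raw count.
most_dangerous = max(rates, key=rates.get)

# How many times higher City 2's rate is than City 1's.
times_higher = rates["City 2"] / rates["City 1"]

for name, rate in rates.items():
    print(f"{name}: {rate:.1f} per 100,000")
print(most_dangerous, round(times_higher, 1))
```

City 1 has three times as many murders in raw numbers, yet City 2's rate is several times higher. That reversal is why the per capita step matters.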
Standard Deviation
► In most situations, most people or values will group toward the middle.
► Those that don’t are different.
► The standard deviation measures how far the values typically spread from the middle (the mean).
► If many group outside the middle, then that tells you something about the situation: whatever you’re looking at isn’t what you would expect.
Standard Deviation
► For normal situations, the “curve” will look bell-shaped.
(Graph: bell-shaped curve)
Standard Deviation
► Most healthy women will eat between 1,700 and 2,000 calories a day. If you plot how many calories women eat, each woman’s intake will be one value. Plot them on a sheet of paper along a line and most of the values (number of calories) will land in the middle of the spread. That will be what is called a “normal distribution.”
(Graph: normal distribution)
Standard Deviation
► In a normal distribution, about 68% of the women will gather in the middle. They are within “one standard deviation” of the middle on either side. (The blue area on the graph.)
► Two standard deviations away will account for about 95%. (The blue areas and the brown areas.)
► So, 95% of the values in most situations will be considered “normal.” The values outside the middle 68% but still inside that 95% are somewhat unusual, but not excessively so.
► Three standard deviations away from the middle will account for about 99.7% of the values. (The blue, brown, and green areas.) The values in the green areas are more unusual, but we expect about 4% of values to fall into these areas, because life is not perfect.
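These percentages come straight from the shape of the normal curve, and you can reproduce them with a few lines of Python. This sketch uses the standard relationship between the normal distribution and the error function (`math.erf`); the function name `share_within` is an illustrative assumption.

```python
from math import erf, sqrt


def share_within(k):
    """Share of a normal distribution within k standard deviations of the mean."""
    return erf(k / sqrt(2))


within_1 = share_within(1)  # the middle band
within_2 = share_within(2)  # middle band plus the next band out
within_3 = share_within(3)  # nearly everything

# Share that lands between two and three standard deviations from the mean.
between_2_and_3 = within_3 - within_2

for k, share in [(1, within_1), (2, within_2), (3, within_3)]:
    print(f"within {k} standard deviation(s): {share:.1%}")
print(f"between 2 and 3: {between_2_and_3:.1%}")
```

The exact shares are a little more precise than the round numbers usually quoted, but 68, 95 and 99.7 are close enough for newsroom use.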
Standard Deviation
► If a scientific study finds that about 99.7% of its values fall within three standard deviations, then you have a normal situation, and the usual summaries (the mean and the standard deviation) describe the data well. That is a good sign, though it does not by itself prove the study’s conclusions.
► A good public opinion survey, for example, that concludes Americans support the President’s policies is easier to trust if the values (support for the president) fall in a normal bell curve with most of the people saying they support the policies.
► But what about the situations where the values don’t fall in a normal bell curve?
► Then treat the results with more caution, or at least recognize that more values than you would expect don’t fit the normal pattern. In the graph at top, most of the values fell to the left of center. In other words, the values are lopsided (skewed) rather than evenly spread around the middle, so the mean and standard deviation alone can mislead.
Margin of Error
► Margin of error deserves better than the throwaway line it gets at the bottom of stories about polling data. Writers who don't understand margin of error, and its importance in interpreting scientific research, can easily embarrass themselves and their news organizations.
Margin of Error
► The margin of error is closely tied to what statisticians call a confidence interval: it is the plus-or-minus distance from the poll’s result to each end of that interval. The math behind it is much like the math behind the standard deviation. So you can think of the margin of error at the 95 percent confidence level as being equal to about two standard deviations (statisticians call them standard errors) of your polling sample’s result. Occasionally you will see surveys with a 99 percent confidence level, which corresponds to about 2.6 standard deviations and a larger margin of error, because the more certain you want to be that the range captures the true value, the wider the range has to be.
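For the curious, the multipliers behind each confidence level can be looked up from the standard normal curve; they are close to whole numbers but not exactly. This Python sketch uses `statistics.NormalDist` from the standard library (Python 3.8 or later). The function name `z_multiplier` is an illustrative assumption.

```python
from statistics import NormalDist


def z_multiplier(confidence):
    """Number of standard errors needed to cover the given confidence level."""
    # Split the leftover probability evenly between the two tails.
    tail = (1 - confidence) / 2
    return NormalDist().inv_cdf(1 - tail)


z95 = z_multiplier(0.95)
z99 = z_multiplier(0.99)

# How much wider the margin of error gets when moving from 95% to 99%.
widening = z99 / z95

print(round(z95, 2), round(z99, 2), round(widening, 2))
```

Under these assumptions, asking for 99 percent confidence instead of 95 percent stretches the same poll's margin of error by roughly a third.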
Margin of Error
► Let’s consider a particular week's poll as a repeat of the previous week's. In the first week, Candidate A received support from 57% of those polled. Candidate B received 43%, a 14-point difference. In the second week, Candidate A received 53% support and Candidate B received 47%, a 6-point difference. Both polls had a margin of error of 4 points. So, is Candidate B gaining on Candidate A?
► Not that we can tell. Statistically, there is no measurable change from the previous week's poll. Candidate B has made up no measurable ground on Candidate A because the movement for both candidates (4 points each) is within the 4-point margin of error.
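The check described here can be sketched in Python. This follows the rule of thumb from the slide (movement within the margin of error is not measurable change); it is a simplification, not a formal significance test, and the function name is an illustrative assumption.

```python
def is_measurable_change(previous, current, margin_of_error):
    """True only if the movement is larger than the margin of error."""
    return abs(current - previous) > margin_of_error


MARGIN = 4  # points, the same for both polls

# Candidate A: 57% in week one, 53% in week two.
a_moved = is_measurable_change(57, 53, MARGIN)

# Candidate B: 43% in week one, 47% in week two.
b_moved = is_measurable_change(43, 47, MARGIN)

print(a_moved, b_moved)
```

Both candidates moved exactly 4 points, which does not exceed the 4-point margin, so neither shift counts as measurable under this rule.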
Questions Journalists Should Ask
► Where did the data come from? Always ask this one first. You always want to know who did the research that created the data you're going to write about. Just because a report comes from a group with a vested interest in its results doesn't guarantee the report is a sham. But you should always be extra skeptical when looking at research generated by people with a political agenda. At the least, they have plenty of incentive NOT to tell you about data they found that contradict their organization's position.
Questions
► Have the data been peer-reviewed? If they were, you know that the data you'll be looking at are at least minimally reliable because other researchers have reviewed the work and given their blessing. If they weren’t, that’s a sign the data might not be valid.
Questions
► How were the data collected? This one is really important to ask, especially if the data were not peer-reviewed. If the data come from a survey, for example, you want to know that the people who responded to the survey were selected at random.
Questions
► Be skeptical when dealing with comparisons. Researchers like to do something called a "regression," a process that compares one thing to another to see if they are statistically related. They will call such a relationship a "correlation." Always remember that a correlation DOES NOT mean causation.
Questions
► Finally, be aware of numbers taken out of context. Data that are "cherry picked" to look interesting might mean something else entirely once they are placed in a different context.
Survey Sample Sizes
► The population of a study is everyone who could have been included. For a national poll, then, the population would include every adult in the U.S., a number that would be impractical to poll. So researchers take a random sample. The larger the sample, the more likely it will be representative of the population. But a sample of 400 is usually good enough for most surveys. Most national polls, though, survey 1,500 to 2,500 people. As a rule of thumb, the margin of error in a sample = 1 divided by the square root of the number of people in the sample.
Survey Sample Sizes
► The margin of error in a sample = 1 divided by the square root of the number of people in the sample.
► In a survey of 2,500 people, the square root is 50. So, 1/50 = .02, or 2 percentage points.
► In a survey of 400 people, the square root is 20. So, 1/20 = .05, or 5 percentage points.
► This shows the margin of error increases significantly as the number surveyed decreases.
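Here is a Python sketch of the 1-over-square-root rule, next to the standard textbook formula for a proportion at 95 percent confidence. The textbook formula is not from these slides; it is included only to show that the shortcut lands close to it. Function names and default arguments are illustrative assumptions.

```python
from math import sqrt


def rule_of_thumb_margin(sample_size):
    """The shortcut from the slides: 1 / sqrt(n)."""
    return 1 / sqrt(sample_size)


def textbook_margin(sample_size, z=1.96, share=0.5):
    """Standard formula for a proportion: z * sqrt(p * (1 - p) / n).

    Assumes 95% confidence (z = 1.96) and the worst case of a 50/50 split.
    """
    return z * sqrt(share * (1 - share) / sample_size)


for n in (2_500, 400):
    quick = rule_of_thumb_margin(n)
    exact = textbook_margin(n)
    print(f"n={n}: shortcut {quick:.3f}, textbook {exact:.4f}")
```

The shortcut slightly overstates the margin compared with the textbook version, which errs on the cautious side for reporting.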
Picking the Right Statistical Test
► There are different kinds of statistical tests, and the correct one will be the one that provides the best answers based on the type of data you have collected.
► It is best to enlist the help of a statistics pro to analyze your data.
► You can also use SPSS, a computer program that conducts the statistical computations for you when you enter the data. So by knowing what type of test to run, you can enter the data into SPSS and run the test.
Use of Statistics
► Statistical tests allow researchers to find out whether their findings are “significant,” i.e., what is the probability that what we think is a relationship between two variables is really just a chance occurrence? The lower the probability of chance, the more believable the results.
► Researchers hypothesize. They write a statement that they believe will be true from the data they collect. They base this on previous research and on common sense. Then, they write the “null hypothesis.” The null is the opposite claim: that there is no relationship or no difference. The statistical tests are done to test whether the null hypothesis holds up. If the data make the null very unlikely, the researcher rejects it, and that counts as support for (though not proof of) the researcher’s hypothesis.
Use of Statistics
► Researchers use statistics to determine how likely it is that their results are due to chance. They usually want a significance level of .05, and it is written: p < .05. That means that if nothing but chance were at work, results like these would turn up less than 5 percent of the time. It goes hand in hand with the 95 percent confidence level. (Roughly speaking, if the data were collected 100 more times, the results would be expected to fall within the range of the current study about 95 times.) That means the results are pretty reliable.
ANOVA
► Most common statistical test: Analysis of Variance (ANOVA) is a statistical technique that is used to compare the means of more than two groups. There are two common kinds: one-way ANOVA (one dependent variable and one independent variable) and two-way ANOVA (one dependent and two independent variables). [Note about variables: the dependent variable (say, choice of candidate) is what will be affected by the question or the experiment; the independent variables are controlled by the researcher (say, choosing gender or income as factors that affect the dependent variable, choice of candidate).]
► Use the ANOVA test only if you are comparing data from at least 3 groups.
T-test
► Another common statistical test: the t-test uses the standard deviation of the sample to help draw conclusions about the larger population.
► Use it when you have only 2 groups of data, say results from men and women, and you want to know whether their answers are significantly different or differ just by random chance.
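The rule of thumb from these two slides (two groups: t-test; three or more: ANOVA) can be written as a tiny Python helper. This is only a sketch of that rule; real test selection also depends on the kind of data you have, and the function name is an illustrative assumption.

```python
def suggest_test(number_of_groups):
    """Suggest a test based only on how many groups are being compared."""
    if number_of_groups < 2:
        raise ValueError("need at least two groups to compare")
    if number_of_groups == 2:
        return "t-test"
    return "ANOVA"


# Men vs. women: two groups.
two_groups = suggest_test(2)

# Hypothetical example: three age brackets.
three_groups = suggest_test(3)

print(two_groups, three_groups)
```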
Other types of tests
► There are many other types of tests for interpreting data that require a rather high level of skill in statistics. If your data are complicated and you want to find out as much about the data as possible, you may want to consult a stats pro for help.