
Statistics: Purpose, Approach,
Method
The Basic Approach
• The basic principle behind the use of
statistical tests of significance can be
stated as: Compare obtained results to
chance expectation. Another summation
might be: Did you get what you would
expect by chance?
The Basic Approach
• We ask important questions: Does this
obtained result differ significantly from the
theoretically expected result? Does this
obtained result differ from chance
expectation enough to warrant a belief that
something other than chance is at work?
Can the obtained results be explained
solely by chance?
• Statisticians are skeptics. They assume
that results are chance results until shown
to be otherwise.
Definition and Purpose of Statistics
• Statistics is the theory and method of
analyzing quantitative data obtained from
samples of observations in order to study
and compare sources of variance of
phenomena, to help make decisions to
accept or reject hypothesized relations
between the phenomena, and to aid in
drawing reliable inferences from empirical
observations.
Definition and Purpose of Statistics
• The first purpose is to reduce large
quantities of data to manageable and
understandable form.
• A second purpose is to aid in the study of
populations and samples.
• A third purpose of statistics is to aid in
decision making.
• A fourth purpose is to aid in making
reliable inferences from observational data.
Definition and Purpose of Statistics
• To summarize much of the above
discussion, the purposes of statistics can
be reduced to one major purpose: to aid in
inference making.
• Statistics says, in effect, “The inference
you have drawn is correct at such-and-such
a level of significance. You may act
as though your hypothesis were true,
remembering that there is such-and-such
a probability that it is untrue.”
Binomial Statistics
• Let two coins be tossed. U = {HH, HT, TH,
TT}. The mean number of heads, or the
expectation of heads, is
• M = 2(1/4) + 1(1/4) + 1(1/4) + 0(1/4) = 1
• This says that if two coins are tossed
many times, the average number of heads
per toss of the two coins is 1.
Binomial Statistics
• In the one-toss experiment, let 1 be assigned if
heads turns up and 0 if tails turns up. Then p(1) = 1/2
and p(0) = 1 - 1/2 = 1/2. In tossing a coin twice, let 1 be
assigned to each head that occurs and 0 to
each tail. We are interested in the outcome “heads.”
U = {HH, HT, TH, TT}. The mean is
• M = 2(1/4) + 1(1/4) + 1(1/4) + 0(1/4) = 1
• Can we arrive at the same result in an easier
manner? Yes: just add the means of the
individual tosses. The mean of one coin
toss is ½; for two coin tosses it is ½ + ½ = 1.
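As a quick check (an added sketch, not part of the original slides), the same mean can be computed directly from the sample space U in Python:

```python
# Exact mean number of heads over the sample space U = {HH, HT, TH, TT},
# weighting each outcome by its probability 1/4.
outcomes = ["HH", "HT", "TH", "TT"]
mean_heads = sum(o.count("H") * (1 / 4) for o in outcomes)
print(mean_heads)  # 1.0, matching M = 2(1/4) + 1(1/4) + 1(1/4) + 0(1/4)
```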
Binomial Statistics
• Evidently, for a single trial, M = p: the
mean is equal to the probability.
• How about a series of outcomes?
• In n trials the mean number of
occurrences of the outcome associated
with p is pn.
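A small simulation sketch (illustrative; the values of n and the number of trials are chosen arbitrarily) showing the observed mean approaching pn:

```python
import random

# In n trials, the mean number of occurrences of an outcome with
# probability p is pn. Here: n = 10 tosses of a fair coin, p = 1/2.
random.seed(0)
n, p, trials = 10, 0.5, 100_000
mean_heads = sum(sum(random.random() < p for _ in range(n))
                 for _ in range(trials)) / trials
print(mean_heads, n * p)  # both close to 5.0
```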
The Variance
• V = sum[w(X)(X - M)^2], where w(X) is the
probability weight of the value X.
• For the binomial, V = npq.
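A companion sketch (same illustrative setup as above) comparing the empirical variance with npq:

```python
import random

# Compare the empirical variance of the number of heads in n tosses
# with the binomial formula V = npq.
random.seed(0)
n, p, trials = 10, 0.5, 100_000
counts = [sum(random.random() < p for _ in range(n)) for _ in range(trials)]
m = sum(counts) / trials
v = sum((x - m) ** 2 for x in counts) / trials
print(v, n * p * (1 - p))  # both close to 2.5
```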
The Law of Large Numbers
• Roughly, the law says that with an
increase in the size of the sample, n, there is an
increase in the probability that the
observed value of an event, A, will deviate
from the “true” value of A by no more than
a fixed amount, k. Provided the members
of the samples are drawn independently,
the larger the sample, the closer the
observed proportion approaches the “true”
proportion of the population.
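A minimal sketch of the law in action (coin flips with p = 0.5; the sample sizes are arbitrary):

```python
import random

# The observed proportion of heads tends toward the "true" value 0.5
# as the sample size n grows, with independent draws.
random.seed(1)
for n in (10, 100, 1_000, 10_000, 100_000):
    heads = sum(random.randint(0, 1) for _ in range(n))
    print(n, heads / n)
```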
The Law of Large Numbers
• Tchebysheff’s Theorem states that if we
are given a number k that is greater than
or equal to 1 and a set of n measurements,
we are guaranteed (regardless of the
shape of the distribution) that at least
(1 - 1/k^2) of the measurements will lie within k
standard deviation units on either side of
the mean.
• Table 11.1
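A sketch illustrating the guarantee on a deliberately non-normal sample (an exponential distribution, chosen arbitrarily for illustration):

```python
import random

# Tchebysheff: at least 1 - 1/k^2 of measurements lie within k standard
# deviations of the mean, regardless of the distribution's shape.
random.seed(2)
data = [random.expovariate(1.0) for _ in range(100_000)]  # skewed on purpose
m = sum(data) / len(data)
sd = (sum((x - m) ** 2 for x in data) / len(data)) ** 0.5
for k in (2, 3):
    within = sum(abs(x - m) <= k * sd for x in data) / len(data)
    print(k, round(within, 3), ">= guaranteed", 1 - 1 / k ** 2)
```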
The Normal Probability Curve and
the Standard Deviation
• The normal probability curve is the lovely
bell-shaped curve encountered so often in
statistics and psychology textbooks. Its
importance stems from the fact that
chance events in large numbers tend to
distribute themselves in the form of the
curve. The so-called theory of errors uses
the curve. Many phenomena—physical
and psychological—are considered to
distribute themselves in approximately
normal form.
The Normal Probability Curve and
the Standard Deviation
• There are two types of graphs ordinarily used
in behavioral research. In one, the values
of a dependent variable are plotted against
the values of an independent variable; the
other shows the distribution of a single variable.
• When the normal distribution does not apply, use
Tchebysheff’s Theorem. With this theorem,
one is guaranteed at least 75% of the measurements
between Z = -2 and Z = +2 and at least 88.9%
between Z = -3 and Z = +3.
Interpretation of Data Using the Normal
Probability Curve-Frequency Data
• Instead of calculating exact probabilities,
we can estimate probabilities from
knowledge of the properties of the normal
curve. The normal curve approximation of
the binomial distribution is most useful and
accurate when N is large and the value of
p is close to 0.5.
• The earlier Agree-Disagree problem can
be dealt with in three ways: one is the chi-square
test, another is the exact probability test,
and the third uses the normal curve, as
sketched below.
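A sketch of the normal-curve route (the numbers are made up for illustration, since the Agree-Disagree data are not reproduced in this section):

```python
import math

# Normal-curve approximation to a binomial test.
# H0: p = 0.5. Suppose 60 of N = 100 respondents agree (illustrative).
N, p, observed = 100, 0.5, 60
mean = N * p                                  # M = pn = 50
sd = math.sqrt(N * p * (1 - p))               # SD = sqrt(npq) = 5
z = (observed - 0.5 - mean) / sd              # 0.5 is a continuity correction
p_upper = 0.5 * math.erfc(z / math.sqrt(2))   # P(Z >= z) under the normal curve
print(round(z, 2), round(p_upper, 4))         # 1.9, about 0.0287
```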
Interpretation of Data Using the Normal
Probability Curve-Continuous Data
• Suppose we have the mathematics test
scores of a sample of 100 fifth-grade
children. The mean of the scores is 70; the
standard deviation is 10. Our interest is in
the reliability of the mean. How much can
we depend on this mean? With future
samples of similar fifth-grade children, will
we get the same mean?
Interpretation of Data Using the Normal
Probability Curve-Continuous Data
• If we calculate a mean and a standard
deviation for each of these many samples, we
obtain a gigantic distribution of means
(and standard deviations). The distribution
of means will form a beautiful bell-shaped normal
curve, even when the original distributions
from which they are calculated are not
normal. This is because we assumed
“other things equal” and thus have no
source of mean fluctuation other than
chance.
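A brief simulation sketch of this point (the uniform population and sample sizes are chosen arbitrarily):

```python
import random

# Draw many samples from a non-normal (uniform) population; the sample
# means pile up in an approximately normal curve around the population mean.
random.seed(3)
means = []
for _ in range(10_000):
    sample = [random.uniform(0, 100) for _ in range(100)]
    means.append(sum(sample) / len(sample))
print(round(sum(means) / len(means), 2))  # close to 50, the population mean
```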
Interpretation of Data Using the Normal
Probability Curve-Continuous Data
• Chance errors, given enough of them,
distribute themselves into a normal
distribution. This is what is called the
theory of errors.
Interpretation of Data Using the Normal
Probability Curve-Continuous Data
• If we had an infinite number of means from an
infinite number of test administrations and
calculated the mean of the means, we would
then obtain the “true” mean. Naturally, we cannot
do that.
• There is fortunately a simple way to solve the
problem. It consists in accepting the mean
calculated from the sample as the “true” mean
and then estimating how accurate this
acceptance (or assumption) is. To do this, a
statistic known as the standard error of the mean
is calculated. It is defined:
Interpretation of Data Using the Normal
Probability Curve-Continuous Data
SEM = σ_pop / √n

SEM = SD / √n
• Means are reliable with fair-sized samples:
with a larger sample, the standard error of
the mean becomes small enough to
warrant confidence in the mean.
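Applying the formula to the slides’ example (n = 100, mean = 70, SD = 10):

```python
# Standard error of the mean for the worked example.
n, mean, sd = 100, 70, 10
sem = sd / n ** 0.5
print(sem)  # 1.0
# Roughly 95% of sample means would fall within about 2 SEM of the mean:
print(mean - 2 * sem, mean + 2 * sem)  # 68.0 and 72.0
```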
Interpretation of Data Using the Normal
Probability Curve-Continuous Data
• The standard error of the mean, then, is a
standard deviation. It is a standard deviation of an
infinite number of means. Only chance error
makes the means fluctuate. Thus, the standard
error of the means—or the standard deviation of
the means, if you like—is a measure of chance or
error in its effect on one measure of central
tendency.
• A caution is in order. All the theory discussed here
is based on the assumptions of random sampling
and independence of observations.