
Chapter 1
The mean, the number of
observations, the variance
and the standard deviation
Some definitions
Data -
observations, measurements, scores
Statistics -
a series of rules and methods that can be
used to organize and interpret data.
Descriptive Statistics -
methods to summarize
large amounts of data with just a few numbers.
Inferential Statistics -
mathematical procedures
to make statements about a population based on a sample.
More Definitions
Parameter -
a number that summarizes or describes
some aspect of a population.
Sample statistic - An estimate of a population
parameter based on a random sample taken from the
population.
Sampling Error -
the difference between a sample
statistic that estimates a population parameter and the
actual parameter.
Non-parametric Statistics -
statistics for
observations that do not allow the estimation of the
population mean and variance.
Sampling Error - the difference
between a sample statistic that
estimates a population parameter
and the actual parameter.
Differences between sample statistics
and population parameters are
largely a function of stable, random
individual differences and
measurement problems
Where we are going
Descriptive Statistics
Number of Observations
Measures of Central Tendency
Measures of Variability
Observations
Each score is represented by the
letter X.
The total number of observations is
represented by N.
Measures of Central Tendency
Finding the most typical score
median - the middle score
mode - the most frequent score
mean - the average score
In this course, the mean will be our
most important measure of central
tendency
Calculating the Mean
Greek letters are used to represent
population parameters.
μ (mu) is the mathematical symbol for the
mean.
Σ (capital sigma) is the mathematical symbol for summation.
Formula - μ = (ΣX) / N
English: To calculate the mean, first add
up all the scores, then divide by the
number of scores you added up.
The mode, the median and
the mean
Ages of people retiring from Rutgers this year.
Scores as collected: 60, 63, 45, 64, 65, 70, 55, 60, 66
Scores in order: 45, 55, 60, 60, 63, 64, 65, 66, 70
Mode is 60.
Median is 63.
ΣX = 548
N = 9
Mean μ = 60.89
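The retirement-age example above can be checked with a minimal Python sketch. It assumes the nine ages listed on the slide and uses the standard library `statistics` module; the variable names are illustrative only.

```python
import statistics

# Ages of the nine people retiring, as listed on the slide.
ages = [60, 63, 45, 64, 65, 70, 55, 60, 66]

n_obs = len(ages)                 # N, the number of observations
total = sum(ages)                 # the sum of all the X scores
mode = statistics.mode(ages)      # the most frequent score
median = statistics.median(ages)  # the middle score once the ages are sorted
mean = total / n_obs              # mu = (sum of X) / N

print(f"N = {n_obs}, sum of X = {total}")
print(f"mode = {mode}, median = {median}, mean = {mean:.2f}")
```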
Measures of Variability – less
important
Range - the distance from the highest to the
lowest score.
Inter-quartile Range - the distance from the
score that cuts off the top 25% of scores to the
score that cuts off the bottom 25%.
Sum of Squares (SS) – the total squared
distance of all scores from the mean. You
calculate it by finding the distance of each score
from the mean, squaring it, and then summing the
squared distances over all the scores.
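As a rough illustration of these three measures, here is a short Python sketch that reuses the nine retirement ages from the earlier example. The quartile cut points are an assumption: textbooks and libraries use slightly different conventions, and this sketch simply takes the default of `statistics.quantiles` (available in Python 3.8 and later), so the inter-quartile range it reports may differ a little from a hand calculation.

```python
import statistics

# The nine retirement ages from the earlier example.
ages = [60, 63, 45, 64, 65, 70, 55, 60, 66]
mu = sum(ages) / len(ages)

# Range: distance from the highest to the lowest score.
score_range = max(ages) - min(ages)

# Inter-quartile range: distance between the cut points for the top 25%
# and the bottom 25%. Quartile conventions vary; this uses the library default.
q1, _q2, q3 = statistics.quantiles(ages, n=4)
iqr = q3 - q1

# Sum of squares: squared distance of each score from the mean, summed.
ss = sum((x - mu) ** 2 for x in ages)

print(f"range = {score_range}, IQR = {iqr}, SS = {ss:.2f}")
```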
Measures of Variability – more
important
Variance (σ²) - also called sigma squared. The variance
is the average squared distance of scores from mu. It is
found by taking the total squared distance of all the
scores from the mean (SS) and dividing by the number
of scores (σ² = SS/N).
Standard Deviation (σ) - also called sigma. The
standard deviation is the square root of the variance.
It is the average unsquared distance of scores in the
population from their mean. (That is almost, but not
exactly, like saying that the standard deviation is the
average distance of scores from the population mean.)
Computing the variance
and the standard deviation
Scores on a 10 question
Psychology quiz

Student      X    X - μ    (X - μ)²
John         7    +1.00     1.00
Jennifer     8    +2.00     4.00
Arthur       3    -3.00     9.00
Patrick      5    -1.00     1.00
Marie        7    +1.00     1.00

ΣX = 30    N = 5    μ = 6.00
Σ(X - μ) = 0.00
Σ(X - μ)² = SS = 16.00
σ² = SS/N = 3.20
σ = √3.20 = 1.79
The variance is our most basic and
important measure of variability
The variance (σ² = sigma squared) is the
average squared distance of individual scores
from the population mean.
Other indices of variation are derived from the
variance.
For example, as noted above, sigma, the
average unsquared distance of scores from mu,
is the standard deviation. To find it, you compute
the square root of the variance.
Other measures of variability
derived from the variance
• We can randomly choose scores from a population to
form a random sample and then find the mean of such
samples.
• Each score you add to a sample tends to correct the
sample mean back toward the population mean, mu.
• The average squared distance of sample means from
the population mean is the variance divided by n, the
size of the sample.
• To find the average unsquared distance of sample
means from mu, divide the variance by n, then take the
square root. The result is called the standard error of
the sample mean or, more briefly, the standard error
of the mean. We’ll see more of this in Ch. 4.
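The claim that sample means sit, on average, a squared distance of variance / n from mu can be checked on a tiny population. The sketch below assumes the five quiz scores used in this chapter are the whole population, a sample size of n = 2 chosen purely for illustration, and sampling with replacement, so every ordered pair of scores is an equally likely sample.

```python
from itertools import product
from math import sqrt

# Assumed population: the five quiz scores used in this chapter.
scores = [7, 8, 3, 5, 7]
N = len(scores)
mu = sum(scores) / N
variance = sum((x - mu) ** 2 for x in scores) / N

n = 2  # illustrative sample size (an assumption, not from the slides)

# Every possible ordered sample of size n, drawn with replacement.
sample_means = [sum(sample) / n for sample in product(scores, repeat=n)]

# Average squared distance of the sample means from mu.
avg_sq_dist = sum((m - mu) ** 2 for m in sample_means) / len(sample_means)

# Standard error of the mean: square root of (variance / n).
standard_error = sqrt(variance / n)

print(f"variance / n = {variance / n:.2f}, observed = {avg_sq_dist:.2f}")
print(f"standard error of the mean = {standard_error:.2f}")
```

Enumerating all 25 possible samples avoids random simulation, so the two printed values agree exactly rather than approximately.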
Making predictions (1)
Without any other information, the population
mean (mu) is the best prediction of each and
every person’s score.
So you should predict that everyone will score
precisely at the population mean.
Why? Because the mean is an unbiased
predictor or estimate. The mean is as close to
the high as to the low scores in the population.
This is mathematically proven by the fact that
deviations around the mean sum to zero.
You should also predict
that everyone will score
right at the mean because:
The mean is the number that is the
smallest average squared distance from all
the scores in the distribution.
Thus, the mean is your best prediction,
because it is a least squares, unbiased
predictor.
What happens if we make
a prediction other than mu?
Scores on a Psychology quiz (mu = 6.00). What if we predict
everyone will score 5.50? Deviations don’t sum to zero, and the
average squared distance of scores from the prediction
increases.

Student      X    X - 5.50    (X - 5.50)²
John         7     +1.50        2.25
Jennifer     8     +2.50        6.25
Arthur       3     -2.50        6.25
Patrick      5     -0.50        0.25
Marie        7     +1.50        2.25

ΣX = 30    N = 5    μ = 6.00
Σ(X - 5.50) = 2.50
Σ(X - 5.50)² = 17.25
Average squared distance = 17.25/N = 3.45
Square root = √3.45 = 1.86
Compare that to predicting that everyone
will score right at the mean (mu).
Scores on a 10 question
Psychology quiz

Student      X    X - μ    (X - μ)²
John         7    +1.00     1.00
Jennifer     8    +2.00     4.00
Arthur       3    -3.00     9.00
Patrick      5    -1.00     1.00
Marie        7    +1.00     1.00

ΣX = 30    N = 5    μ = 6.00
Σ(X - μ) = 0.00
Σ(X - μ)² = SS = 16.00
σ² = SS/N = 3.20
σ = √3.20 = 1.79
But when you predict that everyone will
score at the mean, you will be wrong. In fact,
it is often the case that no one will score
precisely at the mean.
In statistics, we don’t expect our predictions to
be precisely right.
We want to make predictions that are wrong in
a particular way.
We want our predictions to be as close to the
high scores as to the low scores in the
population.
The mean is the only number that is an
unbiased predictor: it is the only number around
which deviations sum to zero.
We want to be wrong by
the least amount possible
In statistics, we consider error to be the
squared distance between a prediction and the
actual score.
The mean is the number with the least average
squared distance from all the scores in the population.
The number that is the least average squared
distance from the scores in the population is the
prediction that is least wrong, the least in error.
Thus, saying that everyone will score at the
mean (even if no one does!) is the prediction
that gives you the smallest amount of error.
Why doesn’t everyone
score right at the mean?
Sources of Error
Individual differences – people have stable
differences from one another. They differ in
an infinite number of ways and combinations
of ways.
PROOF OF THAT: AREN’T YOU MORE
LIKE WHO YOU WILL BE IN 5 MINUTES
THAN YOU ARE LIKE THE PERSON NEXT TO
YOU??!
AND – THERE ARE ALWAYS
MEASUREMENT
PROBLEMS!
Instruments are imperfect,
scores get mistranscribed,
participants may be
uninterested or have a
stomach ache, etc. etc.
etc. …
Remember: THERE ARE
ALWAYS MEASUREMENT
PROBLEMS
NO MEASUREMENT DEVICE IS EVER
PERFECTLY ACCURATE, WHETHER IT IS
A HIGHLY ACCURATE SCALE OR A 12
QUESTION QUESTIONNAIRE
Additionally, transient situational
factors make measurement inaccurate
This is especially true when we measure
people. Let’s say we are measuring
something relatively easy to measure, such as
verbal ability. When we are measuring
people, lots of transient factors (such as
mood, events, time, motivation etc.) all
change an individual’s responses and combine
to make our measurement of verbal ability
imperfect.
The mean square for error
We call the average squared error of
prediction when we use the mean as our
prediction the “mean square for error”. It
tells us how much (squared) error we
make, on the average, when we predict
that everyone will score precisely at the
mean.
Mean square for error =
the variance (sigma squared, σ²)
If we predict that everyone will score
right at the mean, how much error
do we make on the average? To find
out, find the distance of each score
from the mean, square that distance,
sum the squared distances, and divide
by the number of scores to find the
average error.
WHOOPS: THAT’S SIGMA SQUARED.
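A quick way to see that the mean square for error is the variance is to compute both and compare. This sketch assumes the five quiz scores from earlier in the chapter and uses `statistics.pvariance`, the standard library's population variance, as the reference value.

```python
import statistics

# The five quiz scores, treated as the whole population.
scores = [7, 8, 3, 5, 7]
mu = statistics.mean(scores)

# Error of prediction: squared distance of each score from the predicted mean.
squared_errors = [(x - mu) ** 2 for x in scores]

# Mean square for error: the average of those squared errors.
mean_square_error = sum(squared_errors) / len(scores)

# Population variance computed by the standard library, for comparison.
population_variance = statistics.pvariance(scores)

print(f"mean square for error = {mean_square_error:.2f}")
print(f"population variance   = {population_variance:.2f}")
```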
Questions and answers – the
mean.
• WHAT QUALITIES OF THE MEAN (MU) MAKE IT THE
BEST PREDICTION YOU CAN MAKE OF WHERE
EVERYONE WILL SCORE?
• The mean is an unbiased predictor or estimate, because
the deviations around the mean always sum to zero.
• The mean is a least squares predictor because it has the
smallest average squared distance from all the
scores in the population.
So the variance has a third
name.
The variance is called the mean square for
error as well as being called sigma squared (σ²).
As the mean square for error, the variance
is our numerical index of the effects of
individual differences and measurement
problems.
Q & A: the mean
WHY WOULD YOU PREDICT THAT EVERYONE
WILL SCORE AT THE MEAN WHEN, IN FACT,
OFTEN NO ONE CAN POSSIBLY SCORE
PRECISELY AT THE MEAN?
In statistics, we don’t expect our predictions to
be precisely right.
We want to make predictions that are close and
wrong in a particular way.
We want least squares, unbiased predictors.
Q & A: The variance
WHAT ARE THE OTHER NAMES FOR THE
VARIANCE?
Sigma squared (σ²) and the mean square for error.
WHAT OTHER MEASURES OF
VARIABILITY CAN BE EASILY COMPUTED
ONCE YOU KNOW THE VARIANCE?
The standard deviation and the standard
error of the sample mean.
How do you compute
THE VARIANCE? Find the distance of each score
from the mean, square it, sum them up and
divide by the number of scores in the
population.
THE STANDARD DEVIATION? Compute the
square root of the variance.
THE STANDARD ERROR OF THE SAMPLE MEAN?
Divide the variance by n, the size of the sample,
and then take a square root.
END CHAPTER 1 SLIDES