Chapter 1
The mean, the number of
observations, the variance
and the standard deviation
Some definitions
Data -
observations, measurements, scores
Statistics -
a series of rules and methods that can be
used to organize and interpret data.
Descriptive Statistics -
methods to summarize
large amounts of data with just a few numbers or a
figure.
Inferential Statistics -
mathematical procedures
to make statements about a population based on a sample.
More Definitions
Parameter -
a number that summarizes or describes
some aspect of a population.
Sample statistic - An estimate of a population
parameter based on a random sample taken from the
population.
Sampling Error -
the difference between a sample
statistic that estimates a population parameter and the
actual parameter.
Non-parametric Statistics -
statistics for
observations that do not allow the estimation of the
population mean and variance.
Sampling Error - the difference
between a sample statistic that
estimates a population parameter
and the actual parameter.
Differences between sample statistics
and population parameters are
largely a function of stable, random
individual differences and
measurement problems
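A minimal sketch of sampling error, assuming a small made-up population of scores. The numbers, the seed and the sample size are illustrative assumptions, not data from these slides. It draws a random sample, uses the sample mean as the sample statistic, and reports how far that statistic lands from the population mean.

```python
import random
import statistics

# Hypothetical population of scores, invented for illustration only.
population = [2, 4, 4, 5, 6, 7, 7, 8, 9, 10]
mu = statistics.mean(population)  # the population parameter

random.seed(1)  # fixed seed so the sketch is repeatable
sample = random.sample(population, 4)  # random sample, drawn without replacement
sample_mean = statistics.mean(sample)  # the sample statistic that estimates mu

# Sampling error: the sample statistic minus the parameter it estimates.
sampling_error = sample_mean - mu

print("population mean:", mu)
print("sample:", sample)
print("sample mean:", sample_mean)
print("sampling error:", sampling_error)
```

Changing the seed gives a different sample and, usually, a different sampling error.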
Where we are going
Descriptive Statistics
Number of Observations
Measures of Central Tendency
Measures of Variability
Observations
Each score is represented by the
letter X.
The total number of observations is
represented by N.
Measures of Central Tendency
Finding the most typical score
median - the middle score
mode - the most frequent score
mean - the average score
In this course, the mean will be our
most important measure of central
tendency
Calculating the Mean
Greek letters are used to represent
population parameters.
$\mu$ (mu) is the mathematical symbol for the
mean.
$\Sigma$ (capital sigma) is the mathematical symbol for summation.
Formula - $\mu = (\Sigma X) / N$
English: To calculate the mean, first add
up all the scores, then divide by the
number of scores you added up.
The mode, the median and
the mean
Ages of people retiring from Rutgers this year.
Scores as recorded: 60, 63, 45, 64, 65, 70, 55, 60, 66
Scores in order: 45, 55, 60, 60, 63, 64, 65, 66, 70
Mode is 60.
Median is 63.
$\Sigma X = 548$
$N = 9$
Mean = 60.89
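The retirement-age example can be checked with a short sketch. It assumes Python's standard `statistics` module; the variable names are illustrative.

```python
import statistics

# Ages of people retiring, as listed on the slide.
ages = [60, 63, 45, 64, 65, 70, 55, 60, 66]

n = len(ages)                     # N, the number of observations
total = sum(ages)                 # sum of X
mean = total / n                  # mu = (sum of X) / N
median = statistics.median(ages)  # middle score once the ages are sorted
mode = statistics.mode(ages)      # most frequent score

print("N =", n)                   # N = 9
print("sum of X =", total)        # sum of X = 548
print("mean =", round(mean, 2))   # mean = 60.89
print("median =", median)         # median = 63
print("mode =", mode)             # mode = 60
```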
Measures of Variability – less
important
Range - the distance from the highest to the
lowest score.
Inter-quartile Range - the distance between the score
that marks off the top 25% of scores and the score that
marks off the bottom 25%.
Sum of Squares (SS) – the total squared
distance of all scores from the mean. You
calculate it by finding the distance of each score
from the mean, squaring it, and then summing over
all the scores. (Note: the more scores, the
bigger the sum of squares.)
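A small sketch of the range and the sum of squares, reusing the retirement ages from the earlier slide. Listing every age twice is an illustrative assumption, added only to show the note that more scores give a bigger sum of squares.

```python
ages = [60, 63, 45, 64, 65, 70, 55, 60, 66]

# Range: distance from the highest to the lowest score.
score_range = max(ages) - min(ages)

# Sum of squares: squared distance of each score from the mean, summed.
mean = sum(ages) / len(ages)
sum_of_squares = sum((x - mean) ** 2 for x in ages)

# Same scores listed twice: the mean is unchanged, but SS doubles.
doubled = ages * 2
mean_doubled = sum(doubled) / len(doubled)
ss_doubled = sum((x - mean_doubled) ** 2 for x in doubled)

print("range:", score_range)             # range: 25
print("SS:", round(sum_of_squares, 2))   # SS: 428.89
print("SS with every score listed twice:", round(ss_doubled, 2))
```

Because SS grows with the number of scores, it is hard to compare across groups of different sizes, which is why it gets divided by the number of scores to get the variance.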
Measures of Variability – more
important
Variance ($\sigma^2$) - also called sigma squared. The variance
is the average squared distance of scores from mu. It is
found by computing the total squared distance of all the
scores from the mean (SS) and then dividing by the
number of scores ($\sigma^2 = SS/N$).
Standard Deviation ($\sigma$) - also called sigma. The
standard deviation is the square root of the variance.
It is the average unsquared distance of scores in the
population from their mean. (That is almost, but not
exactly, like saying that the standard deviation is the
average distance of scores from the population mean.)
Computing the variance
and the standard deviation
Scores on a 10 question
Psychology quiz
Student     X    $X - \mu$    $(X - \mu)^2$
John        7    +1.00        1.00
Jennifer    8    +2.00        4.00
Arthur      3    -3.00        9.00
Patrick     5    -1.00        1.00
Marie       7    +1.00        1.00
$\Sigma X = 30$
$N = 5$
$\mu = 6.00$
$\Sigma (X - \mu) = 0.00$
$\Sigma (X - \mu)^2 = SS = 16.00$
$\sigma^2 = SS/N = 3.20$
$\sigma = \sqrt{3.20} = 1.79$
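The same computation can be followed step by step in a short sketch. The variable names are illustrative, and the five quiz scores are treated as the whole population, as on the slide.

```python
import math

# Quiz scores from the slide, keyed by student.
scores = {"John": 7, "Jennifer": 8, "Arthur": 3, "Patrick": 5, "Marie": 7}

n = len(scores)                  # N = 5
mu = sum(scores.values()) / n    # mu = 30 / 5 = 6.00

# Distance of each score from the mean, then that distance squared.
deviations = {name: x - mu for name, x in scores.items()}
squared = {name: d ** 2 for name, d in deviations.items()}

ss = sum(squared.values())       # sum of squares
variance = ss / n                # sigma squared = SS / N
sigma = math.sqrt(variance)      # standard deviation

for name in scores:
    print(f"{name:<9}{scores[name]:>3}{deviations[name]:>+7.2f}{squared[name]:>7.2f}")
print("SS =", ss)                         # SS = 16.0
print("variance =", variance)             # variance = 3.2
print("sigma =", round(sigma, 2))         # sigma = 1.79
```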
The variance is our most basic and
important measure of variability
The variance ($\sigma^2$ = sigma squared) is the
average squared distance of individual scores
from the population mean.
Other indices of variation are derived from the
variance.
For example, as noted above, sigma ($\sigma$), the
average unsquared distance of scores from mu,
is the standard deviation. To find it, you
compute the square root of the variance.
Other measures of variability
derived from the variance
We can randomly choose scores from a population to
form a random sample and then find the mean of such
samples.
Each score you add to a sample tends to correct the
sample mean back toward the population mean, mu.
The average squared distance of sample means from
the population mean is the variance divided by n, the
size of the sample.
To find the average unsquared distance of sample
means from mu divide the variance by n, then take the
square root of the resulting number. This final result is
called the standard error of the sample mean or, more
briefly, the standard error of the mean. We’ll see
more of this in Ch. 4.
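A sketch of the claim that the average squared distance of sample means from mu is the variance divided by n. It uses the five quiz scores from the earlier slide as the population. The sample size of 2 and sampling with replacement (so every ordered pair is equally likely) are assumptions made for the illustration.

```python
import itertools
import math

population = [7, 8, 3, 5, 7]  # quiz scores, treated as the whole population
N = len(population)
mu = sum(population) / N
variance = sum((x - mu) ** 2 for x in population) / N  # sigma squared = 3.20

n = 2  # assumed sample size for the illustration

# Every possible ordered sample of size n, drawn with replacement (25 samples).
samples = list(itertools.product(population, repeat=n))
sample_means = [sum(s) / n for s in samples]

# Average squared distance of the sample means from mu.
avg_sq_dist = sum((m - mu) ** 2 for m in sample_means) / len(sample_means)

# Standard error of the mean: square root of (variance / n).
standard_error = math.sqrt(variance / n)

print("variance / n =", variance / n)
print("average squared distance of sample means from mu =", avg_sq_dist)
print("standard error of the mean =", round(standard_error, 2))
```

With n = 2 both quantities come out to 1.60, half the population variance of 3.20, and the standard error is about 1.26.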
Making predictions (1)
Without any other information, the population
mean (mu) is the best prediction of each and
every person’s score.
So you should predict that everyone will score
precisely at the population mean.
Why? Because the mean is an unbiased
predictor or estimate. The mean is as close to
the high as to the low scores in the population.
This is mathematically proven by the fact that
deviations around the mean sum to zero.
You should also predict
that everyone will score
right at the mean because:
The mean is the number that is the
smallest average squared distance from all
the scores in the distribution.
Thus, the mean is your best prediction,
because it is a least squares, unbiased
predictor.
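Both claims can be seen in one short derivation. Here $c$ is a hypothetical label for any candidate prediction; it is not a symbol these slides use.

```latex
\begin{align*}
\sum (X - c)^2 &= \sum \bigl[(X - \mu) + (\mu - c)\bigr]^2 \\
               &= \sum (X - \mu)^2 + 2(\mu - c)\sum (X - \mu) + N(\mu - c)^2 \\
               &= SS + N(\mu - c)^2
               \qquad \text{because } \sum (X - \mu) = 0 .
\end{align*}
```

The middle term drops out because deviations around the mean sum to zero (the unbiased part). The leftover term $N(\mu - c)^2$ is zero only when $c = \mu$, so any other prediction adds squared error (the least squares part).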
What happens if we make
a prediction other than mu?
Scores on a Psychology quiz (mu = 6.00). What if we predict
everyone will score 5.50? Deviations don’t sum to zero, and the
average squared distance of scores from the prediction
increases.
Student     X    $X - 5.50$    $(X - 5.50)^2$
John        7    +1.50         2.25
Jennifer    8    +2.50         6.25
Arthur      3    -2.50         6.25
Patrick     5    -0.50         0.25
Marie       7    +1.50         2.25
$\Sigma X = 30$
$N = 5$
$\mu = 6.00$
$\Sigma (X - 5.50) = 2.50$
$\Sigma (X - 5.50)^2 = SS = 17.25$
Average squared distance from 5.50 $= SS/N = 3.45$
Square root $= \sqrt{3.45} = 1.86$
Compare that to predicting that everyone
will score right at the mean (mu).
Scores on a 10 question
Psychology quiz
Student     X    $X - \mu$    $(X - \mu)^2$
John        7    +1.00        1.00
Jennifer    8    +2.00        4.00
Arthur      3    -3.00        9.00
Patrick     5    -1.00        1.00
Marie       7    +1.00        1.00
$\Sigma X = 30$
$N = 5$
$\mu = 6.00$
$\Sigma (X - \mu) = 0.00$
$\Sigma (X - \mu)^2 = SS = 16.00$
$\sigma^2 = SS/N = 3.20$
$\sigma = \sqrt{3.20} = 1.79$
Mu vs. another prediction
Prediction = 5.50
Deviations don’t sum to zero. It’s a biased
prediction.
Sum of squares = 17.25
Prediction = mu = 6.00
Deviations sum to zero. It’s unbiased
Sum of squares = 16.00
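The comparison can be reproduced with a small helper. `prediction_error` is a hypothetical name chosen for this sketch; it returns the sum of the deviations, the sum of squares, and the average squared distance for any prediction.

```python
def prediction_error(scores, prediction):
    """Return (sum of deviations, sum of squares, average squared distance)."""
    deviations = [x - prediction for x in scores]
    ss = sum(d ** 2 for d in deviations)
    return sum(deviations), ss, ss / len(scores)


scores = [7, 8, 3, 5, 7]  # quiz scores from the slides

for guess in (5.5, 6.0):
    dev_sum, ss, avg_sq = prediction_error(scores, guess)
    print(f"prediction {guess:.2f}: deviations sum to {dev_sum:.2f}, "
          f"SS = {ss:.2f}, SS/N = {avg_sq:.2f}")
# prediction 5.50: deviations sum to 2.50, SS = 17.25, SS/N = 3.45
# prediction 6.00: deviations sum to 0.00, SS = 16.00, SS/N = 3.20
```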
So you should predict everyone will score
precisely at the mean.
But when you predict that everyone will
score at the mean, you will be wrong. In fact,
it is often the case that no one will score
precisely at the mean.
In statistics, we don’t expect our predictions to
be precisely right.
We want to make predictions that are wrong in
a particular way.
We want our predictions to be as close to the
high scores as to the low scores in the
population.
The mean is the only number that is an
unbiased predictor: it is the only number around
which deviations sum to zero.
We want to be wrong by
the least amount possible
In statistics, we consider error to be the
squared distance between a prediction
and the actual score.
The sum of squares is the total amount the
prediction is wrong. The variance is the average
amount the prediction is wrong when we predict that
everyone will score right at the mean
(mu).
The mean is the least average squared distance
from all the scores in the population.
The number that is the least average squared
distance from the scores in the population is the
prediction that is least wrong, the least in error.
Thus, saying that everyone will score at the
mean (even if no one does!) is the prediction
that gives you the smallest amount of error.
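A sketch that tries a range of candidate predictions for the quiz scores and keeps the one with the smallest average squared error. The grid of candidates (0 to 10 in steps of 0.5) is an assumption made for the illustration.

```python
scores = [7, 8, 3, 5, 7]  # quiz scores from the slides


def average_squared_error(prediction):
    """Average squared distance between one prediction and every score."""
    return sum((x - prediction) ** 2 for x in scores) / len(scores)


# Candidate predictions: 0.0, 0.5, 1.0, ..., 10.0
candidates = [c / 2 for c in range(0, 21)]

best = min(candidates, key=average_squared_error)
mu = sum(scores) / len(scores)

print("best prediction on the grid:", best)                     # 6.0
print("population mean:", mu)                                   # 6.0
print("average squared error at best:", average_squared_error(best))  # 3.2
```

No one in the example actually scored 6, yet 6.00 is still the prediction with the least error.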
Why doesn’t everyone
score right at the mean?
Sources of Error
Individual differences – people have stable
differences from one another. They differ in
an infinite number of ways and combinations
of ways.
PROOF OF THAT: AREN’T YOU MORE
LIKE WHO YOU WILL BE IN 5 MINUTES
THAN YOU ARE LIKE THE PERSON NEXT TO
YOU??!
AND – THERE ARE ALWAYS
MEASUREMENT
PROBLEMS!
Instruments are imperfect,
scores get mistranscribed,
participants may be
uninterested or have a
stomach ache, etc. etc.
etc. …
Remember: THERE ARE
ALWAYS MEASUREMENT
PROBLEMS
NO MEASUREMENT DEVICE IS EVER
PERFECTLY ACCURATE, WHETHER IT IS
A HIGHLY ACCURATE SCALE OR A 12
QUESTION QUESTIONNAIRE
Additionally, transient situational
factors make measurement inaccurate
This is especially true when we measure
people. Let’s say we are measuring
something relatively easy to measure, such as
verbal ability. When we are measuring
people, lots of transient factors (such as
mood, events, time, motivation etc.) all
change an individual’s responses and combine
to make our measurement of verbal ability
imperfect.
The mean square for error
We call the average squared error of
prediction when we use the mean as our
prediction the “mean square for error”. It
tells us how much (squared) error we
make, on the average, when we predict
that everyone will score precisely at the
mean.
Mean square for error =
the variance ($\sigma^2$)
If we predict that everyone will score
right at the mean, how much error
do we make on the average? To find
out, find the distance of each score
from the mean, square that distance,
add up the squared distances, and divide
by the number of scores
to find the average error.
Let’s see an example.
Mean square for error; Example
Let’s say we are interested in a population of 5
scores from a 10 question multiple choice test.
John scores 7, Jennifer scores 8, Arthur scores
3, Patrick scores 5 and Marie scores 7.
That’s a total of 30, N=5, so mu = 6.00
We predict that everyone will score 6.00.
On the average, how wrong are we (in squared
units)?
Scores on a 10 question
Psychology quiz
Student     X    $X - \mu$    $(X - \mu)^2$
John        7    +1.00        1.00
Jennifer    8    +2.00        4.00
Arthur      3    -3.00        9.00
Patrick     5    -1.00        1.00
Marie       7    +1.00        1.00
$\Sigma X = 30$
$N = 5$
$\mu = 6.00$
$\Sigma (X - \mu) = 0.00$
$\Sigma (X - \mu)^2 = SS = 16.00$
Mean square for error $= SS/N = 3.20$
Whoops! Wait a minute. We just
computed $\sigma^2$ again.
Scores on a 10 question
Psychology quiz
Student     X    $X - \mu$    $(X - \mu)^2$
John        7    +1.00        1.00
Jennifer    8    +2.00        4.00
Arthur      3    -3.00        9.00
Patrick     5    -1.00        1.00
Marie       7    +1.00        1.00
$\Sigma X = 30$
$N = 5$
$\mu = 6.00$
$\Sigma (X - \mu) = 0.00$
$\Sigma (X - \mu)^2 = SS = 16.00$
$\sigma^2 = SS/N = 3.20$
$\sigma = \sqrt{3.20} = 1.79$
Mean square for error =
the variance ($\sigma^2$)
The best prediction we can make is that
everyone will score right at the mean.
When we make that prediction, the average
(squared) amount that we are wrong is the
mean square for error (ALSO CALLED SIGMA SQUARED
OR THE VARIANCE).
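A quick check that the mean square for error and the population variance are the same number. It assumes Python's standard `statistics.pvariance`, which divides by N as these slides do.

```python
import statistics

scores = [7, 8, 3, 5, 7]  # quiz scores from the slides
mu = statistics.mean(scores)

# Mean square for error: average squared error when predicting mu for everyone.
mean_square_error = sum((x - mu) ** 2 for x in scores) / len(scores)

# Population variance from the library (divides SS by N).
sigma_squared = statistics.pvariance(scores)

print("mean square for error:", mean_square_error)
print("population variance:", sigma_squared)
```

Both lines print 3.2 for the quiz scores.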
When you have three names for something you
know it has to be important.
END CHAPTER 1 SLIDES