Chapter 1

The mean, the number of
observations, the variance
and the standard deviation
Some definitions
Data -
observations, measurements, scores
Statistics -
a series of rules and methods that can be
used to organize and interpret data.
Descriptive Statistics -
methods to summarize
large amounts of data with just a few numbers.
Inferential Statistics -
mathematical procedures
to make statements about a population based on a sample.
More Definitions
Parameter -
a number that summarizes or describes
some aspect of a population.
Sampling Error -
the difference between a statistic
and its parameter.
Non-parametric Statistics -
statistics for
observations that are discrete, mutually exclusive, and
exhaustive.
Where we are going
Descriptive Statistics
Number of Observations
Measures of Central Tendency
Measures of Variability
Observations
Each score is represented by the
letter X.
The total number of observations is
represented by N.
Measures of Central Tendency
Finding the most typical score
median - the middle score
mode - the most frequent score
mean - the average score
Calculating the Mean
Greek letters are used to represent
population parameters.
μ (mu) is the mathematical symbol for the
mean.
Σ (capital sigma) is the mathematical symbol for summation.
Formula: μ = (ΣX) / N
English: To calculate the mean, first add
up all the scores, then divide by the
number of scores you added up.
The mode, the median and
the mean
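Ages of people retiring from Rutgers this year.
Scores as collected: 60, 63, 45, 63, 65, 70, 55, 60, 65, 63
Mode is 63 (it occurs three times, more than any other score).
Scores in order: 45, 55, 60, 60, 63, 63, 63, 65, 65, 70
Median is 63 (the two middle scores are both 63).
ΣX = 609
N = 10
Mean μ = 609 / 10 = 60.90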
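The three values above can be checked with a few lines of Python. This is only a minimal sketch using the standard library's statistics module; the variable name ages is an illustrative label for the retirement ages listed on the slide.

```python
import statistics

# Retirement ages from the slide, in the order they were collected.
ages = [60, 63, 45, 63, 65, 70, 55, 60, 65, 63]

n = len(ages)                      # N, the number of observations
total = sum(ages)                  # sum of X
mean = total / n                   # mu = (sum of X) / N
median = statistics.median(ages)   # middle score; with an even N, the average of the two middle scores
mode = statistics.mode(ages)       # most frequent score

print(n, total, mean, median, mode)
```

Note that the median is found from the ordered scores, while the mean and mode do not depend on the order.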
Measures of Variability
Range - the distance from the highest to the lowest
score.
Inter-quartile Range - the distance between the score that marks off
the top 25% and the score that marks off the bottom 25%.
Sum of Squares (SS) - the distance of each score
from the mean, squared and then summed.
Variance (σ²) - the average squared distance of scores
from mu (SS/N).
Standard Deviation (σ) - the square root of the
variance.
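As a rough illustration of the first two measures, the sketch below reuses the retirement ages from the earlier slide (an assumption made here for the example; the slide itself gives no data). Textbooks and software use slightly different rules for locating quartiles, so the inter-quartile range may differ a little from a hand calculation.

```python
import statistics

# Retirement ages from the earlier slide, reused here only as example data.
ages = [60, 63, 45, 63, 65, 70, 55, 60, 65, 63]

# Range: distance from the highest to the lowest score.
score_range = max(ages) - min(ages)

# Quartile cut points, using the module's default ("exclusive") method.
q1, q2, q3 = statistics.quantiles(ages, n=4)

# Inter-quartile range: distance between the upper and lower quartile cut points.
iqr = q3 - q1

print(score_range, iqr)
```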
Computing the variance
and the standard deviation
Scores on a 10 question Psychology quiz
Student     X     X - μ     (X - μ)²
John        7     +1.00     1.00
Jennifer    8     +2.00     4.00
Arthur      3     -3.00     9.00
Patrick     5     -1.00     1.00
Marie       7     +1.00     1.00
ΣX = 30     N = 5     μ = 6.00
Σ(X - μ) = 0.00
Σ(X - μ)² = SS = 16.00
σ² = SS/N = 3.20
σ = √3.20 = 1.79
The variance is our most basic and
important measure of variability
The variance (σ² = sigma squared) is the
average squared distance of individual scores
from the population mean.
Other indices of variation are derived from the
variance.
The standard deviation returns that distance to the
original, unsquared units of the scores. To find it you
compute the square root of the variance.
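The quiz computation above can be reproduced step by step. The following is a minimal sketch, not the only way to do it; the dictionary name scores and the other variable names are illustrative.

```python
import math

# Quiz scores from the slide.
scores = {"John": 7, "Jennifer": 8, "Arthur": 3, "Patrick": 5, "Marie": 7}

x = list(scores.values())
n = len(x)                               # N
mu = sum(x) / n                          # population mean
deviations = [xi - mu for xi in x]       # X - mu for each student
ss = sum(d ** 2 for d in deviations)     # sum of squares, SS
variance = ss / n                        # sigma squared = SS / N
sd = math.sqrt(variance)                 # sigma = square root of the variance

print(mu, sum(deviations), ss, variance, round(sd, 2))
```

The standard library functions statistics.pvariance and statistics.pstdev compute the same population variance and standard deviation directly.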
Other measures of variability
derived from the variance
• We can randomly choose scores from a population to
form a random sample and then find the mean of such
samples.
• Each score you add to a sample tends to correct the
sample mean back toward the population mean, mu.
• The average squared distance of sample means from
the population mean is the variance divided by n, the
size of the sample.
• To express that distance in the original, unsquared units,
divide the variance by n, then take the
square root. The result is called the standard error of
the sample mean or, more briefly, the standard error of
the mean. We’ll see more of this in Ch. 4.
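To make the "variance divided by n" claim concrete, the sketch below treats the five quiz scores as the whole population and lists every possible sample of size 2 drawn with replacement. Both the sample size and the with-replacement sampling are assumptions chosen for illustration.

```python
import itertools
import math

# The quiz scores, treated here as the entire population.
population = [7, 8, 3, 5, 7]
mu = sum(population) / len(population)
variance = sum((x - mu) ** 2 for x in population) / len(population)

n = 2  # sample size, chosen only for illustration

# Every possible ordered sample of size n, drawn with replacement.
samples = list(itertools.product(population, repeat=n))
sample_means = [sum(s) / n for s in samples]

# Average squared distance of the sample means from mu.
avg_sq_dist = sum((m - mu) ** 2 for m in sample_means) / len(sample_means)

# Standard error of the mean: square root of (variance / n).
standard_error = math.sqrt(variance / n)

print(avg_sq_dist, variance / n, standard_error)
```

The first two printed values agree (up to floating-point rounding), which is the statement on the slide; the third is the standard error of the mean for samples of size 2.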
Making predictions (1)
Without any other information, the
population mean (mu) is the best
prediction of each and every person’s
score.
So you should predict that everyone will
score precisely at the population mean.
Why? Because the mean is an unbiased
predictor or estimate (that is, the
deviations around the mean sum to zero).
Making predictions (2)
The mean is precisely the number whose
average squared distance from the numbers
in the distribution is the smallest.
Thus, the mean is your best prediction,
because it is a least squares, unbiased
predictor.
What happens if we make
a prediction other than mu?
Scores on a Psychology quiz (mu = 6.00). What
happens if we predict everyone will score 5.50?
Student     X     X - 5.50     (X - 5.50)²
John        7     +1.50        2.25
Jennifer    8     +2.50        6.25
Arthur      3     -2.50        6.25
Patrick     5     -0.50        0.25
Marie       7     +1.50        2.25
ΣX = 30     N = 5     μ = 6.00
Σ(X - 5.50) = 2.50
Σ(X - 5.50)² = SS = 17.25
Average squared error = SS/N = 3.45
Square root of the average squared error = √3.45 = 1.86
Compare that to predicting that everyone
will score right at the mean (mu).
Scores on a 10 question Psychology quiz
Student     X     X - μ     (X - μ)²
John        7     +1.00     1.00
Jennifer    8     +2.00     4.00
Arthur      3     -3.00     9.00
Patrick     5     -1.00     1.00
Marie       7     +1.00     1.00
ΣX = 30     N = 5     μ = 6.00
Σ(X - μ) = 0.00
Σ(X - μ)² = SS = 16.00
σ² = SS/N = 3.20
σ = √3.20 = 1.79
But when you predict that everyone will
score at the mean, you will be wrong. In fact,
it is often the case that no one will score
precisely at the mean.
In statistics, we don’t expect our predictions to
be precisely right.
We want to make predictions that are wrong in
a particular way.
We want our predictions to be as close to the
high scores as to the low scores in the
population.
The mean is the only number that is an
unbiased predictor: it is the only number around
which deviations sum to zero.
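The two prediction tables above (predicting 5.50 versus predicting mu = 6.00) can be checked side by side. This is a small sketch; the helper name prediction_error is hypothetical and introduced only for this example.

```python
def prediction_error(scores, prediction):
    """Return (sum of deviations, average squared error) for one prediction."""
    deviations = [x - prediction for x in scores]
    sum_of_deviations = sum(deviations)
    average_squared_error = sum(d ** 2 for d in deviations) / len(scores)
    return sum_of_deviations, average_squared_error

quiz = [7, 8, 3, 5, 7]

at_mean = prediction_error(quiz, 6.00)    # predicting mu
at_guess = prediction_error(quiz, 5.50)   # predicting something other than mu

print("predict 6.00:", at_mean)
print("predict 5.50:", at_guess)
```

Predicting the mean gives deviations that sum to zero and the smaller average squared error; predicting 5.50 leaves deviations that sum to 2.50 and a larger average squared error.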
We want to be wrong by
the least amount possible
In statistics, we consider error to be the
squared distance between a prediction and the
actual score.
The mean has the smallest average squared distance
from all the scores in the population.
The number with the smallest average squared
distance from the scores in the population is the
prediction that is least wrong, the least in error.
Thus, saying that everyone will score at the
mean (even if no one does!) is the prediction
that gives you the smallest amount of error.
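One way to see why no other prediction can beat the mean: for any prediction c, the average squared error equals the variance plus the squared distance between c and mu, so it is smallest when c equals mu. The sketch below checks this on the quiz scores; the list of candidate predictions is an arbitrary choice made for illustration.

```python
import math

quiz = [7, 8, 3, 5, 7]
n = len(quiz)
mu = sum(quiz) / n
variance = sum((x - mu) ** 2 for x in quiz) / n

def average_squared_error(scores, prediction):
    """Average squared distance between one prediction and every score."""
    return sum((x - prediction) ** 2 for x in scores) / len(scores)

# Arbitrary candidate predictions, including the mean itself.
candidates = [4.0, 5.0, 5.5, 6.0, 6.5, 7.0, 8.0]

for c in candidates:
    observed = average_squared_error(quiz, c)
    expected = variance + (c - mu) ** 2   # variance plus squared distance from mu
    assert math.isclose(observed, expected)
    print(c, round(observed, 2))

# The candidate with the smallest average squared error.
best = min(candidates, key=lambda c: average_squared_error(quiz, c))
print("best prediction:", best)
```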
So the mean is the best
prediction of everyone’s
score because it is a least
squares, unbiased
predictor for all the scores
in the population.
Why doesn’t everyone
score right at the mean?
Sources of Error
Individual differences
Measurement problems
If we predict that everyone will score right at the mean,
how much error do we make on the average? To find
out, find the distance of each score from the mean,
square each distance, sum the squares and divide by the
number of scores to find the average error.
WHOOPS: THAT’S SIGMA SQUARED (σ²).
Mean square for error = variance
Questions and answers – the
mean.
• WHAT QUALITIES OF THE MEAN (MU) MAKE IT THE
BEST PREDICTION YOU CAN MAKE OF WHERE
EVERYONE WILL SCORE?
• The mean is an unbiased predictor or estimate, because
the deviations around the mean always sum to zero.
• The mean is a least squares predictor because it has the
smallest average squared distance from all the
scores in the population.
Q & A: the mean
WHY WOULD YOU PREDICT THAT EVERYONE
WILL SCORE AT THE MEAN WHEN, IN FACT,
OFTEN NO ONE CAN POSSIBLY SCORE
PRECISELY AT THE MEAN?
In statistics, we don’t expect our predictions to
be precisely right.
We want to make predictions that are close and
wrong in a particular way.
We want least squares, unbiased predictors.
Q & A: The variance
WHAT ARE THE OTHER NAMES FOR THE
VARIANCE?
Sigma squared (σ²) and the mean square for error.
WHAT OTHER MEASURES OF
VARIABILITY CAN BE EASILY COMPUTED
ONCE YOU KNOW THE VARIANCE?
The standard deviation and the standard
error of the sample mean.
How do you compute
THE VARIANCE? Find the distance of each score
from the mean, square each distance, sum the squares and
divide by the number of scores in the
population.
THE STANDARD DEVIATION? Compute the
square root of the variance.
THE STANDARD ERROR OF THE SAMPLE MEAN?
Divide the variance by n, the size of the sample,
and then take a square root.
