Transcript Lecture 6
Lecture 7:
Bivariate Statistics
Properties of Standard Deviation
Variance is just the square of
the S.D.
If a constant is added to all
scores, it has no impact on S.D.
If all scores are multiplied by a
constant, the dispersion changes: the S.D.
is multiplied by the constant (in absolute
value) and the variance by its square.
S = standard deviation
X = individual score
M = mean of all scores
n = sample size (number
of scores)
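The properties above can be checked numerically. This is a minimal sketch using the symbols defined above. It assumes the usual sample formula S = sqrt(Σ(X − M)² / (n − 1)); the n − 1 denominator is an assumption, since some texts divide by n. The scores are hypothetical.

```python
import math

def sample_sd(scores):
    """Sample standard deviation S, using the n - 1 denominator (assumed)."""
    n = len(scores)
    m = sum(scores) / n                      # M: mean of all scores
    ss = sum((x - m) ** 2 for x in scores)   # sum of squared deviations from M
    return math.sqrt(ss / (n - 1))

scores = [2, 4, 4, 4, 5, 5, 7, 9]            # hypothetical scores
s = sample_sd(scores)
variance = s ** 2                            # variance is the square of S

shifted = [x + 10 for x in scores]           # add a constant to every score
scaled = [x * 3 for x in scores]             # multiply every score by a constant

print(round(s, 4), round(sample_sd(shifted), 4), round(sample_sd(scaled), 4))
```

Adding 10 leaves S unchanged, while multiplying by 3 triples S (and multiplies the variance by 9).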
Distributions and Standard
Deviations
Example: A normal distribution has a mean of 40
and a standard deviation of 5. 68% of the
distribution can be found between what
two values?
95% of the distribution can be found
between what two values?
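A short sketch of the answers, assuming the distribution is normal so that the approximate 68/95 rule applies. It uses Python's standard-library `statistics.NormalDist` to confirm the areas between the bounds.

```python
from statistics import NormalDist

mean, sd = 40, 5

# Empirical rule (approximate, normal distributions only):
# about 68% of scores lie within 1 S.D. of the mean, about 95% within 2 S.D.
range_68 = (mean - 1 * sd, mean + 1 * sd)
range_95 = (mean - 2 * sd, mean + 2 * sd)
print(range_68)  # (35, 45)
print(range_95)  # (30, 50)

# Check the areas between those bounds with the normal CDF.
dist = NormalDist(mu=mean, sigma=sd)
area_68 = dist.cdf(range_68[1]) - dist.cdf(range_68[0])
area_95 = dist.cdf(range_95[1]) - dist.cdf(range_95[0])
print(round(area_68, 4), round(area_95, 4))  # 0.6827 0.9545
```

Using 1.96 S.D. instead of 2 gives bounds that hold exactly 95%, here 30.2 and 49.8.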
Standard Error of the Mean
Standard Error is an estimate of how much
the mean would vary over many samples
drawn from the same population.
It is calculated from a single sample: it is an
estimate of the standard deviation of the
sampling distribution of the mean.
A smaller S.E. suggests that our sample mean is
likely a good estimate of the population mean.
$SEM = \frac{s}{\sqrt{N}}$
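A minimal sketch of the standard error computed from one hypothetical sample, using the sample S.D. (n − 1 denominator, as `statistics.stdev` does). It also shows why larger samples give smaller S.E. values: with the same spread, quadrupling N halves the SEM.

```python
import math
import statistics

sample = [12, 15, 11, 14, 13, 16, 12, 15]   # hypothetical single sample
n = len(sample)
s = statistics.stdev(sample)                # sample S.D. (n - 1 denominator)
sem = s / math.sqrt(n)                      # SEM = s / sqrt(N)

# With the same spread, a sample four times as large halves the SEM.
sem_4x = s / math.sqrt(4 * n)
print(round(s, 3), round(sem, 3), round(sem_4x, 3))
```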
Common Data Representations
Histograms
Simple graphs of the frequency of groups of scores.
Stem-and-Leaf Displays
Another way of displaying dispersion, particularly
useful when you do not have large amounts of data.
Box Plots
Yet another way of displaying dispersion. The box
spans the 25th to 75th percentiles (the interquartile
range), the line within the box shows the median, and
the “whiskers” show the range of values (min and max).
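A sketch of the numbers behind a box plot and a simple stem-and-leaf display, on hypothetical two-digit scores. `statistics.quantiles` uses its default "exclusive" method here; quartile definitions vary between programs, so Q1 and Q3 may differ slightly from other software. The `stem_and_leaf` helper is a hypothetical illustration that uses the tens digit as the stem.

```python
import statistics

data = [12, 15, 21, 22, 22, 27, 31, 33, 38, 41, 45, 47]  # hypothetical scores

# Box plot ingredients: quartiles, median, min and max.
q1, median, q3 = statistics.quantiles(data, n=4)  # default "exclusive" method
summary = {"min": min(data), "q1": q1, "median": median, "q3": q3, "max": max(data)}
print(summary)

def stem_and_leaf(values):
    """Group two-digit scores by tens digit (stem); ones digits are the leaves."""
    stems = {}
    for v in sorted(values):
        stems.setdefault(v // 10, []).append(v % 10)
    return dict(sorted(stems.items()))

for stem, leaves in stem_and_leaf(data).items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
```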
Estimation and Hypothesis Tests:
The Normal Distribution
A key assumption for many variables (or
specifically, their scores/values) is that
they are normally distributed.
In large part, this is because the most
common statistics (chi-square, t, F test)
rest on this assumption.
Why do we make this assumption?
Central Limit Theorem
Errors can be viewed as the sum of many
independent random effects, so individual scores
will tend to be normally distributed.
Even if Y is not normally distributed, the
distribution of the sample mean will tend to be
normal as the sample size increases.
Y=µ+ε
A given score (Y) is the sum of the mean of
the population (µ) and some error (ε)
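A small simulation sketch of the Central Limit Theorem claim. It assumes a right-skewed exponential population (mean 1) as the non-normal Y, draws many samples of a given size, and measures the skewness of the resulting sample means with a simple moment-based formula. As sample size grows, the skewness of the means shrinks toward 0, the value for a normal distribution.

```python
import random
import statistics

random.seed(1)  # fixed seed so the simulation is repeatable

def skewness(values):
    """Mean cubed z-score; about 0 for a normal (symmetric) distribution."""
    m = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return statistics.fmean([((v - m) / sd) ** 3 for v in values])

def sample_means(sample_size, n_samples=2000):
    """Means of many samples drawn from a right-skewed exponential population."""
    return [
        statistics.fmean([random.expovariate(1.0) for _ in range(sample_size)])
        for _ in range(n_samples)
    ]

results = {}
for size in (1, 5, 30):
    means = sample_means(size)
    results[size] = (statistics.fmean(means), skewness(means))
    print(size, round(results[size][0], 3), round(results[size][1], 3))
```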
The z-score
Infinitely many normal
distributions are possible, one
for each combination of mean
and variance, but all are related to
a single distribution: the standard
normal (mean 0, S.D. 1).
Standardizing a group of
scores changes the scale to
one of standard deviation
units.
$z = \frac{Y - \mu}{\sigma}$
Allows comparisons between
scores that were originally on
different scales.
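A sketch of comparing scores from two different scales by standardizing them. The two tests, their means and their S.D.s are hypothetical numbers chosen for illustration.

```python
def z_score(y, mu, sigma):
    """How many S.D. units y lies above (+) or below (-) the mean."""
    return (y - mu) / sigma

# Hypothetical example: two tests reported on different scales.
z_math = z_score(650, mu=500, sigma=100)   # 1.5
z_verbal = z_score(28, mu=21, sigma=5)     # 1.4

# On the common z scale, the first score is slightly further above its mean.
print(z_math, z_verbal)
```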
z-scores (continued)
Tells us where a score is located within a
distribution: specifically, how many
standard deviation units the score is above
or below the mean.
Properties
The mean of a set of z-scores is zero (why?)
The variance (and therefore standard
deviation) of a set of z-scores is 1.
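A quick numerical check of both properties on hypothetical raw scores. The mean of the z-scores is 0 because deviations from the mean always sum to zero. Their S.D. is 1 because each deviation is divided by the S.D. itself (the same n − 1 formula is used for both).

```python
import statistics

scores = [55, 61, 70, 72, 80, 94]    # hypothetical raw scores
m = statistics.fmean(scores)
s = statistics.stdev(scores)         # sample S.D.
z = [(x - m) / s for x in scores]    # standardized scores

# Mean is 0 up to floating-point rounding; S.D. is 1 up to rounding.
print(statistics.fmean(z), statistics.stdev(z))
```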
Area under the normal curve
Example: you have a variable x with a mean
of 500 and an S.D. of 15. How common is a
score of 525?
$z = \frac{525 - 500}{15} = 1.67$
If we look up the z-statistic of 1.67 in a z-score
table, we find that the proportion of scores
less than our value is .9525.
In other words, a score of 525 exceeds about 95.25% of
the population; only about 4.75% of scores are this
high or higher (one-tailed p < .05).
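The table lookup can be reproduced with the standard normal CDF in Python's `statistics.NormalDist`. This sketch rounds z to 1.67 first to mirror the table; using the unrounded z (1.667) gives about .9522 instead.

```python
from statistics import NormalDist

mean, sd, score = 500, 15, 525
z = (score - mean) / sd                              # 1.666..., reported as 1.67
proportion_below = NormalDist().cdf(round(z, 2))     # table lookup for z = 1.67
proportion_above = 1 - proportion_below              # one-tailed area above 525
print(round(z, 2), round(proportion_below, 4), round(proportion_above, 4))
# prints: 1.67 0.9525 0.0475
```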
Issues with Normal Distributions
Skewness
Kurtosis
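One common way to put numbers on these two issues is with moment-based formulas: skewness as the mean cubed z-score, and excess kurtosis as the mean fourth-power z-score minus 3 (so a normal distribution scores about 0). This is a sketch of those definitions only; statistical packages often report bias-adjusted versions, so their values can differ slightly for small samples. The data are hypothetical.

```python
import statistics

def skewness(values):
    """Moment-based skewness: 0 for a symmetric distribution, > 0 for a long right tail."""
    m = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return statistics.fmean([((v - m) / sd) ** 3 for v in values])

def excess_kurtosis(values):
    """Moment-based kurtosis minus 3: about 0 for a normal distribution."""
    m = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return statistics.fmean([((v - m) / sd) ** 4 for v in values]) - 3

symmetric = [1, 2, 3, 4, 5]              # hypothetical, evenly spread (flat)
right_skewed = [1, 1, 1, 2, 2, 3, 10]    # hypothetical, long right tail

print(round(skewness(symmetric), 3), round(excess_kurtosis(symmetric), 3))
print(round(skewness(right_skewed), 3))
```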
Correlation
Hypothesis testing an association
between two metric variables
Checking for simple linear
relationships
Pearson’s correlation coefficient
Measures the extent to which two variables
are linearly related
Basically, the correlation coefficient is the
average of the cross products of the
corresponding z-scores.
Correlations
Ranges from −1 to +1, where ±1 = perfect
linear relationship between the two
variables and 0 = no linear relationship;
the sign gives the direction.
Remember: correlation ONLY measures
linear relationships, not all relationships!
$r_{xy} = \frac{1}{N-1} \sum_{i=1}^{N} z_{x_i} z_{y_i}$
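A minimal sketch of r as the average cross product of z-scores, dividing by N − 1 and building the z-scores with the sample S.D. (which makes the result equal to Pearson's r). The data are hypothetical. The last pair illustrates the warning above: y = x² is a perfect but non-linear relationship, and r comes out as 0.

```python
import statistics

def pearson_r(x, y):
    """Average cross product of z-scores, dividing by N - 1 (sample S.D.s)."""
    n = len(x)
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sx, sy = statistics.stdev(x), statistics.stdev(y)
    return sum(((xi - mx) / sx) * ((yi - my) / sy) for xi, yi in zip(x, y)) / (n - 1)

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]                      # hypothetical paired scores
r = pearson_r(x, y)                      # about 0.775: a fairly strong positive relationship

# A perfect curvilinear relationship that r does not detect.
x_sym = [-2, -1, 0, 1, 2]
r_curve = pearson_r(x_sym, [v ** 2 for v in x_sym])   # about 0

print(round(r, 3), round(r_curve, 3))
```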
Correlation Example
General Social Survey 1993
Education and Age
The t-test
Hypothesis testing for the equality
of means between two
independent groups
Alternative Hypotheses Revisited
Alternative Hypotheses:
H1: μ1 < μc
H1: μ1 > μc
H1: μ1 ≠ μc
How do we test whether the means of
two populations, estimated from two samples,
are in fact different?
The t-test
Where:
M = mean
SDM = Standard error of the difference between means
N = number of subjects in group
s = Standard Deviation of group
df = degrees of freedom
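A sketch of the independent-samples t statistic using the symbols listed above. It assumes the equal-variance (pooled) form of SDM, the version SPSS labels "equal variances assumed"; the exact formula on the slide is not shown here, and the two groups are hypothetical. A p-value would additionally need the t distribution (for example `scipy.stats.ttest_ind` with `equal_var=True`).

```python
import math
import statistics

def independent_t(group1, group2):
    """Equal-variance (pooled) t statistic and its degrees of freedom."""
    n1, n2 = len(group1), len(group2)
    m1, m2 = statistics.fmean(group1), statistics.fmean(group2)
    s1, s2 = statistics.stdev(group1), statistics.stdev(group2)
    # Pool the two variances, weighting each by its degrees of freedom.
    pooled_var = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)
    sdm = math.sqrt(pooled_var * (1 / n1 + 1 / n2))   # standard error of the difference
    t = (m1 - m2) / sdm
    df = n1 + n2 - 2                                  # two means estimated
    return t, df

group1 = [5, 7, 8, 6, 9]   # hypothetical scores, group 1
group2 = [4, 5, 6, 3, 5]   # hypothetical scores, group 2
t, df = independent_t(group1, group2)
print(round(t, 3), df)
```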
Degrees of freedom
d.f. = the number of independent pieces of
information from the data collected in a study.
Example: choose 10 numbers that must add up to 100.
Once 9 have been chosen, the 10th is fixed: we had
10 choices, but the restriction reduced our
independent selections to N − 1 = 9.
In statistics, each further restriction (such as
each parameter estimated from the data) reduces the
degrees of freedom by one.
In the t-test, since we estimate two means, our
degrees of freedom are reduced by two: df = N1 + N2 − 2.
Z-distribution versus t-distribution
t distribution
As the degrees of freedom increase (towards
infinity), the t distribution approaches the z
distribution (i.e., a normal distribution)
Because N plays such a prominent role in the
calculation of the t-statistic, note that for very
large N’s, the sample standard deviation (s)
begins to closely approximate the population
standard deviation (σ)
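A sketch comparing the height of Student's t density with the standard normal density as degrees of freedom grow. It implements the textbook t density with `math.lgamma` (to avoid overflow for large df). At the center the t curve is lower and rises toward the normal; in the tails (x = 3) it is higher, which is why t critical values exceed z critical values for small samples.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_coef = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_coef) * (1 + x * x / df) ** (-(df + 1) / 2)

def normal_pdf(x):
    """Density of the standard normal (z) distribution."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

for df in (1, 5, 30, 1000):
    print(df, round(t_pdf(0, df), 4), round(normal_pdf(0), 4))

# Heavier tails: at x = 3 the t curve with 5 df sits above the normal curve.
print(round(t_pdf(3, 5), 4), round(normal_pdf(3), 4))
```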
Assumptions Underlying the
Independent Sample t-test
Assumption of Normality
Assumption of Homogeneity of Variance
The outputs for the t-test in SPSS correspond
to the standard t-test (equal variance
assumed) and a separate variance t-test
(equal variance not assumed)
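The "equal variance not assumed" row is usually called Welch's t-test. This sketch computes the separate-variance t statistic and the Welch–Satterthwaite degrees of freedom on the same hypothetical groups as a pooled example would use. With equal group sizes the two t values coincide, and only the (typically non-integer) degrees of freedom differ.

```python
import math
import statistics

def welch_t(group1, group2):
    """Separate-variance t statistic with Welch-Satterthwaite degrees of freedom."""
    n1, n2 = len(group1), len(group2)
    v1 = statistics.variance(group1) / n1   # squared standard error, group 1
    v2 = statistics.variance(group2) / n2   # squared standard error, group 2
    t = (statistics.fmean(group1) - statistics.fmean(group2)) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

group1 = [5, 7, 8, 6, 9]   # hypothetical scores, group 1
group2 = [4, 5, 6, 3, 5]   # hypothetical scores, group 2
t, df = welch_t(group1, group2)
print(round(t, 3), round(df, 2))   # df is smaller than n1 + n2 - 2 = 8
```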
Practical Example:
Do men and women watch different
amounts of TV per week?
General Social Survey 1993