Means and Variances
Seminar 15 | Tuesday, October 18, 2007 | Aliaksei Smalianchuk
Means and Variances
What happens to means and variances
when data is manipulated?
Let’s check by manipulating data from
the survey.
Data
Height in inches (HT)
Shoe size (Shoe)
Age (Age)
Additional Columns:
Height with a 1 inch heel (HeightPlus1)
Height in centimeters (2.5TimesHeight)
Sum of height and shoe size
(HeightPlusShoe)
Sum of height and age (HeightPlusAge)
Statistics
Variable          N     Mean     StDev
HT                444   66.928   3.938
Shoe              445   9.1056   1.9484
Age               444   20.371   2.912
HeightPlus1       444   67.928   3.938
2.5TimesHeight    444   167.32   9.84
HeightPlusShoe    444   76.035   5.693
HeightPlusAge     444   87.299   4.913
Observation 1
The mean of heights with a 1 inch heel (HeightPlus1) is one inch larger than the mean of heights (HT)
Why?
If the same constant is added to every
value, the mean increases by that same
constant.
Observation 2
The standard deviation of heel heights equals the standard deviation of heights
Why?
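Standard deviation measures spread
around the mean. Adding a constant shifts
every value and the mean by the same
amount, so the distances from the mean
(and the shape of the distribution) do
not change.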
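Observations 1 and 2 can be checked on any small data set. The sketch below uses made-up heights (not the survey data) and Python's standard `statistics` module, which is an assumption of this example; the seminar itself used Minitab.

```python
from statistics import mean, pstdev

# Hypothetical heights in inches (illustrative values, not the survey data)
heights = [62.0, 65.0, 66.0, 68.0, 71.0, 73.0]

# Add a 1 inch heel to every height
heights_plus_1 = [h + 1 for h in heights]

# The mean moves by exactly the added constant
mean_shift = mean(heights_plus_1) - mean(heights)

# The standard deviation is unchanged by the shift
sd_before = pstdev(heights)
sd_after = pstdev(heights_plus_1)

print(f"mean shift: {mean_shift:.3f}")
print(f"sd before: {sd_before:.3f}, sd after: {sd_after:.3f}")
```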
Observation 3
The standard deviation of heights in centimeters is 2.5 times
the standard deviation of heights in inches
Why?
By multiplying all data values by a constant we stretch the spread
of the histogram by the same factor, so every property that depends
on the spread (like standard deviation) is multiplied by that factor.
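In the table, 2.5 × 3.938 = 9.845, which agrees with the reported 9.84 up to rounding. The same scaling can be sketched on made-up data; the values and the use of Python's `statistics` module are assumptions of this example, not part of the seminar.

```python
from statistics import mean, pstdev

# Hypothetical heights in inches (illustrative values, not the survey data)
heights_in = [62.0, 65.0, 66.0, 68.0, 71.0, 73.0]

# The slides convert to centimeters with a factor of 2.5
factor = 2.5
heights_cm = [factor * h for h in heights_in]

# Both the mean and the standard deviation scale by the factor
mean_ratio = mean(heights_cm) / mean(heights_in)
sd_ratio = pstdev(heights_cm) / pstdev(heights_in)

print(f"mean ratio: {mean_ratio:.3f}, sd ratio: {sd_ratio:.3f}")
```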
Observation 4
Mean of HeightPlusShoe = Mean of Height + Mean of Shoe
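A minimal sketch of this observation on made-up paired data (the values are hypothetical, and Python is used here only for illustration). It holds whether or not the two columns are related.

```python
from statistics import mean

# Hypothetical paired measurements, one pair per person
heights = [62.0, 65.0, 66.0, 68.0, 71.0, 73.0]
shoes = [6.5, 8.0, 8.5, 9.0, 10.5, 12.0]

# New column: height plus shoe size for each person
sums = [h + s for h, s in zip(heights, shoes)]

mean_of_sum = mean(sums)
sum_of_means = mean(heights) + mean(shoes)

print(f"mean of sum: {mean_of_sum:.4f}")
print(f"sum of means: {sum_of_means:.4f}")
```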
Observation 5
Mean of HeightPlusAge = Mean of Height + Mean of Age
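Why?
Since the mean is a sum divided by N, the
sum of (height + other value) regroups into
the sum of heights plus the sum of the other
values, so the means add.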
Variances
Variance = σ²
Variances apply to a probability
distribution
Variance is a way to capture the degree
of spread of a distribution
Variances
Variable          Variance
HT                15.50784
Shoe              3.796263
Age               8.479744
HeightPlusShoe    32.41025
HeightPlusAge     24.13757
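These variances are the squares of the standard deviations reported in the Statistics table. A short check, using Python purely as a calculator (an assumption of this example; the seminar used Minitab):

```python
# Standard deviations as reported in the Statistics table
stdevs = {
    "HT": 3.938,
    "Shoe": 1.9484,
    "Age": 2.912,
    "HeightPlusShoe": 5.693,
    "HeightPlusAge": 4.913,
}

# Variance is the square of the standard deviation
variances = {name: sd ** 2 for name, sd in stdevs.items()}

for name, var in variances.items():
    print(f"{name:<15} {var:.5f}")
```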
Dependence
Are shoe sizes and heights dependent?
Are age and height dependent?
Let’s check using scatter plots
[Scatter plot: Height vs. Shoe Size]
[Scatter plot: Height vs. Age]
Back to variances
Variance of HeightPlusShoe is much
greater than Var(Height) + Var(Shoe)
Variance of HeightPlusAge is very close
to Var(Height) + Var(Age)
Why?
Can you see a difference between the
relationships (Height vs. Shoe Size) and
(Height vs. Age)?
Dependence
Adding two positively dependent data
sets produces extremes (small values are
added to corresponding small values and
large values to corresponding large
values)
This makes the variance much larger.
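The size of this effect can be estimated from the variance table. The sketch below relies on the standard identity Var(X + Y) = Var(X) + Var(Y) + 2 Cov(X, Y), which is not stated in the slides, and solves it for the covariance. The function name is hypothetical, and the results are only approximate because Shoe has N = 445 while the other columns have N = 444.

```python
# Variances as reported in the variance table
var_ht = 15.50784
var_shoe = 3.796263
var_age = 8.479744
var_ht_plus_shoe = 32.41025
var_ht_plus_age = 24.13757


def implied_covariance(var_sum, var_x, var_y):
    """Solve Var(X+Y) = Var(X) + Var(Y) + 2*Cov(X, Y) for Cov(X, Y)."""
    return (var_sum - var_x - var_y) / 2


# Assumption: the identity above is applied to the rounded table values
cov_ht_shoe = implied_covariance(var_ht_plus_shoe, var_ht, var_shoe)
cov_ht_age = implied_covariance(var_ht_plus_age, var_ht, var_age)

print(f"implied Cov(Height, Shoe): {cov_ht_shoe:.4f}")
print(f"implied Cov(Height, Age):  {cov_ht_age:.4f}")
```

The implied covariance for height and shoe size is large and positive, while the one for height and age is close to zero, which matches the two comparisons of variances.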
Dependence
In case of independent sets, values do
not necessarily correspond by relative
value (large values can be added to
small values)
This does not alter the spread of the
distribution much
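The contrast between the two cases can be sketched with a tiny made-up example. The data below are hypothetical and chosen so that one pairing has exactly zero covariance and the other is perfectly dependent.

```python
from statistics import pvariance

x = [-1, -1, 1, 1]

# Unrelated pairing: each x value is paired with both a small and a large y
y_unrelated = [-1, 1, -1, 1]

# Dependent pairing: small goes with small, large goes with large
y_dependent = [-1, -1, 1, 1]


def var_of_sum(a, b):
    """Population variance of the element-wise sum of two columns."""
    return pvariance([ai + bi for ai, bi in zip(a, b)])


sum_of_vars = pvariance(x) + pvariance(y_unrelated)  # 1 + 1
var_sum_unrelated = var_of_sum(x, y_unrelated)
var_sum_dependent = var_of_sum(x, y_dependent)

print(sum_of_vars, var_sum_unrelated, var_sum_dependent)
```

In the unrelated pairing the variance of the sum equals the sum of the variances; in the dependent pairing it is twice as large.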
Variance of sample mean
Mean = (X1 + X2 + … + Xn)/n
Variance[(X1 + X2 + … + Xn)/n] =
(Variance[X1] + Variance[X2] + … +
Variance[Xn])/n²
(for independent X1, X2, …, Xn)
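For independent draws that share a common variance σ², the variance of the sample mean works out to σ²/n. The sketch below checks this exactly by listing every possible sample of size 2 drawn with replacement from a small made-up population; the population and sample size are assumptions of this example.

```python
from itertools import product
from statistics import mean, pvariance

# Hypothetical population and sample size
population = [1, 2, 3, 4, 5]
n = 2

# Every equally likely ordered sample drawn WITH replacement
# (draws are independent of each other)
sample_means = [mean(sample) for sample in product(population, repeat=n)]

var_of_mean = pvariance(sample_means)
sigma_squared = pvariance(population)

print(f"variance of the sample mean: {var_of_mean}")
print(f"sigma^2 / n:                 {sigma_squared / n}")
```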
Dependence?
Would this work for dependent values of
X1, X2, …, Xn?
Would the variance produced by this
formula be larger or smaller than actual?
Sampling without replacement
Would the variance formula hold true?
Why?
Dependence
For positively dependent values, adding
the individual variances gives a smaller
result than the actual variance of the
sum, because adding positively dependent
data sets produces extremes that widen
the spread
Sampling without replacement from
smaller populations (n < 10) will produce
dependence
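The effect of sampling without replacement can be sketched by exact enumeration on a small made-up population (the population and sample size are assumptions of this example). In this toy case the draws are negatively dependent, since a large first draw leaves only smaller values behind, so the variance of the sample mean comes out below what the independent-draws formula gives.

```python
from itertools import permutations, product
from statistics import mean, pvariance

# Hypothetical small population and sample size
population = [1, 2, 3, 4, 5]
n = 2

# All equally likely ordered samples WITH replacement (independent draws)
var_with = pvariance([mean(s) for s in product(population, repeat=n)])

# All equally likely ordered samples WITHOUT replacement (dependent draws)
var_without = pvariance([mean(s) for s in permutations(population, n)])

print(f"with replacement:    {var_with}")
print(f"without replacement: {var_without}")
```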
The End
Extra Credit (Dr. Pfenning)
Use Minitab Calculator to create column
“Birthyear”
Plot Earned vs. Birthyear, note relationship
Create column “EarnedPlusBirthyear”
Find sds of Earned, Birthyear,
EarnedPlusBirthyear, square to variances
Compare variances
Explain results