The Normal Distribution and Descriptive Statistics


Kin 304
Descriptive Statistics & the Normal Distribution
– Normal Distribution
– Measures of Central Tendency
– Measures of Variability
– Percentiles
– Z-Scores
– Arbitrary Scores & Scales
– Skewness & Kurtosis
Descriptive Statistics
– Descriptive statistics describe the characteristics of your sample
– Measures of Central Tendency
   – Mean, Mode, Median
– Measures of Variability
   – Standard Deviation & Variance
   – Skewness, Kurtosis
– Percentiles, scores in comparison to the Normal distribution
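Normal Frequency Distribution
[Figure: Normal frequency distribution; the mean, mode and median coincide at the center. x-axis: -4 to 4 standard deviations from the mean. Area between the mean and 1 SD on each side: 34.13% (68.26% within ±1 SD); between 1 and 2 SD on each side: 13.59% (95.44% within ±2 SD); between 2 and 3 SD on each side: 2.15%.]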
Measures of Central Tendency
– When do you use the mean, mode or median?
   – Height
   – Skinfolds (positively skewed)
   – House prices in Vancouver
   – Measurements (criterion value)
      – Objective test
      – Anthropometry
      – Vertical jump
      – 100m run time
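For a positively skewed variable such as skinfolds, the mean is pulled toward the long right tail while the median is not. A minimal sketch using Python's standard `statistics` module; the skinfold values are hypothetical, chosen only to illustrate the effect.

```python
import statistics

# Hypothetical sum-of-skinfolds values (mm) with one high value,
# giving a positively skewed sample.
skinfolds = [8, 9, 9, 10, 11, 12, 14, 30]

mean = statistics.mean(skinfolds)      # pulled upward by the high value
median = statistics.median(skinfolds)  # middle of the ordered data
mode = statistics.mode(skinfolds)      # most frequent value

print(f"mean={mean:.3f} median={median} mode={mode}")  # mean=12.875 median=10.5 mode=9
```

The single high value moves the mean well above the median, which is why the median is often preferred for skewed variables such as skinfolds or house prices.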
Measures of Variability
– Standard Deviation
– Variance = Standard Deviation²
– Range (approx. ±3 SDs around the mean)
– Nonparametric
   – Quartiles (25th percentile, 75th percentile)
   – Interquartile distance (75th percentile minus 25th percentile)
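A minimal sketch of these measures with Python's `statistics` module. The vertical jump values are hypothetical, and the quartiles use Python's default `'exclusive'` interpolation method; other software may interpolate quartiles slightly differently.

```python
import statistics

# Hypothetical vertical jump heights (cm), for illustration only.
jumps = [42, 44, 44, 44, 45, 45, 47, 49]

sd = statistics.stdev(jumps)            # sample standard deviation (N - 1 denominator)
variance = statistics.variance(jumps)   # equals sd ** 2
data_range = max(jumps) - min(jumps)

# Nonparametric measures: quartiles and the interquartile distance.
q1, q2, q3 = statistics.quantiles(jumps, n=4)  # 'exclusive' method by default
iqr = q3 - q1

print(f"SD={sd:.3f} variance={variance:.3f} range={data_range}")
print(f"Q1={q1} median={q2} Q3={q3} IQR={iqr}")
```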
Central Limit Theorem
If many random samples of the same size n were drawn from an infinitely large population, and the mean (average) was computed for each sample, the distribution formed by these averages would be approximately normal, provided n is sufficiently large. This holds even when the population itself is not normally distributed.
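A small simulation can make the theorem concrete. This sketch assumes a strongly skewed exponential population (mean 1, SD 1) as a stand-in for the "infinitely large population"; the sample size, number of samples and seed are arbitrary choices. A histogram of `sample_means` would look approximately normal, and their SD comes out close to SD / √n.

```python
import math
import random
import statistics

random.seed(304)  # fixed seed so the sketch is reproducible

SAMPLE_SIZE = 30    # n: size of each sample
NUM_SAMPLES = 2000  # how many samples are drawn

# Hypothetical population: exponential with mean 1 and SD 1 (positively skewed).
sample_means = [
    statistics.fmean(random.expovariate(1.0) for _ in range(SAMPLE_SIZE))
    for _ in range(NUM_SAMPLES)
]

mean_of_means = statistics.fmean(sample_means)  # close to the population mean (1)
sd_of_means = statistics.stdev(sample_means)    # close to SD / sqrt(n)
expected_sd = 1.0 / math.sqrt(SAMPLE_SIZE)

print(f"mean of sample means = {mean_of_means:.3f}")
print(f"SD of sample means = {sd_of_means:.3f} (theory: {expected_sd:.3f})")
```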
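Standard Error of the Mean
$$SEM = \frac{SD}{\sqrt{n}}$$
Describes how confident you can be that the sample mean is close to the population mean: the SEM is the standard deviation of the distribution of sample means, so it shrinks as the sample size n grows.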
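A minimal sketch of the SEM formula in Python. The SD and N inputs are taken from the SPSS Descriptive Statistics table for WT, HT and S5SF in this document, and the results reproduce that table's "Std. Error" of the mean column to rounding.

```python
import math


def standard_error_of_mean(sd: float, n: int) -> float:
    """SEM = SD / sqrt(n)."""
    return sd / math.sqrt(n)


# (variable, SD, N) from the Descriptive Statistics table in this document.
for name, sd, n in [("WT", 11.1361, 5704), ("HT", 6.2225, 5782), ("S5SF", 29.0066, 5362)]:
    print(f"{name}: SEM = {standard_error_of_mean(sd, n):.5f}")
```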
Using the Normal Distribution
Z-Scores
Score = 24
Norm Mean = 30
Norm SD = 4
Z-score = (24 – 30) / 4 = -1.5
$$Z = \frac{X - \bar{X}}{s}$$
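The z-score formula as a one-line Python function, reproducing the worked example (score 24 against a norm with mean 30 and SD 4). The function name is illustrative.

```python
def z_score(x: float, mean: float, sd: float) -> float:
    """Standard score: how many SDs x lies above (+) or below (-) the mean."""
    return (x - mean) / sd


print(z_score(24, 30, 4))  # -1.5
```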
Internal or External Norm
Internal Norm
– A sample of subjects is measured, and z-scores are calculated from the mean and SD of that same sample.
– The resulting z-scores have Mean = 0, SD = 1.
External Norm
– A sample of subjects is measured, and z-scores are calculated from the mean and SD of an external normative sample (national, sport-specific, etc.).
– The resulting z-scores have Mean = ?, SD = ? (these depend upon how the sample differs from the external norm).
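The difference between the two kinds of norm can be sketched in a few lines. The team scores and the "national" norm (mean 50 cm, SD 6 cm) below are hypothetical values chosen for illustration.

```python
import statistics


def z_scores(values, norm_mean, norm_sd):
    """Convert raw scores to z-scores against a given norm (mean, SD)."""
    return [(v - norm_mean) / norm_sd for v in values]


# Hypothetical vertical jump scores (cm) for a small team.
team = [52, 55, 58, 61, 64]

# Internal norm: the team's own mean and SD.
internal = z_scores(team, statistics.mean(team), statistics.stdev(team))

# External norm: a hypothetical national norm of mean 50 cm, SD 6 cm.
external = z_scores(team, 50, 6)

for label, zs in (("internal", internal), ("external", external)):
    print(f"{label}: mean = {statistics.mean(zs):.2f}, SD = {statistics.stdev(zs):.2f}")
```

With the internal norm the z-scores always have mean 0 and SD 1; with the external norm this team scores above the national mean (mean z ≈ 1.33) and is less variable than the national sample (SD of z < 1).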
Standardizing Data
– Transforming data into standard scores (z-scores)
– Useful for eliminating units of measurement
[Figure: Histograms of height (HT: Mean = 161.0, Std. Dev = 6.22, N = 5782) and of the same data as z-scores (Zscore(HT): Mean = 0.00, Std. Dev = 1.00, N = 5782).]
Standardizing Data
Standardizing does not change the shape of the distribution of the data; it only shifts the mean to 0 and rescales the standard deviation to 1.
[Figure: Histograms of weight (WT: Mean = 61.9, Std. Dev = 11.14, N = 5704) and of the same data as z-scores (Zscore(WT): Mean = 0.00, Std. Dev = 1.00, N = 5704).]
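Standardizing is a linear transformation (subtract the mean, divide by the SD), so the skewness of the data is unchanged. A minimal sketch, assuming hypothetical body-mass values and the coefficient of skewness used in this document, Σ(X − X̄)³ / ((N − 1)s³).

```python
import statistics


def skewness(xs):
    """Coefficient of skewness: sum((x - mean)^3) / ((N - 1) * s^3)."""
    m = statistics.fmean(xs)
    s = statistics.stdev(xs)
    return sum((x - m) ** 3 for x in xs) / ((len(xs) - 1) * s ** 3)


# Hypothetical, positively skewed body-mass values (kg).
weights = [48, 52, 55, 57, 58, 60, 63, 67, 75, 95]

m, s = statistics.fmean(weights), statistics.stdev(weights)
z = [(w - m) / s for w in weights]  # standardized scores

print(f"z: mean = {statistics.fmean(z):.3f}, SD = {statistics.stdev(z):.3f}")
print(f"skewness raw = {skewness(weights):.4f}, skewness z = {skewness(z):.4f}")
```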
Z-scores allow measurements from tests with different units to be combined.
But beware: a higher z-score is not necessarily a better performance. For tests where a lower raw score is better, the sign of the z-score must be reversed before the scores are combined.

Variable                   z-scores for profile A   z-scores for profile B
Sum of 5 Skinfolds (mm)     1.5                     -1.5*
Grip Strength (kg)          0.9                      0.9
Vertical Jump (cm)         -0.8                     -0.8
Shuttle Run (sec)           1.2                     -1.2*
Overall Rating (mean)       0.7                     -0.65

*Z-scores are reversed because lower skinfold and shuttle run scores are regarded as better performances.
[Figure: Bar charts of the z-scores for Test Profile A and Test Profile B (Sum of 5 Skinfolds (mm), Grip Strength (kg), Vertical Jump (cm), Shuttle Run (sec), Overall Rating); axis: z-scores from -2 to 2.]
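The Overall Rating in the profile table is the mean of the four z-scores, and Profile B is Profile A with the "lower is better" tests reversed. A minimal sketch of that calculation; the `LOWER_IS_BETTER` set is an illustrative name, not part of the original.

```python
import statistics

# z-scores for Profile A from the table (before any reversal).
raw_z = {
    "Sum of 5 Skinfolds (mm)": 1.5,
    "Grip Strength (kg)": 0.9,
    "Vertical Jump (cm)": -0.8,
    "Shuttle Run (sec)": 1.2,
}

# Tests where a lower raw score is the better performance.
LOWER_IS_BETTER = {"Sum of 5 Skinfolds (mm)", "Shuttle Run (sec)"}

naive = statistics.fmean(raw_z.values())  # Profile A: signs left as-is
corrected = statistics.fmean(             # Profile B: signs reversed where needed
    -z if test in LOWER_IS_BETTER else z for test, z in raw_z.items()
)

print(f"Overall rating, profile A: {naive:.2f}")      # 0.70
print(f"Overall rating, profile B: {corrected:.2f}")  # -0.65
```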
Percentile: The percentage of the population
that lies at or below that score
Cumulative Frequency Distribution
[Figure: Cumulative frequency distribution of the normal curve; x-axis: z-score from -3 to 3.]
Area under the Standard Normal Curve
What percentage of the population is above or below a given z-score, or between two given z-scores?
[Figure: Standard normal curve; x-axis: z-score from -4 to 4.]
Percentage between 0 and -1.5 = 43.32%
Percentage above -1.5 = 50% + 43.32% = 93.32%
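Instead of a printed z-table, the areas can be computed from the cumulative distribution function. A minimal sketch using `statistics.NormalDist` from Python's standard library (Python 3.8+).

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, SD 1

below = std_normal.cdf(-1.5)           # proportion at or below z = -1.5 (its percentile)
between_0_and_minus_1_5 = 0.5 - below  # area between the mean and z = -1.5
above = 1 - below                      # proportion above z = -1.5

# Area between two z-scores, e.g. -1 and +1 (about 0.6827).
within_1_sd = std_normal.cdf(1) - std_normal.cdf(-1)

print(f"{between_0_and_minus_1_5:.2%}")  # 43.32%
print(f"{above:.2%}")                    # 93.32%
```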
Predicting Percentiles from Mean and SD, assuming a normal distribution
Predicted percentile value = Mean + (z-score for percentile × SD), based upon Mean = 170, SD = 10

Percentile   Z-score for percentile   Predicted percentile value
5            -1.645                   153.55
25           -0.675                   163.25
50            0                       170
75           +0.675                   176.75
95           +1.645                   186.45
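The inverse of the cumulative distribution function turns a percentile back into a value. A minimal sketch with `statistics.NormalDist`, using the table's Mean = 170 and SD = 10.

```python
from statistics import NormalDist

MEAN, SD = 170, 10
norm = NormalDist(MEAN, SD)

for pct in (5, 25, 50, 75, 95):
    z = NormalDist().inv_cdf(pct / 100)  # z-score for this percentile
    value = norm.inv_cdf(pct / 100)      # same as MEAN + z * SD
    print(f"{pct:>2}th percentile: z = {z:+.3f}, value = {value:.2f}")
```

The exact z-scores for the 25th and 75th percentiles are about ±0.6745, so the computed values (about 163.26 and 176.74) differ from the table, which uses ±0.675, by roughly 0.01.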
Arbitrary Scores & Scales
– T-scores
   – Mean = 50, SD = 10
– Hull scores
   – Mean = 50, SD = 14
All scores are based upon z-scores: Score = Scale Mean + (z-score × Scale SD)
Z-score = +1.25
T-Score = 50 + (+1.25 × 10) = 62.5
Hull Score = 50 + (+1.25 × 14) = 67.5
Z-score = -1.25
T-Score = 50 + (-1.25 × 10) = 37.5
Hull Score = 50 + (-1.25 × 14) = 32.5
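Every arbitrary scale is the same z-score re-expressed with a new mean and SD. A minimal sketch; the function names are illustrative.

```python
def scaled_score(z: float, scale_mean: float, scale_sd: float) -> float:
    """Re-express a z-score on an arbitrary scale: scale_mean + z * scale_sd."""
    return scale_mean + z * scale_sd


def t_score(z: float) -> float:
    return scaled_score(z, 50, 10)  # T-scale: mean 50, SD 10


def hull_score(z: float) -> float:
    return scaled_score(z, 50, 14)  # Hull scale: mean 50, SD 14


for z in (+1.25, -1.25):
    print(f"z = {z:+.2f}: T = {t_score(z):.1f}, Hull = {hull_score(z):.1f}")
# z = +1.25: T = 62.5, Hull = 67.5
# z = -1.25: T = 37.5, Hull = 32.5
```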
Skewness & Kurtosis
– Skewness is a measure of symmetry, or more accurately, the lack of symmetry. A distribution, or data set, is symmetric if it looks the same to the left and right of the center point.
– Kurtosis is a measure of whether the data are peaked or flat relative to a normal distribution. That is, data sets with high kurtosis tend to have a distinct peak near the mean, decline rather rapidly, and have heavy tails. Data sets with low kurtosis tend to have a flat top near the mean rather than a sharp peak. A uniform distribution would be the extreme case of a flat top.
[Figure: Example distributions illustrating Skewness and Kurtosis.]
Many variables measured in BPK are positively skewed.
Coefficient of Skewness
$$\text{skewness} = \frac{\sum_{i=1}^{N} (X_i - \bar{X})^3}{(N - 1)\, s^3}$$
Where: $\bar{X}$ = mean, $N$ = sample size, $s$ = standard deviation
Normal Distribution: Skewness = 0
Significant skewness: >1 or <-1
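Coefficient of Kurtosis
$$\text{kurtosis} = \frac{\sum_{i=1}^{N} (X_i - \bar{X})^4}{(N - 1)\, s^4}$$
Where: $\bar{X}$ = mean, $N$ = sample size, $s$ = standard deviation
Normal Distribution: Kurtosis ≈ 3 with this formula. Statistical software such as SPSS usually reports excess kurtosis (kurtosis minus 3), for which a normal distribution gives 0.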
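The two coefficients differ only in the power applied to the deviations, so one helper covers both. A minimal sketch implementing the formulas exactly as written here (sample SD with an N − 1 denominator); packages such as SPSS typically apply small-sample corrections and report excess kurtosis, so their values can differ, noticeably so for small N.

```python
import statistics


def _moment_ratio(xs, power):
    """sum((x - mean)^power) / ((N - 1) * s^power)."""
    mean = statistics.fmean(xs)
    s = statistics.stdev(xs)  # sample SD (N - 1 denominator)
    return sum((x - mean) ** power for x in xs) / ((len(xs) - 1) * s ** power)


def skewness(xs):
    return _moment_ratio(xs, 3)  # 0 for symmetric data


def kurtosis(xs):
    return _moment_ratio(xs, 4)  # about 3 for large normal samples


print(skewness([1, 2, 3, 4, 5]), kurtosis([1, 2, 3, 4, 5]))
print(skewness([1, 1, 1, 2, 10]))  # positive: long right tail
```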
[Figure: Histogram of Height (Women), HT: Mean = 161.0, Std. Dev = 6.22, N = 5782.]
[Figure: Histogram of Weight (Women), WT: Mean = 61.9, Std. Dev = 11.14, N = 5704.]
[Figure: Histogram of Sum of 5 Skinfolds (Women), S5SF: Mean = 75.8, Std. Dev = 29.01, N = 5362.]
Descriptive Statistics
                     N      Mean (Std. Error)    Std. Deviation   Skewness (Std. Error)   Kurtosis (Std. Error)
WT                   5704   61.9210 (.1474)      11.1361          1.297 (.032)            2.643 (.065)
HT                   5782   161.0457 (.08183)    6.2225           .092 (.032)             .090 (.064)
S5SF                 5362   75.7820 (.3961)      29.0066          1.043 (.033)            1.299 (.067)
Valid N (listwise)   5347
Normal Probability Plots
The correlation between the observed and the expected cumulative probabilities is a measure of the deviation from normality: the closer the points lie to the diagonal (correlation close to 1), the closer the data are to a normal distribution.
[Figure: Normal P-P Plot of HT and Normal P-P Plot of WT; x-axis: Observed Cum Prob (0.00 to 1.00), y-axis: Expected Cum Prob (0.00 to 1.00).]
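A P-P plot correlation can be sketched by sorting the data, pairing each observation's observed cumulative probability with the probability expected under a normal distribution fitted to the sample, and correlating the two. This sketch assumes the plotting position i / (N + 1) for the observed probabilities (software such as SPSS may use a different formula) and uses simulated, hypothetical data.

```python
import math
import random
import statistics
from statistics import NormalDist


def pearson_r(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)


def pp_correlation(xs):
    """Correlation of observed with expected cumulative probability (P-P plot)."""
    data = sorted(xs)
    n = len(data)
    # Expected: normal distribution with the sample's own mean and SD.
    fitted = NormalDist(statistics.fmean(data), statistics.stdev(data))
    expected = [fitted.cdf(x) for x in data]
    # Observed: plotting position i / (N + 1), an assumed (common) choice.
    observed = [i / (n + 1) for i in range(1, n + 1)]
    return pearson_r(observed, expected)


random.seed(304)  # reproducible simulated data
normal_like = [random.gauss(161, 6.2) for _ in range(500)]
skewed = [random.lognormvariate(0, 1) for _ in range(500)]

print(f"normal-like: r = {pp_correlation(normal_like):.4f}")
print(f"skewed:      r = {pp_correlation(skewed):.4f}")
```

The normally distributed sample gives a correlation very close to 1; the positively skewed sample gives a visibly lower value, mirroring the difference between the HT and WT plots.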
T-scores and Osteoporosis
To diagnose osteoporosis, clinicians measure a patient’s bone mineral density (BMD) and then express the patient’s BMD in terms of standard deviations above or below the mean BMD for a “young normal” person of the same sex and ethnicity.
T-scores and Osteoporosis
$$T\text{-score} = \frac{BMD_{patient} - BMD_{young\ normal}}{SD_{young\ normal}}$$
BMD = bone mineral density
Although they call this standardized score a T-score, it
is really just a Z-score where the reference mean and
standard deviation come from an external population
(i.e., young normal adults of a given sex and ethnicity).
Classification using T-scores
T-scores are used to classify a patient’s BMD into one of three categories:
– T-scores ≥ -1.0 indicate normal bone density
– T-scores between -1.0 and -2.5 indicate low bone mass (“osteopenia”)
– T-scores ≤ -2.5 indicate osteoporosis
Decisions to treat patients with osteoporosis medication are based, in part, on T-scores.
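A minimal sketch of the BMD T-score and the three-category classification. The reference mean and SD are hypothetical placeholders; real young-normal reference values depend on the measurement device, skeletal site, sex and ethnicity.

```python
def bmd_t_score(bmd_patient, young_normal_mean, young_normal_sd):
    """T-score = (BMD_patient - young-normal mean BMD) / young-normal SD."""
    return (bmd_patient - young_normal_mean) / young_normal_sd


def classify_bmd(t):
    """Classify a BMD T-score into the three categories listed above."""
    if t >= -1.0:
        return "normal"
    if t > -2.5:
        return "low bone mass (osteopenia)"
    return "osteoporosis"


# Hypothetical young-normal reference values (g/cm^2), for illustration only.
REF_MEAN, REF_SD = 1.00, 0.12

for bmd in (0.98, 0.80, 0.68):
    t = bmd_t_score(bmd, REF_MEAN, REF_SD)
    print(f"BMD {bmd:.2f}: T = {t:+.2f} -> {classify_bmd(t)}")
```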