density curve

Lecture 2 – Aug 29
© 2012 W.H. Freeman and Company
Five-number summary and boxplot
[Figure: ranked list and boxplot of years until death for 25 individuals with Disease X; vertical axis: years until death, 0 to 7]

Data (n = 25, largest to smallest): 6.1, 5.6, 5.3, 4.9, 4.7, 4.5, 4.2, 4.1, 3.9, 3.8, 3.7, 3.6, 3.4, 3.3, 2.9, 2.8, 2.5, 2.3, 2.3, 2.1, 1.9, 1.6, 1.5, 1.2, 0.6

Largest = max = 6.1
Q3 = third quartile = 4.35
M = median = 3.4
Q1 = first quartile = 2.2
Smallest = min = 0.6

Five-number summary:
min Q1 M Q3 max
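As a check on the numbers on this slide, here is a minimal Python sketch of the five-number summary. It assumes the quartiles are taken as the medians of the observations below and above the overall median (with the median itself left out when n is odd); other software may use slightly different quartile rules. The function names are illustrative only.

```python
def median(sorted_values):
    """Median of a list that is already sorted in ascending order."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2 == 1:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2


def five_number_summary(values):
    """Return (min, Q1, M, Q3, max).

    Assumption: Q1 and Q3 are the medians of the lower and upper halves
    of the sorted data, excluding the overall median when n is odd.
    """
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    upper = data[half + 1:] if n % 2 == 1 else data[half:]
    return (data[0], median(lower), median(data), median(upper), data[-1])


# Years until death for the 25 individuals with Disease X (from the slide).
disease_x = [6.1, 5.6, 5.3, 4.9, 4.7, 4.5, 4.2, 4.1, 3.9, 3.8, 3.7, 3.6, 3.4,
             3.3, 2.9, 2.8, 2.5, 2.3, 2.3, 2.1, 1.9, 1.6, 1.5, 1.2, 0.6]

summary = five_number_summary(disease_x)
print([round(v, 2) for v in summary])
```

Under this quartile rule the result should agree with the slide: min 0.6, Q1 2.2, M 3.4, Q3 4.35, max 6.1.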
Boxplots for skewed data
Comparing box plots for a normal and a right-skewed distribution.

Boxplots remain true to the data and clearly depict symmetry or skew.

[Figure: side-by-side boxplots for Disease X and Multiple Myeloma; vertical axis: years until death, 0 to 15]
[Figure: ranked list and boxplot of years until death for 25 individuals with Disease X; vertical axis: years until death, 0 to 8]

Data (n = 25, largest to smallest): 7.9, 6.1, 5.3, 4.9, 4.7, 4.5, 4.2, 4.1, 3.9, 3.8, 3.7, 3.6, 3.4, 3.3, 2.9, 2.8, 2.5, 2.3, 2.3, 2.1, 1.9, 1.6, 1.5, 1.2, 0.6

Q3 = 4.35
Q1 = 2.2
Interquartile range: IQR = Q3 − Q1 = 4.35 − 2.2 = 2.15
Distance to Q3: 7.9 − 4.35 = 3.55
Individual #25 has a value of 7.9 years, which is 3.55 years above
the third quartile. This is more than 1.5 × IQR = 1.5 × 2.15 = 3.225 years.
Thus, individual #25 is a suspected outlier.
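The 1.5 × IQR check can be sketched in Python as follows. The slide only works through the case above Q3; flagging values more than 1.5 × IQR below Q1 as well is the usual form of the rule and is included here as an assumption. The quartile rule (medians of the lower and upper halves) and the function names are also illustrative choices.

```python
def median(sorted_values):
    """Median of a list that is already sorted in ascending order."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2 == 1:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2


def quartiles(values):
    """Return (Q1, Q3) as medians of the lower and upper halves (assumed rule)."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    upper = data[half + 1:] if n % 2 == 1 else data[half:]
    return median(lower), median(upper)


def suspected_outliers(values, k=1.5):
    """Values more than k * IQR below Q1 or above Q3."""
    q1, q3 = quartiles(values)
    iqr = q3 - q1
    low_fence = q1 - k * iqr
    high_fence = q3 + k * iqr
    return [x for x in values if x < low_fence or x > high_fence]


# Years until death for Disease X, including the 7.9-year individual.
disease_x = [7.9, 6.1, 5.3, 4.9, 4.7, 4.5, 4.2, 4.1, 3.9, 3.8, 3.7, 3.6, 3.4,
             3.3, 2.9, 2.8, 2.5, 2.3, 2.3, 2.1, 1.9, 1.6, 1.5, 1.2, 0.6]

outliers = suspected_outliers(disease_x)
print(outliers)
```

With Q1 = 2.2 and Q3 = 4.35, the upper fence is 4.35 + 3.225 = 7.575, so only the 7.9-year value is flagged.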
Measure of spread: the standard deviation
The standard deviation “s” is used to describe the variation around the
mean. Like the mean, it is not resistant to skew or outliers.
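1. First calculate the variance s²:

$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$$

2. Then take the square root to get the standard deviation s:

$$s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}$$

[Figure: distribution with the mean x̄ and the interval mean ± 1 s.d. marked]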
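The two steps can be sketched in Python as below: sum the squared deviations from the mean, divide by n − 1, then take the square root. The five-value sample is hypothetical and chosen only so the arithmetic is easy to follow by hand.

```python
import math


def sample_variance(values):
    """Variance s^2: squared deviations from the mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


def sample_sd(values):
    """Standard deviation s: square root of the variance."""
    return math.sqrt(sample_variance(values))


# Hypothetical sample: mean is 3, squared deviations are 4, 1, 0, 1, 4.
sample = [1.0, 2.0, 3.0, 4.0, 5.0]

variance = sample_variance(sample)   # (4 + 1 + 0 + 1 + 4) / (5 - 1)
sd = sample_sd(sample)
print(variance, round(sd, 4))
```

For this sample the squared deviations sum to 10, so s² = 10 / 4 = 2.5 and s is the square root of 2.5, about 1.58.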
Density curves
A density curve is a mathematical model of a distribution.
The total area under the curve, by definition, is equal to 1, or 100%.
The area under the curve for a range of values is the proportion of all
observations for that range.
[Figure: histogram of a sample with the smoothed density curve that describes the population in theory]
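To make the area idea concrete, here is a small Python sketch that approximates areas under a density curve with the midpoint rule. The triangular density on the interval from 0 to 2 is a hypothetical example, not one from the lecture; it was picked because its areas are easy to verify with geometry.

```python
def density(x):
    """Hypothetical triangular density on [0, 2], peaking at x = 1."""
    if 0.0 <= x <= 1.0:
        return x
    if 1.0 < x <= 2.0:
        return 2.0 - x
    return 0.0


def area_under(f, lo, hi, steps=10_000):
    """Approximate the area under f between lo and hi (midpoint rule)."""
    width = (hi - lo) / steps
    return sum(f(lo + (i + 0.5) * width) for i in range(steps)) * width


total_area = area_under(density, 0.0, 2.0)    # whole curve
middle_area = area_under(density, 0.5, 1.5)   # proportion between 0.5 and 1.5
print(round(total_area, 6), round(middle_area, 6))
```

The total area comes out as 1, and the area between 0.5 and 1.5 is 0.75, meaning 75% of observations described by this curve would fall in that range.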
Median and mean of a density curve
The median of a density curve is the equal-areas point: the point that
divides the area under the curve in half.
The mean of a density curve is the balance point, at which the curve
would balance if it were made of solid material.
The median and mean are the same for a symmetric density curve.
The mean of a skewed curve is pulled in the direction of the long tail.
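The pull of the long tail on the mean can be illustrated numerically. The sketch below uses a hypothetical right-skewed density, f(x) = e^(−x) for x ≥ 0 (not taken from the lecture), and approximates the equal-areas point and the balance point with a midpoint rule. The upper limit of 40 is an assumed cutoff beyond which the remaining area is negligible.

```python
import math


def density(x):
    """Hypothetical right-skewed density: f(x) = e^(-x) for x >= 0."""
    return math.exp(-x)


def mean_and_median(f, lo, hi, steps=200_000):
    """Approximate the balance point (mean) and equal-areas point (median)."""
    width = (hi - lo) / steps
    area = 0.0      # running area under the curve
    moment = 0.0    # running integral of x * f(x)
    median = None
    for i in range(steps):
        x = lo + (i + 0.5) * width
        piece = f(x) * width
        area += piece
        moment += x * piece
        if median is None and area >= 0.5:
            median = x  # first point with half the area to its left
    return moment / area, median


mean, median = mean_and_median(density, 0.0, 40.0)
print(round(mean, 3), round(median, 3))
```

For this curve the median is ln 2, roughly 0.69, while the mean is 1: the mean sits farther out along the long right tail.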
Normal distributions
Normal, or Gaussian, distributions are a family of symmetrical, bell-shaped density curves defined by a mean μ (mu) and a standard
deviation σ (sigma): N(μ, σ).
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}$$
e = 2.71828… The base of the natural logarithm
π = pi = 3.14159…
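As a quick illustration, the normal density f(x) = exp(−½((x − μ)/σ)²) / (σ√(2π)) can be evaluated directly in Python. The sketch below uses μ = 15 and σ = 3 as example values and checks three properties: symmetry about μ, the peak height 1/(σ√(2π)), and a total area close to 1. The integration limits of μ ± 10σ are an assumed cutoff.

```python
import math


def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


mu, sigma = 15.0, 3.0

# Symmetry: points equally far from mu have the same density.
left = normal_pdf(mu - 3.0, mu, sigma)
right = normal_pdf(mu + 3.0, mu, sigma)

# The curve peaks at x = mu with height 1 / (sigma * sqrt(2 * pi)).
peak = normal_pdf(mu, mu, sigma)

# Total area, approximated by the midpoint rule over mu +/- 10 sigma.
steps = 20_000
lo, hi = mu - 10 * sigma, mu + 10 * sigma
width = (hi - lo) / steps
total_area = sum(normal_pdf(lo + (i + 0.5) * width, mu, sigma)
                 for i in range(steps)) * width

print(left == right, round(peak, 4), round(total_area, 6))
```

The same function works for any member of the family: changing mu shifts the curve along the x-axis, and changing sigma makes it wider and flatter or narrower and taller.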
A family of density curves
Here, means are the same (μ = 15)
while standard deviations are
different (σ = 2, 4, and 6).
Here, means are different
(μ = 10, 15, and 20) while standard
deviations are the
same (σ = 3).

[Figure: normal density curves for the parameter values above; horizontal axis: 0 to 30]