JOINT AND CONDITIONAL DISTRIBUTIONS

EXPECTATION, VARIANCE, ETC. - APPLICATION
Measures of Central Location
• Usually, we focus our attention on two
types of measures when describing
population characteristics:
– Central location
– Variability or spread
Measures of Central Location
• The measure of central location reflects
the locations of all the data points.
• How?
– With one data point, clearly the central location is at the point itself.
– With two data points, the central location should fall in the middle between them (in order to reflect the location of both of them).
– But if a third data point appears on the left-hand side of the midrange, it should “pull” the central location to the left.
The Arithmetic Mean
• This is the most popular measure of central location:
Mean = (Sum of the observations) / (Number of observations)
The Arithmetic Mean
Sample mean: $\bar{x} = \dfrac{\sum_{i=1}^{n} x_i}{n}$, where n is the sample size
Population mean: $\mu = \dfrac{\sum_{i=1}^{N} x_i}{N}$, where N is the population size
The Arithmetic Mean
• Example
The reported times on the Internet of 10 adults are 0, 7, 12, 5, 33, 14, 8, 0, 9, and 22 hours. Find the mean time on the Internet.
10
x01  x72  ...  x22
i 1 xi
10
x


10
10
11.0
The Arithmetic Mean
• Drawback of the mean:
It can be influenced by unusual
observations, because it uses all the
information in the data set.
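To make the arithmetic concrete, here is a minimal Python sketch of Mean = Sum / Count applied to the Internet-time data. The function name `arithmetic_mean` is illustrative, not from the slides; the second call drops the single 33-hour observation to show how much one unusual value moves the mean.

```python
# Reported hours on the Internet for the 10 adults in the example.
times = [0, 7, 12, 5, 33, 14, 8, 0, 9, 22]

def arithmetic_mean(values):
    """Sum of the observations divided by the number of observations."""
    return sum(values) / len(values)

mean_all = arithmetic_mean(times)  # 110 / 10

# Drop the single largest (unusual) observation, 33 hours.
without_33 = [t for t in times if t != 33]
mean_without_33 = arithmetic_mean(without_33)  # 77 / 9

print(f"mean of all 10 observations: {mean_all:.2f}")
print(f"mean without 33: {mean_without_33:.2f}")
```

With all 10 observations the mean is 11.00; without the 33-hour value it falls to about 8.56 (77 / 9).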
The Median
• The Median of a set of observations is the value
that falls in the middle when the observations are
arranged in order of magnitude. It divides the
data in half.
Example: Find the median of the time on the Internet for the 10 adults of the previous example.
– Even number of observations: 0, 0, 5, 7, 8, 9, 12, 14, 22, 33. The two middle values are 8 and 9, so the median is (8 + 9)/2 = 8.5.
Comment: Suppose only 9 adults were sampled (exclude, say, the longest time, 33).
– Odd number of observations: 0, 0, 5, 7, 8, 9, 12, 14, 22. The middle (5th) value is 8, so the median is 8.
The Median
• Depth of median = (n+1)/2
$\text{Median} = \begin{cases} X_{((n+1)/2)} & \text{if } n \text{ is odd} \\ \dfrac{X_{(k)} + X_{(k+1)}}{2} & \text{if } n \text{ is even } (n = 2k) \end{cases}$
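The depth rule above can be sketched in Python as follows. This is an illustrative helper (the name `median` is our own), assuming non-empty numeric data; it translates the 1-based order statistics X_(k) into Python's 0-based list indexing.

```python
def median(values):
    """Median via the depth rule: depth = (n + 1) / 2 in the ordered data."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data")
    k = n // 2
    if n % 2 == 1:
        # n odd: X_((n+1)/2) in 1-based terms is ordered[k] in 0-based terms.
        return ordered[k]
    # n even (n = 2k): average X_(k) and X_(k+1), i.e. ordered[k-1] and ordered[k].
    return (ordered[k - 1] + ordered[k]) / 2

times = [0, 7, 12, 5, 33, 14, 8, 0, 9, 22]
print(median(times))                          # 8.5
print(median([t for t in times if t != 33]))  # 8
```

Unlike the mean, the median moves by only half a unit (from 8.5 to 8) when the unusual value 33 is removed.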

The Mode
• The Mode of a set of observations is the value that
occurs most frequently.
• A set of data may have one mode (or modal class), or two or more modes.
[Figure: histogram with the modal class marked]
The Mode
• Find the mode for the data in the Example. Here are
the data again: 0, 7, 12, 5, 33, 14, 8, 0, 9, 22
Solution
• All observations except “0” occur once. There are two “0”s. Thus, the mode is zero.
• Is this a good measure of central location?
• The value “0” does not reside at the center of this set
(compare with the mean = 11.0 and the median = 8.5).
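A short Python sketch for finding the mode, using `collections.Counter` from the standard library. Returning a sorted list (the helper name `modes` is hypothetical) allows for data sets with two or more modes, as noted above.

```python
from collections import Counter

def modes(values):
    """Return every value that occurs most frequently (there may be several)."""
    counts = Counter(values)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)

times = [0, 7, 12, 5, 33, 14, 8, 0, 9, 22]
print(modes(times))  # [0]
```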
Relationship among Mean, Median, and Mode
• If a distribution is bell-shaped and symmetrical, the mean, median, and mode coincide:
Mean = Median = Mode
• If a distribution is asymmetrical, that is, skewed to the left or to the right, the three measures differ.
A positively skewed distribution (“skewed to the right”): Mode < Median < Mean
A negatively skewed distribution (“skewed to the left”): Mean < Median < Mode
Measures of variability
• Measures of central location fail to tell the
whole story about the distribution.
• A question of interest still remains unanswered:
How much are the observations spread out
around the mean value?
Measures of Variability
Observe two hypothetical data sets:
– Small variability: the average value provides a good representation of the observations in the data set.
– Larger variability: the same average value does not provide as good a representation of the observations in the data set as before.
The Range
– The range of a set of observations is the difference
between the largest and smallest observations.
– Its major advantage is the ease with which it can be
computed.
– Its major shortcoming is its failure to provide information
on the dispersion of the observations between the two
end points.
But how do all the observations spread out between the smallest and the largest observation? The range cannot assist in answering this question.
The Variance
• This measure reflects the dispersion of all the observations.
• The variance of a population of size N, $x_1, x_2, \ldots, x_N$, whose mean is $\mu$ is defined as
$\sigma^2 = \dfrac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$
• The variance of a sample of n observations $x_1, x_2, \ldots, x_n$ whose mean is $\bar{x}$ is defined as
$s^2 = \dfrac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
Why not use the sum of deviations?
Consider two small populations:
– Population A: 8, 9, 10, 11, 12
  Deviations: 8-10 = -2, 9-10 = -1, 10-10 = 0, 11-10 = +1, 12-10 = +2; Sum = 0
– Population B: 4, 7, 10, 13, 16
  Deviations: 4-10 = -6, 7-10 = -3, 10-10 = 0, 13-10 = +3, 16-10 = +6; Sum = 0
The mean of both populations is 10... but measurements in B are more dispersed than those in A.
A measure of dispersion should agree with this observation.
Can the sum of deviations be a good measure of dispersion?
The sum of deviations is zero for both populations; therefore, it is not a good measure of dispersion.
The Variance
Let us calculate the variance of the two populations:
$\sigma_A^2 = \dfrac{(8-10)^2 + (9-10)^2 + (10-10)^2 + (11-10)^2 + (12-10)^2}{5} = 2$
$\sigma_B^2 = \dfrac{(4-10)^2 + (7-10)^2 + (10-10)^2 + (13-10)^2 + (16-10)^2}{5} = 18$
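Both calculations above, together with the zero sum of deviations from the previous slide, can be checked with a small Python sketch. The helper names are illustrative; `population_variance` follows the population definition with divisor N.

```python
def population_variance(values):
    """sigma^2 = sum((x_i - mu)^2) / N, where mu is the population mean."""
    n = len(values)
    mu = sum(values) / n
    return sum((x - mu) ** 2 for x in values) / n

def sum_of_deviations(values):
    """Sum of (x_i - mu); always zero (up to rounding), whatever the spread."""
    mu = sum(values) / len(values)
    return sum(x - mu for x in values)

population_a = [8, 9, 10, 11, 12]
population_b = [4, 7, 10, 13, 16]

for name, pop in (("A", population_a), ("B", population_b)):
    print(name, sum_of_deviations(pop), population_variance(pop))
```

Both sums of deviations print as 0.0, while the variances are 2.0 for A and 18.0 for B.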
Why is the variance defined as the average squared deviation? Why not use the sum of squared deviations as a measure of variation instead? After all, the sum of squared deviations increases in magnitude when the variation of a data set increases!
The Variance
Let us calculate the sum of squared deviations for both data sets.
Which data set has a larger dispersion? Data set B is more dispersed around the mean.
[Figure: dot plots of data set A (values 1 and 3) and data set B (values 1 and 5)]
The Variance
$\text{Sum}_A = (1-2)^2 + \cdots + (1-2)^2 + (3-2)^2 + \cdots + (3-2)^2 = 10$
$\text{Sum}_B = (1-3)^2 + (5-3)^2 = 8$
Sum_A > Sum_B. This is inconsistent with the observation that set B is more dispersed.
The Variance
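However, when calculated on a “per observation” basis (variance), the data set dispersions are properly ranked. Every squared deviation in Sum_A equals 1, so data set A has N = 10 observations; data set B has N = 2:
$\sigma_A^2 = \text{Sum}_A / N = 10/10 = 1$
$\sigma_B^2 = \text{Sum}_B / N = 8/2 = 4$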
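The following Python sketch contrasts the raw sum of squared deviations with the per-observation value. It assumes that data set A consists of five 1s and five 3s: every squared deviation in Sum_A equals 1, so ten terms are needed to reach 10, and a mean of 2 requires equally many 1s and 3s. Data set B is taken to be the two values 1 and 5. The helper name is illustrative.

```python
def sum_sq_dev(values):
    """Sum of squared deviations from the mean."""
    mu = sum(values) / len(values)
    return sum((x - mu) ** 2 for x in values)

# Assumed composition: A = five 1s and five 3s (mean 2); B = 1 and 5 (mean 3).
set_a = [1] * 5 + [3] * 5
set_b = [1, 5]

for name, data in (("A", set_a), ("B", set_b)):
    ss = sum_sq_dev(data)
    print(name, "sum of squares:", ss, "per observation:", ss / len(data))
```

The sums are 10.0 and 8.0, so ranking by the sum would call A more dispersed; the per-observation values are 1.0 and 4.0, which rank B as more dispersed.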
The Variance
• Example
– The following sample consists of the number of jobs six students applied for: 17, 15, 23, 7, 9, 13. Find its mean and variance.
• Solution
$\bar{x} = \dfrac{\sum_{i=1}^{6} x_i}{6} = \dfrac{17 + 15 + 23 + 7 + 9 + 13}{6} = \dfrac{84}{6} = 14 \text{ jobs}$
$s^2 = \dfrac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} = \dfrac{(17-14)^2 + (15-14)^2 + \cdots + (13-14)^2}{6 - 1} = 33.2 \text{ jobs}^2$
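A minimal Python sketch of the sample mean and the sample variance (divisor n - 1) for the job-application data. The two helper functions are illustrative; the last line cross-checks the result against `statistics.variance` from Python's standard library, which also uses the n - 1 divisor.

```python
import statistics

def sample_mean(values):
    """x_bar = sum of the observations / n."""
    return sum(values) / len(values)

def sample_variance(values):
    """s^2 = sum((x_i - x_bar)^2) / (n - 1)."""
    n = len(values)
    if n < 2:
        raise ValueError("the sample variance needs at least two observations")
    x_bar = sample_mean(values)
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)

jobs = [17, 15, 23, 7, 9, 13]
print(sample_mean(jobs))          # 14.0
print(sample_variance(jobs))      # 33.2
print(statistics.variance(jobs))  # 33.2 (standard-library cross-check)
```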
The Variance – Shortcut method
$s^2 = \dfrac{1}{n-1}\left[\sum_{i=1}^{n} x_i^2 - \dfrac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}\right] = \dfrac{1}{6-1}\left[17^2 + 15^2 + \cdots + 13^2 - \dfrac{(17 + 15 + \cdots + 13)^2}{6}\right] = 33.2 \text{ jobs}^2$
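The shortcut formula needs only two running totals, the sum of the observations and the sum of their squares, so it can be computed in a single pass. A Python sketch (the function name is hypothetical) that reproduces the 33.2 above:

```python
def sample_variance_shortcut(values):
    """s^2 = [sum(x_i^2) - (sum(x_i))^2 / n] / (n - 1), in a single pass."""
    n = len(values)
    if n < 2:
        raise ValueError("the sample variance needs at least two observations")
    sum_x = 0
    sum_x2 = 0
    for x in values:  # one pass: accumulate both totals
        sum_x += x
        sum_x2 += x * x
    return (sum_x2 - sum_x ** 2 / n) / (n - 1)

jobs = [17, 15, 23, 7, 9, 13]
# sum_x = 84, sum_x2 = 1342, so s^2 = (1342 - 84**2 / 6) / 5
print(sample_variance_shortcut(jobs))  # 33.2
```

Both formulas are algebraically equal. In floating-point arithmetic, however, the shortcut subtracts two large, nearly equal quantities when the data are large relative to their spread, which can lose precision; the two-pass definition is the safer default for computation.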
Standard Deviation
• The standard deviation of a set of
observations is the square root of the
variance.
Sample standard deviation: $s = \sqrt{s^2}$
Population standard deviation: $\sigma = \sqrt{\sigma^2}$
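Python's standard library already provides both versions: `statistics.stdev` uses the sample divisor n - 1 and `statistics.pstdev` the population divisor N. A brief sketch applying them to the jobs sample and to population A from the earlier slides:

```python
import math
import statistics

jobs = [17, 15, 23, 7, 9, 13]     # sample from the variance example
population_a = [8, 9, 10, 11, 12]  # population A from the earlier slide

s = statistics.stdev(jobs)                 # sample SD: divisor n - 1
sigma_a = statistics.pstdev(population_a)  # population SD: divisor N

print(s, math.sqrt(33.2))     # the two agree (up to rounding)
print(sigma_a, math.sqrt(2))  # the two agree (up to rounding)
```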
Standard Deviation
• Example
– To examine the consistency of shots for a new
innovative golf club, a golfer was asked to hit 150
shots, 75 with a currently used (7-iron) club, and
75 with the new club.
– The distances were recorded.
– Which club is better?
Standard Deviation
• Example – solution
Excel printout from the “Descriptive Statistics” submenu:

Statistic            Current     Innovation
Mean                 150.5467    150.1467
Standard Error       0.668815    0.357011
Median               151         150
Mode                 150         149
Standard Deviation   5.792104    3.091808
Sample Variance      33.54847    9.559279
Kurtosis             0.12674     -0.88542
Skewness             -0.42989    0.177338
Range                28          12
Minimum              134         144
Maximum              162         156
Sum                  11291       11261
Count                75          75

The innovation club is more consistent (smaller standard deviation), and because the means are close, it is considered the better club.
The Coefficient of Variation
• The coefficient of variation of a set of measurements
is the standard deviation divided by the mean value.
Sample coefficient of variation: $cv = \dfrac{s}{\bar{x}}$
Population coefficient of variation: $CV = \dfrac{\sigma}{\mu}$
• This coefficient provides a proportionate measure of
variation.
A standard deviation of 10 may be perceived as large when the mean value is 100, but only as moderately large when the mean value is 500.
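A minimal sketch of the coefficient of variation in Python; the helper name is illustrative. The first two calls reproduce the slide's illustration. Applying the coefficient to the golf-club printout is our own example, using the Excel standard deviations and means.

```python
def coefficient_of_variation(sd, mean):
    """cv = s / x_bar (sample) or CV = sigma / mu (population)."""
    if mean == 0:
        raise ValueError("the coefficient of variation is undefined when the mean is 0")
    return sd / mean

# The slide's illustration: the same SD of 10 against two different means.
print(coefficient_of_variation(10, 100))  # 0.1
print(coefficient_of_variation(10, 500))  # 0.02

# Golf example, using the Excel values for standard deviation and mean.
cv_current = coefficient_of_variation(5.792104, 150.5467)
cv_innovation = coefficient_of_variation(3.091808, 150.1467)
print(round(cv_current, 4), round(cv_innovation, 4))
```

The golf clubs give roughly 0.0385 (current) and 0.0206 (innovation), so the innovation club is also less variable relative to its mean.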
Percentiles
• Example from http://www.ehow.com/how_2310404_calculate-percentiles.html
• Your test score, e.g. 70%, tells you how many
questions you answered correctly. However, it
doesn’t tell how well you did compared to the other
people who took the same test.
• If your score is at the 75th percentile, then you scored higher than 75% of the other people who took the test.
Sample Percentiles and Box Plots
• Percentile
– The pth percentile of a set of measurements is the
value for which
• p percent of the observations are less than that value
• (100 - p) percent of all the observations are greater than that value.
Sample Percentiles
• Find the 10th percentile of 6 8 3 6 2 8 1.
• Order the data: 1 2 3 6 6 8 8.
• 7 × 0.10 = 0.70; round up to 1.
The first observation, 1, is the 10th percentile.
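The rule used above (multiply n by the proportion, then round up to get the rank) is usually called the nearest-rank method. Below is a Python sketch, assuming p is an integer percentage between 1 and 100; the function name is hypothetical. Other textbooks and software (for example, interpolation-based definitions) may return different values for the same data.

```python
def percentile_nearest_rank(values, p):
    """p-th percentile by the slide's rule: rank = n * p / 100, rounded up.

    p must be an integer in 1..100; the ceiling is computed with integer
    arithmetic so that products such as 7 * 10 / 100 avoid float rounding.
    """
    if not isinstance(p, int) or not 0 < p <= 100:
        raise ValueError("p must be an integer in 1..100")
    ordered = sorted(values)
    n = len(ordered)
    rank = (n * p + 99) // 100  # ceil(n * p / 100) for positive integers
    return ordered[rank - 1]    # ranks are 1-based, list indices 0-based

data = [6, 8, 3, 6, 2, 8, 1]
print(percentile_nearest_rank(data, 10))  # 1
```

For the same data, the function also gives Q1 = 2, Q2 = 6, and the 100th percentile = 8.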
• Commonly used percentiles
– First (lower) quartile, Q1 = 25th percentile
– Second (middle) quartile, Q2 = 50th percentile
– Third quartile, Q3 = 75th percentile
– Fourth quartile, Q4 = 100th percentile
– First (lower) decile = 10th percentile
– Ninth (upper) decile = 90th percentile
Quartiles and Variability
• Quartiles can provide an idea about the shape of a histogram.
[Figure: a positively skewed histogram and a negatively skewed histogram, each with Q1, Q2, and Q3 marked]
Interquartile Range
• A large value indicates a large spread of the observations.
Interquartile range = Q3 – Q1
Paired Data Sets and the Sample Correlation Coefficient
• The covariance and the coefficient of
correlation are used to measure the direction
and strength of the linear relationship
between two variables.
– Covariance - is there any pattern to the way two
variables move together?
– Coefficient of correlation - how strong is the linear relationship between two variables?
Covariance
Population covariance: $COV(X, Y) = \dfrac{\sum_{i=1}^{N} (x_i - \mu_x)(y_i - \mu_y)}{N}$
$\mu_x$ ($\mu_y$) is the population mean of the variable X (Y). N is the population size.
Sample covariance: $cov(x, y) = \dfrac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$
$\bar{x}$ ($\bar{y}$) is the sample mean of the variable X (Y). n is the sample size.
Covariance
• If the two variables move in the same
direction, (both increase or both decrease),
the covariance is a large positive number.
• If the two variables move in opposite
directions, (one increases when the other
one decreases), the covariance is a large
negative number.
• If the two variables are unrelated, the
covariance will be close to zero.
Covariance
• Compare the following three sets:

Set 1 (x̄ = 5, ȳ = 20)
xi   yi   (x - x̄)   (y - ȳ)   (x - x̄)(y - ȳ)
2    13   -3        -7         21
6    20    1         0          0
7    27    2         7         14
Cov(x, y) = 17.5

Set 2 (x̄ = 5, ȳ = 20)
xi   yi   (x - x̄)   (y - ȳ)   (x - x̄)(y - ȳ)
2    27   -3         7        -21
6    20    1         0          0
7    13    2        -7        -14
Cov(x, y) = -17.5

Set 3 (x̄ = 5, ȳ = 20)
xi   yi
2    20
6    27
7    13
Cov(x, y) = -3.5
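The three covariances can be reproduced with a short Python sketch of the sample covariance (divisor n - 1). The function name is illustrative.

```python
def sample_covariance(xs, ys):
    """cov(x, y) = sum((x_i - x_bar)(y_i - y_bar)) / (n - 1)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with n >= 2")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)

x = [2, 6, 7]
print(sample_covariance(x, [13, 20, 27]))  # 17.5
print(sample_covariance(x, [27, 20, 13]))  # -17.5
print(sample_covariance(x, [20, 27, 13]))  # -3.5
```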
The coefficient of correlation
Population coefficient of correlation: $\rho = \dfrac{COV(X, Y)}{\sigma_x \sigma_y}$
Sample coefficient of correlation: $r = \dfrac{cov(X, Y)}{s_x s_y}$
– This coefficient answers the question: How
strong is the association between X and Y?
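A Python sketch of the sample correlation coefficient r = cov(X, Y) / (s_x s_y), applied to the three sets from the covariance slide. The helper is illustrative and self-contained; it raises an error when either standard deviation is zero, because r is undefined in that case.

```python
import math

def sample_correlation(xs, ys):
    """r = cov(x, y) / (s_x * s_y), with n - 1 in every divisor."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with n >= 2")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)
    s_x = math.sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    if s_x == 0 or s_y == 0:
        raise ValueError("r is undefined when a variable has zero spread")
    return cov / (s_x * s_y)

x = [2, 6, 7]
for y in ([13, 20, 27], [27, 20, 13], [20, 27, 13]):
    print(round(sample_correlation(x, y), 4))
```

The first two sets give r of about +0.94 and -0.94; the third, about -0.19, a weak negative linear relationship.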
The coefficient of correlation
• $\rho$ or r = +1: strong positive linear relationship (COV(X,Y) > 0)
• $\rho$ or r = 0: no linear relationship (COV(X,Y) = 0)
• $\rho$ or r = -1: strong negative linear relationship (COV(X,Y) < 0)
The Coefficient of Correlation
• If the two variables are very strongly positively
related, the coefficient value is close to +1
(strong positive linear relationship).
• If the two variables are very strongly negatively
related, the coefficient value is close to -1
(strong negative linear relationship).
• No straight-line relationship is indicated by a coefficient close to zero.
Correlation and causation
• Recognize the difference between correlation and causation: just because two things occur together does not necessarily mean that one causes the other.
• For random processes, causation means that if A
occurs, that causes a change in the probability that B
occurs.
Correlation and causation
• Existence of a statistical relationship, no matter how strong it is, does not imply a cause-and-effect relationship between X and Y. For example, let X be the size of vocabulary and Y be the writing speed for a group of children. There will most probably be a positive relationship, but this does not imply that an increase in vocabulary causes an increase in the speed of writing. Other variables, such as age and education, will affect both X and Y.
• Even if there is a causal relationship between X and Y, it might be in the opposite direction, i.e., from Y to X. For example, let X be a thermometer reading and let Y be the actual temperature. Here Y affects X.
Example
Dr. Leonard Eron, professor at the University of Illinois at Chicago, has conducted a longitudinal study of the long-term effects of violent television programming. In 1960, he asked 870 third-grade children their favorite television shows. He found that children judged most violent by their peers also watched the most violent television. Dr. Eron noted, however, that it was not clear which came first: the child’s behavior or the influence of television.
In follow-up interviews at ten-year intervals, Eron found that youngsters who at age eight were nonaggressive but were watching violent television were more aggressive than children who at age eight were aggressive and watched non-violent television. Eron claims that this establishes a cause-and-effect relationship between watching violent television and aggressive behavior.
Can you think of any other possible causes?
Example - solution
• It could be that the difference in aggressive
behavior is due to other familial influences.
Perhaps children who are permitted to watch
violent programming are more likely to come
from violent or abusive families, which could
also lead to more aggressive behavior.