Probability Essentials Chapter 3

Random Sampling and Sampling Distributions
Chapter 6
“He stuck in his thumb,
Pulled out a plum
and said
‘what a good boy am I!’”
old nursery rhyme
MGMT 242
Topics and Goals for Chapter 6
• Random Sampling
• Sample Statistics and Relation to Population Parameters
• Sampling Distribution for the Sample Mean: “The Central Limit Theorem”
• Checking Normality: The Normal Probability Plot
– samples from normal distributions
– positively skewed distributions
– negatively skewed distributions
– distributions with outliers
Populations and Samples
• A population is a large collection (theoretically,
for the mathematician, infinite) of the individuals
or items of interest (e.g., the consuming public,
or the items coming off a machine production line).
• To measure characteristics of the population, we
usually have to take a sample (a smaller number of items).
• If we take a random sample, it is equally likely
that any member of the population will be
included in the sample.
Random Sampling
• A sample represents the population only if each member of
the population is equally likely to be included in the sample.
• Types of random sampling (see also Chapter 16):
– Simple Random Sampling (SRS): sample from the whole population at once.
– Stratified Random Sampling: divide the population into groups and
sample from each group; for example, in polls, divide the country into
four geographical regions and sample from each.
– Cluster Sampling: divide the population into groups and randomly choose
a few of the groups to sample; e.g., to study hospital performance, sample
patients in a few hospitals randomly chosen from all hospitals in the state.
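The three schemes above can be sketched in Python as follows. This is a minimal illustration, not a survey-sampling library: the population of 1,000 IDs, the four region labels, the 50 “hospitals” of 20 patients, and all function names are hypothetical. The cluster version is one-stage (it keeps every member of the chosen groups); in the hospital example one might instead subsample patients within each chosen hospital.

```python
import random

# Hypothetical population: 1,000 customer IDs, each tagged with one of four regions.
REGIONS = ["North", "South", "East", "West"]
population = [(cid, REGIONS[cid % 4]) for cid in range(1000)]

def simple_random_sample(pop, n, rng):
    # Every member is equally likely to be chosen; nobody is chosen twice.
    return rng.sample(pop, n)

def stratified_sample(pop, n_per_group, key, rng):
    # Split the population into groups (strata), then sample from EACH group.
    groups = {}
    for item in pop:
        groups.setdefault(key(item), []).append(item)
    sample = []
    for members in groups.values():
        sample.extend(rng.sample(members, n_per_group))
    return sample

def cluster_sample(clusters, n_clusters, rng):
    # Randomly choose a few whole groups (clusters) and keep all of their members.
    chosen = rng.sample(clusters, n_clusters)
    return [item for cluster in chosen for item in cluster]

rng = random.Random(242)   # fixed seed so runs are reproducible
srs = simple_random_sample(population, 40, rng)
strat = stratified_sample(population, 10, key=lambda item: item[1], rng=rng)

# Hypothetical clusters: 50 "hospitals" of 20 consecutive IDs each.
hospitals = [population[i:i + 20] for i in range(0, 1000, 20)]
clus = cluster_sample(hospitals, 2, rng)

print(len(srs), len(strat), len(clus))   # 40 40 40
```

All three samples have 40 members here, but they are built very differently: the stratified sample is guaranteed 10 members from every region, while the cluster sample comes from only two “hospitals”.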
Sample Statistics
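• Sample Mean: $\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i$, where $\bar{x}$ (“x-bar”) is x with
a bar over it; the sum is taken over all values $x_i$ of the
random variable X measured in the sample of N units.
$\bar{x}$ is an estimator of the population mean, $\mu$.
• Sample Standard Deviation:
$s = \left[\frac{1}{N-1}\sum_{i=1}^{N}(x_i - \bar{x})^2\right]^{1/2}$
$s^2$ is an “unbiased estimate” of the population variance, $\sigma^2$;
s itself is the usual estimate of the population standard deviation, $\sigma$
(it is very slightly biased).
Note that for large samples (large N), $N - 1 \approx N$.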
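The two formulas translate directly into code. A small sketch with made-up data, checked against Python's `statistics` module (`statistics.stdev` divides by N - 1, `statistics.pstdev` divides by N); the helper names `sample_mean` and `sample_std` are illustrative only.

```python
import math
import statistics

def sample_mean(xs):
    # xbar = (1/N) * sum of the x_i
    return sum(xs) / len(xs)

def sample_std(xs):
    # s = [ (1/(N-1)) * sum of (x_i - xbar)^2 ] ** (1/2)
    n = len(xs)
    xbar = sample_mean(xs)
    return math.sqrt(sum((x - xbar) ** 2 for x in xs) / (n - 1))

data = [2, 4, 4, 4, 5, 5, 7, 9]   # made-up values for illustration

print(sample_mean(data))              # 5.0
print(round(sample_std(data), 4))     # 2.1381
print(math.isclose(sample_std(data), statistics.stdev(data)))  # True
print(statistics.pstdev(data))        # 2.0 (divides by N instead of N - 1)
```

With only N = 8 values, dividing by N - 1 instead of N makes a visible difference (2.14 vs. 2.0); for large N the two agree closely.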
Sampling Distribution for Sample Means:
The Central Limit Theorem-1
• In general (for almost any population with a finite standard
deviation), no matter what distribution the population follows, the
distribution of the sample means approaches a normal
distribution as the sample size N grows, with
• mean $\mu_{\bar{x}}$ (the mean of the population of sample
means) equal to $\mu$, the mean of the parent
population, and
• standard deviation of the sample means
$\sigma_{\bar{x}} = \sigma/\sqrt{N}$. This means that the larger
the sample size, the more accurately we estimate
the mean.
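A short worked example of the last point: because of the square root, quadrupling the sample size only halves the standard deviation of the mean. The value σ ≈ 28.87 is borrowed from the uniform (0 to 100) population on the following slides, purely for illustration.

```latex
\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{N}}, \qquad
\frac{\sigma}{\sqrt{4N}} = \frac{1}{2}\,\frac{\sigma}{\sqrt{N}}

\sigma = 28.87:\quad
N = 9 \Rightarrow \sigma_{\bar{x}} = \frac{28.87}{3} \approx 9.62, \qquad
N = 36 \Rightarrow \sigma_{\bar{x}} = \frac{28.87}{6} \approx 4.81
```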
Sampling Distribution for Sample Means:
The Central Limit Theorem-2
• The histogram on this slide is for a sample
from a uniform distribution (0 to 100). The sample
mean is 50.2 and the sample standard deviation is 29.3,
close to the population values $\mu = 50$ and
$\sigma = 100/\sqrt{12} \approx 28.9$.
[Figure: histogram of the uniform sample; x-axis: value (bins from about 3.3 to 96.6), y-axis: Frequency]
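Where the population values μ = 50 and σ = 100/√12 come from: a short derivation for a uniform distribution on [a, b], with a = 0 and b = 100 on this slide.

```latex
E[X] = \int_a^b \frac{x}{b-a}\,dx = \frac{a+b}{2}, \qquad
E[X^2] = \int_a^b \frac{x^2}{b-a}\,dx = \frac{a^2+ab+b^2}{3}

\operatorname{Var}(X) = E[X^2] - (E[X])^2
  = \frac{a^2+ab+b^2}{3} - \frac{(a+b)^2}{4}
  = \frac{(b-a)^2}{12}

a = 0,\ b = 100:\quad \mu = 50, \qquad \sigma = \frac{100}{\sqrt{12}} \approx 28.87
```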
Sampling Distribution for Sample Means:
The Central Limit Theorem-3
[Figure: histogram of the 150 sample means; x-axis: sample mean (bins from about 3.3 to 96.6), y-axis: Frequency]
• The accompanying histogram is for the means of
150 samples, each of size 9 (N = 9). The average of
these 150 means is 49.4, and the standard
deviation of these 150 sample means is 9.8,
which is close to
$\sigma/\sqrt{N} = (100/\sqrt{12})/\sqrt{9} \approx 9.6$, the
theoretical standard deviation of the mean.
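The experiment on this slide can be repeated in a few lines of Python. A sketch, assuming an arbitrary random seed, so the results will only approximately match the slide's 49.4 and 9.8:

```python
import random
import statistics

rng = random.Random(6)   # arbitrary fixed seed, for reproducibility only
N = 9                    # size of each sample (from the slide)
NUM_SAMPLES = 150        # number of samples (from the slide)

# Draw 150 samples of size 9 from a uniform(0, 100) population; keep each sample mean.
means = [statistics.mean(rng.uniform(0, 100) for _ in range(N))
         for _ in range(NUM_SAMPLES)]

sigma = 100 / 12 ** 0.5             # population SD of uniform(0, 100), about 28.87
theory_sd = sigma / N ** 0.5        # sigma / sqrt(N), about 9.62

print(round(statistics.mean(means), 1))   # should be near 50
print(round(statistics.stdev(means), 1))  # should be near 9.6
print(round(theory_sd, 2))                # 9.62
```

Changing the seed changes the first two numbers slightly from run to run, but they stay close to 50 and 9.6; the third is fixed by the formula.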
Normal Probability Plots (“P-plots”)
• The procedure to get this plot, which tests whether data
follow a normal distribution, is the following:
– 1) order the N data values;
– 2) assign a rank m from 1 (the lowest value) to N (the highest value);
– 3) find the centile score of the mth data point from the
relation centile score = m/(N+1); e.g., the 1st data point out of
100 has approximately a fraction 1/101 of values below it, and the 100th data
point has a fraction 100/101 below it;
– 4) find the z-value (standard normal variate) corresponding to
the centile score (this is the z-score or N-score);
– 5) plot the observed values versus the z-scores.
• If the points fall approximately on a straight line, the
data are consistent with a normal distribution.
Normal Probability Plots (“P-plots”)
Examples
Exam 2 scores were
negatively skewed (range:
49-100, Q1 = 90,
median = 92, Q3 = 94).
rank   centile score   ordered value (Exam 2)   z-score
1      0.02            49                       -2.10
2      0.04            72                       -1.80
3      0.05            78                       -1.61
4      0.07            79                       -1.47
5      0.09            81                       -1.35
6      0.11            85                       -1.24
7      0.13            86                       -1.15
8      0.14            87                       -1.07
9      0.16            89                       -0.99
etc.
[Figure: conventional P-plot of Exam 2 scores; y-axis: Exam 2 score (0 to 120), x-axis: z-score (-3.00 to 3.00)]
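The z-scores in the table can be reproduced from steps 3 and 4. The slide does not give the number of scores; the centile column (0.02, 0.04, 0.05, ...) and the z-scores are consistent with N = 55 (centile = m/56), which the sketch below assumes. Centiles are printed to three decimals.

```python
from statistics import NormalDist

N_SCORES = 55   # ASSUMPTION: inferred from the centile column; not stated on the slide

# (rank m, Exam 2 score, z-score from the table) for the nine lowest scores
table = [(1, 49, -2.10), (2, 72, -1.80), (3, 78, -1.61), (4, 79, -1.47),
         (5, 81, -1.35), (6, 85, -1.24), (7, 86, -1.15), (8, 87, -1.07),
         (9, 89, -0.99)]

std_normal = NormalDist()
for rank, score, z_table in table:
    centile = rank / (N_SCORES + 1)      # step 3: centile score m/(N+1)
    z = std_normal.inv_cdf(centile)      # step 4: z-score
    print(f"rank {rank}: score {score}, centile {centile:.3f}, "
          f"z {z:.2f} (table {z_table:.2f})")
```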
Normal Probability Plots (“P-plots”)
Examples (cont.)
[Figure: Normal Probability Plot (StatPlus output); y-axis: Nscore, x-axis: Exam 2 score]
This P-plot for Exam 2 scores is
from the “StatPlus” add-in;
note that the axes are interchanged from the previous
(conventional) order:
Nscore is the y-axis, and the actual
score is the x-axis.
Qualitative Appearance of P-plots
Distribution Type                      | Conventional Appearance                                | StatPlus Appearance
Normal distribution                    | Approximately straight line                            | Approximately straight line
Positively skewed (like Ex. 5.7a, lab) | Bends up at high end                                   | Bends down at high end
Negatively skewed (like Exam 2 scores) | Bends down at high end                                 | Bends up at high end
Outliers (like rectangular dist.)      | Bends up at low end, bends down at high end (S-shaped) | Bends down at low end, bends up at high end
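The conventional-orientation descriptions in the table can be checked numerically by comparing the slope of the plot near each end with its slope in the middle. A sketch under stated assumptions: it uses idealized samples whose mth ordered value is exactly the population quantile at m/(N+1) (so the output is deterministic); it stands in an exponential distribution for positive skew, its mirror image for negative skew, and a uniform distribution for the rectangular case; and the ±10% tolerance for “straight” is a hypothetical choice. It covers only the conventional orientation (value vs. z-score); with StatPlus's swapped axes, up and down are reversed.

```python
import math
from statistics import NormalDist

std_normal = NormalDist()
N = 99
ps = [m / (N + 1) for m in range(1, N + 1)]   # centile scores m/(N+1)
zs = [std_normal.inv_cdf(p) for p in ps]      # z-scores (N-scores), the x-axis

# ASSUMPTION: idealized samples, m-th ordered value = population quantile at m/(N+1).
samples = {
    "normal": [std_normal.inv_cdf(p) for p in ps],
    "positively skewed (exponential)": [-math.log(1 - p) for p in ps],
    "negatively skewed (mirrored exponential)": [math.log(p) for p in ps],
    "rectangular (uniform)": list(ps),
}

def slope(ys, i, j):
    # Rise over run between points i and j of the conventional plot (value vs. z).
    return (ys[j] - ys[i]) / (zs[j] - zs[i])

def describe(ys, tol=0.1):
    # Compare the slope at each end of the plot with the slope at the middle.
    mid = N // 2                          # index of the median point
    middle = slope(ys, mid - 1, mid + 1)
    low = slope(ys, 0, 1) / middle        # < 1: flatter at the low end, so it curls up
    high = slope(ys, -2, -1) / middle     # > 1: steeper at the high end, so it curls up
    low_end = ("straight" if abs(low - 1) <= tol
               else "bends up" if low < 1 else "bends down")
    high_end = ("straight" if abs(high - 1) <= tol
                else "bends up" if high > 1 else "bends down")
    return low_end, high_end

for name, ys in samples.items():
    low_end, high_end = describe(ys)
    print(f"{name}: {low_end} at low end, {high_end} at high end")
```

Real samples are noisier than these idealized quantiles, so in practice the end points of a P-plot wobble and the shape should be judged from several points, not just the last two.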