Continuous Probability Distributions


Sampling Distributions
Martina Litschmannová
K210
Population vs. Sample
• A population includes every element of the set of observations that can be made.
• A sample consists only of observations drawn from the population.
[Figure: sampling draws a sample from the population; exploratory data analysis describes the sample, and inferential statistics draws conclusions about the population.]
Characteristics of a Population vs. Characteristics of a Sample
• A measurable characteristic of a population, such as a mean or standard deviation, is called a parameter, whereas a measurable characteristic of a sample is called a statistic.
Population                             | Sample
---------------------------------------+--------------------------------
Expectation (mean): E(X), or μ         | Sample mean (average): X̄
Median: x_0.5                          | Sample median: X_0.5
Variance (dispersion): D(X), or σ²     | Sample variance: S²
Std. deviation: σ                      | Sample std. deviation: S
Probability: π                         | Relative frequency: p
Sampling Distributions
• A sampling distribution is created, as the name suggests, by sampling.
• To derive a sampling distribution, we rely on the rules of probability and the laws of expected value and variance.
For example, consider the roll of one die and of two dice.
The roll of one die
• A fair die is thrown infinitely many times, with the random variable X = number of spots on any throw.
• The probability distribution of X is:

x_i    | 1   | 2   | 3   | 4   | 5   | 6
P(x_i) | 1/6 | 1/6 | 1/6 | 1/6 | 1/6 | 1/6

• The mean, variance and standard deviation are calculated as:
$E(X) = \sum_{i=1}^{6} x_i P(x_i) = 3.5$, $E(X^2) = \sum_{i=1}^{6} x_i^2 P(x_i) = 15.17$,
$D(X) = E(X^2) - [E(X)]^2 = 2.92$, $\sigma(X) = \sqrt{D(X)} = 1.71$
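The same arithmetic can be checked with exact fractions. A minimal Python sketch (the variable names are illustrative, not taken from the slides):

```python
import math
from fractions import Fraction

# Fair die: faces x_i = 1, ..., 6, each with probability P(x_i) = 1/6.
faces = range(1, 7)
p = Fraction(1, 6)

mean = sum(x * p for x in faces)         # E(X)
mean_sq = sum(x * x * p for x in faces)  # E(X^2)
variance = mean_sq - mean**2             # D(X) = E(X^2) - [E(X)]^2
std_dev = math.sqrt(variance)            # sigma(X) = sqrt(D(X))

print(mean, mean_sq, variance, round(std_dev, 2))
```

This prints `7/2 91/6 35/12 1.71`, i.e. 3.5, 15.17, 2.92 and 1.71 as above.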
The roll of two dice
The Sampling Distribution of the Mean
• A sampling distribution is created by looking at all samples of size n = 2 (i.e. two dice) and their means.
Sample | Mean | Sample | Mean | Sample | Mean
-------+------+--------+------+--------+-----
{1, 1} | 1.0  | {3, 1} | 2.0  | {5, 1} | 3.0
{1, 2} | 1.5  | {3, 2} | 2.5  | {5, 2} | 3.5
{1, 3} | 2.0  | {3, 3} | 3.0  | {5, 3} | 4.0
{1, 4} | 2.5  | {3, 4} | 3.5  | {5, 4} | 4.5
{1, 5} | 3.0  | {3, 5} | 4.0  | {5, 5} | 5.0
{1, 6} | 3.5  | {3, 6} | 4.5  | {5, 6} | 5.5
{2, 1} | 1.5  | {4, 1} | 2.5  | {6, 1} | 3.5
{2, 2} | 2.0  | {4, 2} | 3.0  | {6, 2} | 4.0
{2, 3} | 2.5  | {4, 3} | 3.5  | {6, 3} | 4.5
{2, 4} | 3.0  | {4, 4} | 4.0  | {6, 4} | 5.0
{2, 5} | 3.5  | {4, 5} | 4.5  | {6, 5} | 5.5
{2, 6} | 4.0  | {4, 6} | 5.0  | {6, 6} | 6.0
x̄    | 1.0  | 1.5  | 2.0  | 2.5  | 3.0  | 3.5  | 4.0  | 4.5  | 5.0  | 5.5  | 6.0
P(x̄) | 1/36 | 2/36 | 3/36 | 4/36 | 5/36 | 6/36 | 5/36 | 4/36 | 3/36 | 2/36 | 1/36

[Figure: bar chart of the probability P(x̄) against the mean x̄ = 1.0, 1.5, ..., 6.0.]

• While there are 36 possible samples of size 2, there are only 11 values for X̄, and some (e.g. x̄ = 3.5) occur more frequently than others (e.g. x̄ = 1).
$E(\bar{X}) = \sum_{i=1}^{11} \bar{x}_i P(\bar{x}_i) = 3.5$, $E(\bar{X}^2) = \sum_{i=1}^{11} \bar{x}_i^2 P(\bar{x}_i) = 13.71$,
$D(\bar{X}) = E(\bar{X}^2) - [E(\bar{X})]^2 = 1.46$, $\sigma(\bar{X}) = \sqrt{D(\bar{X})} = 1.21$
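The table of 36 samples and the moments of X̄ can be reproduced by brute-force enumeration. A minimal Python sketch, using exact fractions so no rounding creeps in (names are illustrative only):

```python
import math
from collections import Counter
from fractions import Fraction
from itertools import product

# All 36 equally likely samples of size n = 2 (first die, second die).
samples = list(product(range(1, 7), repeat=2))
p_sample = Fraction(1, len(samples))

# Sampling distribution of the mean: P(x̄) for each distinct value of x̄.
dist = Counter()
for a, b in samples:
    dist[Fraction(a + b, 2)] += p_sample

mean = sum(x * p for x, p in dist.items())         # E(X̄)
mean_sq = sum(x * x * p for x, p in dist.items())  # E(X̄^2)
variance = mean_sq - mean**2                       # D(X̄)

for x in sorted(dist):
    print(f"x̄ = {float(x):.1f}: P = {dist[x]}")  # fractions are printed in lowest terms
print(mean, variance, round(math.sqrt(variance), 2))
```

The last line prints `7/2 35/24 1.21`; note that D(X̄) = 35/24 ≈ 1.46 is exactly half of D(X) = 35/12 for a single die.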
• $X_i$ = number of spots on the $i$-th die, $E(X_1) = E(X_2) = \mu$, $D(X_1) = D(X_2) = \sigma^2$
• $E(\bar{X}) = E\left(\frac{1}{2}\sum_{i=1}^{2} X_i\right) = \mu = 3.5$
• $D(\bar{X}) = D\left(\frac{1}{2}\sum_{i=1}^{2} X_i\right) = \frac{1}{4}\left(D(X_1) + D(X_2)\right) = \frac{\sigma^2}{2} = 1.46$, because the two dice are independent
• $\sigma(\bar{X}) = \frac{\sigma}{\sqrt{2}} = 1.21$
Compare
• $X_i$ = number of spots on the $i$-th die, $i = 1, 2$, $E(X_i) = \mu$, $D(X_i) = \sigma^2$
[Figure: left, Distribution of X (P(x) = 1/6 for x = 1, ..., 6); right, Sampling Distribution of X̄ (probability of each mean from 1.0 to 6.0).]
Note that: $E(\bar{X}) = \mu$, $D(\bar{X}) = \frac{\sigma^2}{2}$.
Generalization: Central Limit Theorem
• The sampling distribution of the mean of a random sample drawn from any population with finite variance is approximately normal for a sufficiently large sample size.
$E(\bar{X}) = \mu$, $D(\bar{X}) = \frac{\sigma^2}{n}$, $\sigma(\bar{X}) = \frac{\sigma}{\sqrt{n}}$
• The larger the sample size, the more closely the sampling distribution of X̄ will resemble a normal distribution.
[Figure: density f(x) of the sampling distribution of X̄ for n = 1, 5, 10 and 30.]
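The claim that D(X̄) = σ²/n even for a non-normal population can be checked by simulation. A minimal sketch, assuming an exponential population with rate 1 (so μ = 1 and σ² = 1); this deliberately skewed population is an illustrative choice, not one used in the slides:

```python
import random
import statistics

# Illustrative population (an assumption, not from the slides): exponential with
# rate 1, strongly right-skewed, with mu = 1 and sigma^2 = 1.
MU, SIGMA2 = 1.0, 1.0
rng = random.Random(42)  # fixed seed so the run is reproducible


def sample_means(n, reps=10_000):
    """Simulate `reps` sample means, each from a random sample of size n."""
    return [statistics.fmean(rng.expovariate(1.0) for _ in range(n)) for _ in range(reps)]


for n in (1, 5, 10, 30):
    means = sample_means(n)
    # Simulated E(X̄) and D(X̄) next to the theoretical sigma^2 / n.
    print(n, round(statistics.fmean(means), 3), round(statistics.variance(means), 4), SIGMA2 / n)
```

The simulated variance column should track σ²/n closely; plotting a histogram of `means` for each n would show the shape becoming more symmetric and bell-like as n grows.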
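Central Limit Theorem
• $X_1, \ldots, X_n$ are independent random variables with the same distribution, $E(X_i) = \mu$, $D(X_i) = \sigma^2$, $n \to \infty$.
[Figure: all $X_i$ share the same distribution; the sampling distribution of X̄ is centered at μ with standard deviation σ/√n.]
• Note that: $E(\bar{X}) = \mu$, $D(\bar{X}) = \frac{\sigma^2}{n}$, $\sigma(\bar{X}) = \frac{\sigma}{\sqrt{n}}$, where $\frac{\sigma}{\sqrt{n}}$ is called the standard error.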
• The sampling distribution of X̄ for a sample drawn from any population (with finite variance) is approximately normal for a sufficiently large sample size:
$\bar{X} \to N\left(\mu, \frac{\sigma^2}{n}\right) \Rightarrow \frac{\bar{X} - \mu}{\sigma}\sqrt{n} \to N(0, 1)$
• In many practical situations, a sample size of 30 may be sufficiently large to allow us to use the normal distribution as an approximation for the sampling distribution of X̄.
• Note: If X is normal, X̄ is normal. We don't need the Central Limit Theorem in this case.
1. The foreman of a bottling plant has observed that the amount of soda in each "32-ounce" bottle is actually a normally distributed random variable, with a mean of 32.2 ounces and a standard deviation of 0.3 ounce.
A) If a customer buys one bottle, what is the probability that the bottle will contain more than 32 ounces?
B) If a customer buys a carton of four bottles, what is the
probability that the mean amount of the four bottles will be
greater than 32 ounces?
Graphically Speaking
What is the probability that one bottle will contain more than 32 ounces? $X \to N(32.2;\ 0.09)$
What is the probability that the mean of four bottles will exceed 32 oz? $\bar{X} \to N\left(32.2;\ \frac{0.09}{4}\right)$
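Both probabilities can be computed with Python's standard `statistics.NormalDist`. Note that `NormalDist(mu, sigma)` takes the standard deviation, whereas the slides write the variance as the second parameter of N. A minimal sketch:

```python
import math
from statistics import NormalDist

MU, SIGMA = 32.2, 0.3  # ounces, from problem 1

# A) One bottle: X -> N(32.2; 0.09), so P(X > 32) = 1 - F(32).
p_one = 1 - NormalDist(MU, SIGMA).cdf(32)

# B) Mean of four bottles: X̄ -> N(32.2; 0.09 / 4), standard deviation 0.3 / sqrt(4) = 0.15.
p_four = 1 - NormalDist(MU, SIGMA / math.sqrt(4)).cdf(32)

print(round(p_one, 4), round(p_four, 4))
```

The results are roughly 0.75 for one bottle and 0.91 for the mean of four: averaging narrows the distribution, so the mean is more likely to stay above 32 oz.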
2. The probability distribution of 6-month incomes of account executives has mean $20,000 and standard deviation $5,000.
A) A single executive's income is $20,000. Can it be said that this executive's income exceeds 50% of all account executive incomes?
Answer: No. No information is given about the shape of the distribution of X, so we do not know the median of 6-month incomes.
B) n=64 account executives are randomly selected. What is
the probability that the sample mean exceeds $20,500?
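Since the shape of the income distribution is unknown (see part A), part B leans on the Central Limit Theorem with n = 64. A minimal sketch of the standardization:

```python
import math
from statistics import NormalDist

MU, SIGMA, N = 20_000, 5_000, 64

# By the CLT, X̄ is approximately N(mu, sigma^2 / n).
se = SIGMA / math.sqrt(N)       # standard error sigma / sqrt(n) = 625
z = (20_500 - MU) / se          # standardized value, 0.8
p = 1 - NormalDist().cdf(z)     # P(X̄ > 20 500)
print(se, z, round(p, 4))
```

This gives a standard error of 625, z = 0.8 and a probability of about 0.21.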
3. A sample of size n = 16 is drawn from a normally distributed population with $E(X_i) = 20$ and $\sigma(X_i) = 8$. Find P(16 <
4. Battery life $X \to N(20, 10)$. Guarantee: the average battery life in a case of 24 exceeds 16 hours. Find the probability that a randomly selected case meets the guarantee.
5. Cans of salmon are supposed to have a net weight of 6 oz. The canner says that the net weight is a random variable with mean μ = 6.05 oz and standard deviation σ = 0.18 oz. Suppose you take a random sample of 36 cans and calculate the sample mean weight to be 5.97 oz. Find the probability that the mean weight of the sample is less than or equal to 5.97 oz.
Since $P(\bar{X} \le 5.97) = 0.0038$, either you observed a "rare" event (recall: 5.97 oz is 2.67 standard errors below the mean) and the mean fill E(X) is in fact 6.05 oz (the value claimed by the canner), or the true mean fill is less than 6.05 oz (the canner is lying).
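The value 0.0038 and the "2.67 standard errors" figure can be reproduced as follows; a minimal sketch using the normal approximation for X̄ with n = 36:

```python
import math
from statistics import NormalDist

MU, SIGMA, N, XBAR = 6.05, 0.18, 36, 5.97

se = SIGMA / math.sqrt(N)   # standard error of the mean: 0.18 / 6 = 0.03 oz
z = (XBAR - MU) / se        # about -2.67 standard errors
p = NormalDist().cdf(z)     # P(X̄ <= 5.97) if the canner's claim is true
print(round(se, 4), round(z, 2), round(p, 4))
```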
Sampling Distribution of a Proportion
• The estimator of a population proportion π of successes is the sample proportion p. That is, we count the number of successes in a sample and compute:
$\hat{\pi} = p = \frac{X}{n}$
• X is the number of successes, n is the sample size.
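Normal Approximation to Binomial
• Binomial distribution with n = 20 and π = 0.5 with a normal approximation superimposed (for the number of successes X: $\mu = n\pi = 10$, $\sigma^2 = n\pi(1-\pi) = 5$; equivalently, for the sample proportion p: mean $\pi = 0.5$, variance $\frac{\pi(1-\pi)}{n} = 0.0125$).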
• Normal approximation to the binomial works best when the number of experiments n (sample size) is large and the probability of success π is close to 0.5.
• For the approximation to provide good results, the following condition should be met:
$n > \frac{9}{p(1-p)}$
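The rule of thumb can be checked directly, and the exact binomial probability compared with its normal approximation. A minimal sketch for the n = 20, π = 0.5 example above (no continuity correction is applied, since the slides do not use one); here the rule asks for n > 9/(0.5 · 0.5) = 36, so it is not satisfied, and the two probabilities differ noticeably:

```python
import math
from statistics import NormalDist


def rule_ok(n, p):
    """Rule of thumb from the slides: n > 9 / (p (1 - p))."""
    return n > 9 / (p * (1 - p))


def binom_cdf(k, n, p):
    """Exact P(X <= k) for a binomial X with parameters n and p."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


n, p = 20, 0.5
exact = binom_cdf(8, n, p)
# Normal approximation N(n p, n p (1 - p)) for the count X, without continuity correction.
approx = NormalDist(n * p, math.sqrt(n * p * (1 - p))).cdf(8)
print(rule_ok(n, p), 9 / (p * (1 - p)), round(exact, 4), round(approx, 4))
```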
Sampling Distribution of a Sample Proportion
• Using the laws of expected value and variance, we can determine the mean, variance, and standard deviation of p:
$E(p) = \pi$, $D(p) = \frac{\pi(1-\pi)}{n}$, $\sigma(p) = \sqrt{\frac{\pi(1-\pi)}{n}}$ (standard error of the proportion)
$p \to N\left(\pi, \frac{\pi(1-\pi)}{n}\right)$
• Sample proportions can be standardized to a standard normal distribution using this formulation:
$\frac{p - \pi}{\sqrt{\pi(1-\pi)}}\sqrt{n} \to N(0, 1)$.
6. Find the probability that of the next 120 births, no more than
40% will be boys. Assume equal probabilities for the births of
boys and girls.
7. 12% of students at NCSU are left-handed. What is the
probability that in a sample of 50 students, the sample
proportion that are left-handed is less than 11%?
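Problems 6 and 7 both use $p \to N\left(\pi, \frac{\pi(1-\pi)}{n}\right)$. A minimal sketch (the helper names are hypothetical); it also applies the rule of thumb from the previous slides with the hypothesized π, which is met for problem 6 but not for problem 7 (that would need n > 85.2), so the second answer is only a rough approximation:

```python
import math
from statistics import NormalDist


def prop_cdf(x, pi, n):
    """Approximate P(p <= x) using p -> N(pi, pi (1 - pi) / n)."""
    se = math.sqrt(pi * (1 - pi) / n)  # standard error of the proportion
    return NormalDist(pi, se).cdf(x)


def rule_ok(n, pi):
    """Rule of thumb from the slides: n > 9 / (pi (1 - pi))."""
    return n > 9 / (pi * (1 - pi))


# Problem 6: n = 120 births, pi = 0.5, P(p <= 0.40)
p6 = prop_cdf(0.40, 0.5, 120)
# Problem 7: n = 50 students, pi = 0.12, P(p < 0.11)
p7 = prop_cdf(0.11, 0.12, 50)

print(round(p6, 4), rule_ok(120, 0.5))
print(round(p7, 4), rule_ok(50, 0.12))
```

The results are roughly 0.014 for problem 6 and 0.41 for problem 7.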
Sampling Distribution: Difference of two means
Assumption:
Independent random samples are drawn from each of two normal populations.
• If this condition is met, the sampling distribution of the difference between the two sample means is normal.
• Note: If the two populations are not both normally distributed, but the sample sizes are "large" (> 30), the distribution of X̄₁ − X̄₂ is approximately normal (Central Limit Theorem).
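$\bar{X}_1 \to N\left(\mu_1, \frac{\sigma_1^2}{n_1}\right), \quad \bar{X}_2 \to N\left(\mu_2, \frac{\sigma_2^2}{n_2}\right) \Rightarrow E(\bar{X}_1 - \bar{X}_2) = \mu_1 - \mu_2, \quad D(\bar{X}_1 - \bar{X}_2) = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}$
$\bar{X}_1 - \bar{X}_2 \to N\left(\mu_1 - \mu_2, \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}\right)$
$\frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}} \to N(0, 1)$
where $\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$ is the standard error of the difference between two means.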
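The variance formula for X̄₁ − X̄₂ can be checked by simulation. A minimal sketch with hypothetical population parameters and sample sizes (they are illustrative only and do not appear in the slides):

```python
import math
import random
import statistics

# Hypothetical populations and sample sizes (illustrative only, not from the slides).
MU1, SIGMA1, N1 = 10.0, 2.0, 8
MU2, SIGMA2, N2 = 7.0, 3.0, 12
rng = random.Random(1)


def diff_of_means():
    """One draw of X̄1 - X̄2 from two independent normal samples."""
    x1 = statistics.fmean(rng.gauss(MU1, SIGMA1) for _ in range(N1))
    x2 = statistics.fmean(rng.gauss(MU2, SIGMA2) for _ in range(N2))
    return x1 - x2


diffs = [diff_of_means() for _ in range(20_000)]
theory_var = SIGMA1**2 / N1 + SIGMA2**2 / N2  # 4/8 + 9/12 = 1.25
print(round(statistics.fmean(diffs), 3), round(statistics.variance(diffs), 3), theory_var)
print("standard error:", round(math.sqrt(theory_var), 4))
```

The simulated mean should be close to μ₁ − μ₂ = 3 and the simulated variance close to 1.25.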
Sampling Distribution: Difference of two proportions
Assumption:
Central Limit Theorem: $n_1 > \frac{9}{p_1(1-p_1)}$, $n_2 > \frac{9}{p_2(1-p_2)}$
$p_1 \to N\left(\pi_1, \frac{\pi_1(1-\pi_1)}{n_1}\right), \quad p_2 \to N\left(\pi_2, \frac{\pi_2(1-\pi_2)}{n_2}\right) \Rightarrow E(p_1 - p_2) = \pi_1 - \pi_2, \quad D(p_1 - p_2) = \frac{\pi_1(1-\pi_1)}{n_1} + \frac{\pi_2(1-\pi_2)}{n_2}$
$p_1 - p_2 \to N\left(\pi_1 - \pi_2, \frac{\pi_1(1-\pi_1)}{n_1} + \frac{\pi_2(1-\pi_2)}{n_2}\right)$
$\frac{(p_1 - p_2) - (\pi_1 - \pi_2)}{\sqrt{\frac{\pi_1(1-\pi_1)}{n_1} + \frac{\pi_2(1-\pi_2)}{n_2}}} \to N(0, 1)$
where $\sqrt{\frac{\pi_1(1-\pi_1)}{n_1} + \frac{\pi_2(1-\pi_2)}{n_2}}$ is the standard error of the difference between two proportions.
Special Continuous Distributions
χ² Distribution
If $Z_1, \ldots, Z_\nu$ are independent and $Z_i \to N(0; 1)$ for all $i = 1, \ldots, \nu$, then $X = \sum_{i=1}^{\nu} Z_i^2 \to \chi^2_\nu$, where ν is the number of degrees of freedom.
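The definition can be checked by simulation. A minimal sketch, assuming ν = 5 as an arbitrary example; it relies on the standard facts that a χ²_ν variable has mean ν and variance 2ν (these moments are not stated on the slides):

```python
import random
import statistics

NU = 5  # degrees of freedom (an arbitrary illustrative choice)
rng = random.Random(7)


def chi2_draw(nu):
    """Sum of squares of nu independent N(0; 1) variables."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(nu))


draws = [chi2_draw(NU) for _ in range(50_000)]
# A chi^2_nu variable has mean nu and variance 2 nu.
print(round(statistics.fmean(draws), 2), round(statistics.variance(draws), 2))
```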
Use of the χ² Distribution
For a random sample of size n from a normal population with variance σ²:
$\frac{(n-1)S^2}{\sigma^2} \to \chi^2_{n-1}$
8. The Acme Battery Company has developed a new cell phone
battery. On average, the battery lasts 60 minutes on a single
charge. The standard deviation is 4 minutes.
Suppose the manufacturing department runs a quality
control test. They randomly select 7 batteries. What is the
probability that the standard deviation of the selected
batteries is greater than 6 minutes?
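Problem 8 reduces to $P(S > 6) = P\left(\chi^2_6 > \frac{6 \cdot 6^2}{4^2}\right) = P(\chi^2_6 > 13.5)$. Python's standard library has no χ² distribution, so this minimal sketch uses the closed form $P(\chi^2_{2k} > x) = e^{-x/2}\sum_{j=0}^{k-1} \frac{(x/2)^j}{j!}$, which holds only for an even number of degrees of freedom (here 6). With SciPy installed, `scipy.stats.chi2.sf(13.5, 6)` should give the same value.

```python
import math


def chi2_sf_even(x, nu):
    """P(chi^2_nu > x) for EVEN nu via exp(-x/2) * sum_{j<nu/2} (x/2)^j / j!."""
    if nu % 2:
        raise ValueError("this closed form needs an even number of degrees of freedom")
    h = x / 2
    return math.exp(-h) * sum(h**j / math.factorial(j) for j in range(nu // 2))


n, sigma, s = 7, 4.0, 6.0
stat = (n - 1) * s**2 / sigma**2   # (n - 1) S^2 / sigma^2 = 13.5
p = chi2_sf_even(stat, n - 1)      # P(S > 6) = P(chi^2_6 > 13.5)
print(stat, round(p, 4))
```

The probability comes out at roughly 0.036.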
Student's t Distribution
If $Z \to N(0, 1)$ and $V \to \chi^2_\nu$ are independent random variables and
$T = \frac{Z}{\sqrt{V/\nu}}$,
then T has Student's t distribution with ν degrees of freedom, $T \to t_\nu$.
Use of Student's t Distribution
$\frac{\bar{X} - \mu}{S}\sqrt{n} \to t_{n-1}$
The t distribution should not be used with small samples from populations that are not approximately normal.
9. Acme Corporation manufactures light bulbs. The CEO claims
that an average Acme light bulb lasts 300 days. A researcher
randomly selects 15 bulbs for testing. The sampled bulbs last
an average of 290 days, with a standard deviation of 50 days.
If the CEO's claim were true, what is the probability that 15
randomly selected bulbs would have an average life of no
more than 290 days?
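Problem 9 needs $P\left(t_{14} \le \frac{290 - 300}{50}\sqrt{15}\right)$. Python's standard library has no t distribution, so this minimal sketch integrates the t density numerically with Simpson's rule (an implementation choice for illustration, not a method from the slides). With SciPy installed, `scipy.stats.t.cdf(t_stat, 14)` should give the same value.

```python
import math


def t_pdf(t, nu):
    """Density of Student's t distribution with nu degrees of freedom."""
    c = math.exp(math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)) / math.sqrt(nu * math.pi)
    return c * (1 + t * t / nu) ** (-(nu + 1) / 2)


def t_cdf(t, nu, steps=2000):
    """P(T <= t): Simpson's rule on [0, |t|] plus symmetry about 0 (steps must be even)."""
    a, b = 0.0, abs(t)
    h = (b - a) / steps
    area = t_pdf(a, nu) + t_pdf(b, nu)
    area += sum((4 if k % 2 else 2) * t_pdf(a + k * h, nu) for k in range(1, steps))
    area *= h / 3
    return 0.5 + area if t >= 0 else 0.5 - area


n, mu, xbar, s = 15, 300, 290, 50
t_stat = (xbar - mu) / s * math.sqrt(n)  # (X̄ - mu) / S * sqrt(n), about -0.775
p = t_cdf(t_stat, n - 1)                 # P(X̄ <= 290) if the CEO's claim holds
print(round(t_stat, 4), round(p, 4))
```

The probability comes out at about 0.23, so an average of 290 days or less would not be unusual even if the claim were true.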
F Distribution
The f Statistic
• The f statistic, also known as an f value, is a random variable that has an F distribution.
Here are the steps required to compute an f statistic:
• Select a random sample of size n₁ from a normal population having a standard deviation equal to σ₁.
• Select an independent random sample of size n₂ from a normal population having a standard deviation equal to σ₂.
• The f statistic is the ratio of s₁²/σ₁² and s₂²/σ₂².
$f = \frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2} \to F_{n_1-1,\, n_2-1}$, with $n_1 - 1$ and $n_2 - 1$ degrees of freedom.
10. Suppose you randomly select 7 women from a population of women and 12 men from a population of men. The table below shows the standard deviation in each sample and in each population.

Population | Population standard deviation | Sample standard deviation
-----------+-------------------------------+--------------------------
Women      | 30                            | 35
Men        | 50                            | 45

Find the probability that the sample standard deviation of men is greater than twice the sample standard deviation of women.
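Taking men as sample 1, $f = \frac{S_m^2/\sigma_m^2}{S_w^2/\sigma_w^2} \to F_{11,6}$, and $S_m > 2S_w$ is equivalent to $f > 4\sigma_w^2/\sigma_m^2 = 1.44$. Python's standard library has no F distribution, so this minimal sketch uses the standard relation $\frac{d_1 F}{d_1 F + d_2} \to \text{Beta}(d_1/2, d_2/2)$ and integrates the Beta density with Simpson's rule (an illustrative implementation choice). With SciPy installed, `scipy.stats.f.sf(1.44, 11, 6)` should give the same value.

```python
import math


def beta_cdf(x, a, b, steps=2000):
    """Regularized incomplete beta I_x(a, b) by Simpson's rule.

    Assumes a > 1 and b > 1, so the Beta density is finite and 0 at u = 0 and u = 1.
    """
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)

    def pdf(u):
        if u <= 0.0 or u >= 1.0:
            return 0.0
        return math.exp(log_norm + (a - 1) * math.log(u) + (b - 1) * math.log1p(-u))

    h = x / steps
    area = pdf(0.0) + pdf(x)
    area += sum((4 if k % 2 else 2) * pdf(k * h) for k in range(1, steps))
    return area * h / 3


def f_cdf(x, d1, d2):
    """P(F <= x) for F with (d1, d2) degrees of freedom, via the Beta relation."""
    return beta_cdf(d1 * x / (d1 * x + d2), d1 / 2, d2 / 2)


n_w, sigma_w = 7, 30.0   # women
n_m, sigma_m = 12, 50.0  # men
# S_m > 2 S_w  <=>  S_m^2 / S_w^2 > 4  <=>  f > 4 sigma_w^2 / sigma_m^2
threshold = 4 * sigma_w**2 / sigma_m**2          # 1.44
p = 1 - f_cdf(threshold, n_m - 1, n_w - 1)       # P(F_{11,6} > 1.44)
print(threshold, round(p, 4))
```

The sample standard deviations in the table are not needed for this probability; the result is roughly 0.34.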
Study materials:
• http://homel.vsb.cz/~bri10/Teaching/Bris%20Prob%20&%20Stat.pdf (p. 93 - p. 104)
• http://stattrek.com/tutorials/statistics-tutorial.aspx?Tutorial=Stat (Distributions - Continuous (Student's, χ², F Distribution) + Estimation)