Lecture 8 - Statistics
Statistics 400 - Lecture 8
Completed so far (any material discussed in these sections is fair game):
2.1-2.5
4.1-4.5
5.1-5.8 (READ 5.7)
6.1-6.4; 6.6
7.1-7.2
Today: finish 7.3, 8.1-8.3
READ 7.4!!!
Assignment #3: 6.2, 6.6, 6.34, 6.78 (interpret the plot in terms of Normality), 7.20,
7.28, 8.14, 8.22, 8.36
Due: Tuesday, Oct 16
Central Limit Theorem
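In a random sample (iid sample) from any population with mean $\mu$
and standard deviation $\sigma$, when n is large, the distribution of the
sample mean $\bar{x}$ is approximately normal.
That is, $\bar{X}$ is approximately normal with mean $\mu$ and standard deviation $\sigma/\sqrt{n}$.
Thus,
$Z = \dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}}$ is approximately $N(0, 1)$.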
Implications
So, for random samples, if we have enough data, the sample mean is
approximately normally distributed, even if the data are not normally
distributed
If we have enough data, we can use the normal distribution to make
probability statements about $\bar{x}$
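A small simulation can make this implication concrete. The sketch below is only an illustration, not part of the lecture: the choice of an Exponential(1) population (mean 1, standard deviation 1, strongly skewed), the sample size of 50, the number of replicates, and the helper name `simulate_sample_means` are all assumptions made for the example.

```python
import random
import statistics


def simulate_sample_means(n, reps, rate=1.0, seed=0):
    """Draw `reps` samples of size n from an Exponential(rate) population
    and return the list of their sample means."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.expovariate(rate) for _ in range(n))
        for _ in range(reps)
    ]


# Assumed population: Exponential(1), which has mean 1 and standard deviation 1.
mu, sigma, n = 1.0, 1.0, 50
means = simulate_sample_means(n=n, reps=5000)

se = sigma / n ** 0.5  # theoretical standard deviation of the sample mean

# If the sample mean is roughly normal, about 95% of the simulated means
# should land within 1.96 standard errors of mu.
within = sum(abs(m - mu) <= 1.96 * se for m in means) / len(means)

print(f"average of sample means: {statistics.fmean(means):.3f}")
print(f"std dev of sample means: {statistics.stdev(means):.3f} (theory: {se:.3f})")
print(f"fraction within 1.96 standard errors of mu: {within:.3f}")
```

Even though the individual observations are far from normal, the fraction printed on the last line should come out close to 0.95.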
Example
A busy intersection has an average of 2.2 accidents per week with a
standard deviation of 1.4 accidents
Suppose you monitor this intersection over a given year, recording the
number of accidents per week.
The data take integer values (0, 1, 2, ...), so the distribution of the
number of accidents is not normal.
What is the distribution of the mean number of accidents per week
based on a sample of 52 weeks of data?
Example
What is the approximate probability that $\bar{x}$ is less than 2?
What is the approximate probability that there are fewer than 100
accidents in a given year?
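One way to work these two questions numerically is sketched below. It assumes the Central Limit Theorem approximation with n = 52 weeks and applies no continuity correction; the variable names are chosen for this illustration. The second question is converted to a statement about the sample mean, since fewer than 100 accidents in 52 weeks means $\bar{x} < 100/52$.

```python
from statistics import NormalDist

mu, sigma, n = 2.2, 1.4, 52      # weekly mean, weekly std dev, weeks observed
se = sigma / n ** 0.5            # standard deviation of the sample mean

# Approximate sampling distribution of the weekly mean (by the CLT).
xbar_dist = NormalDist(mu, se)

# P(sample mean < 2)
p_mean_below_2 = xbar_dist.cdf(2.0)

# P(total < 100) rewritten as P(sample mean < 100/52), no continuity correction.
p_total_below_100 = xbar_dist.cdf(100 / n)

print(f"standard error:   {se:.4f}")
print(f"P(xbar < 2):      {p_mean_below_2:.4f}")
print(f"P(total < 100):   {p_total_below_100:.4f}")
```

Under these assumptions the first probability is roughly 0.15 and the second roughly 0.08.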
Statistical Inference (Chapter 8)
Would like to make inferences about a population based on samples
The fatality rate for a disease is 50%. In a controlled study, 100
patients with the disease are given a new drug. Would you conclude
that the drug is successful if:
100% of the patients survived
75% of the patients survived
55% of the patients survived
52% of the patients survived
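One way to think about these four outcomes is to ask how likely each would be if the drug had no effect. The sketch below assumes that, with no drug effect, the number of survivors among 100 patients is Binomial(100, 0.5); that model and the helper name `prob_at_least` are assumptions for illustration, not something stated on the slide.

```python
from math import comb


def prob_at_least(k, n=100, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p), computed by direct summation."""
    return sum(comb(n, j) * p ** j * (1 - p) ** (n - j) for j in range(k, n + 1))


# Chance of seeing at least this many survivors if the drug does nothing.
for survivors in (100, 75, 55, 52):
    print(f"P(at least {survivors} survive) = {prob_at_least(survivors):.3g}")
```

Outcomes that would be very unlikely without a drug effect (such as 75 or more survivors) are stronger evidence than outcomes that happen fairly often by chance alone (such as 52 or more).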
Statistical inference deals with drawing conclusions about
population parameters from the analysis of sample data
Estimation of parameters
Estimate a single value for a parameter (point estimation)
Estimate a plausible range of values for a parameter (interval
estimation)
Testing of hypotheses
Procedure for testing whether data supports a hypothesis or theory
Point Estimation
Objective: to estimate a population parameter based on sample
data
Point estimator is a statistic that estimates a population parameter
Standard deviation of the statistic is called the standard error
(most of the time)
Example
Sample mean: $\bar{x}$ estimates $\mu$, and its standard error is $\sigma/\sqrt{n}$
How do you estimate the standard error?
If we have a random sample of size n from a normal population, what
is the distribution of the sample mean?
If the sampling procedure is done repeatedly, what proportion of
sample means lie in the interval $(\mu - 2\sigma/\sqrt{n},\ \mu + 2\sigma/\sqrt{n})$?
If the sampling procedure is done repeatedly, what proportion of
sample means lie in the interval $(\mu - 3\sigma/\sqrt{n},\ \mu + 3\sigma/\sqrt{n})$?
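These proportions can be checked against the standard normal distribution. The short sketch below assumes the sample mean is exactly normal, so the proportion within k standard errors of the population mean is P(-k < Z < k); the helper name `proportion_within` is introduced only for this example.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1


def proportion_within(k):
    """P(-k < Z < k): share of sample means within k standard errors of mu."""
    return std_normal.cdf(k) - std_normal.cdf(-k)


for k in (2, 3):
    print(f"within {k} standard errors: {proportion_within(k):.4f}")
```

This gives about 95.4% for two standard errors and about 99.7% for three.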
When estimating $\mu$ with $\bar{x}$, the 100(1-$\alpha$)% margin of error, d,
is the value where 100(1-$\alpha$)% of the sample means will fall in the
interval $(\mu - d,\ \mu + d)$
For large samples, $d = z_{\alpha/2}\,\dfrac{\sigma}{\sqrt{n}}$
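The large-sample margin of error can be computed directly from $\sigma$, n and $\alpha$. The following is a minimal sketch; the function name `margin_of_error` and the input values ($\sigma$ = 10, n = 100, $\alpha$ = 0.05) are hypothetical and chosen only to show the calculation.

```python
from statistics import NormalDist


def margin_of_error(sigma, n, alpha):
    """Large-sample 100(1 - alpha)% margin of error: z_{alpha/2} * sigma / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # upper alpha/2 point of N(0, 1)
    return z * sigma / n ** 0.5


# Hypothetical inputs, not taken from the lecture.
d = margin_of_error(sigma=10.0, n=100, alpha=0.05)
print(f"95% margin of error: {d:.3f}")
```

Because n sits under a square root, quadrupling the sample size only halves the margin of error.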
Sample Size Calculation
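Before collecting data, we should have some desired margin of error, d,
and an associated probability, $1 - \alpha$
Based on these, we can determine the appropriate sample size
Solve $d = z_{\alpha/2}\,\dfrac{\sigma}{\sqrt{n}}$ for n: $n = \left(\dfrac{z_{\alpha/2}\,\sigma}{d}\right)^2$, rounded up to the next whole number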
What does this sample size guarantee?
Example (8.12)
Standard deviation of heights of 5-year-old boys is 3.5 inches
How many boys must be sampled if we want to be 90% certain
that the sample mean is within 0.5 inches of the population mean height?
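A sketch of this calculation is shown below. It assumes the large-sample formula $n = (z_{\alpha/2}\,\sigma/d)^2$ with the result rounded up; the function name `required_sample_size` is introduced for illustration.

```python
from math import ceil
from statistics import NormalDist


def required_sample_size(sigma, d, confidence):
    """Smallest n with z_{alpha/2} * sigma / sqrt(n) <= d."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return ceil((z * sigma / d) ** 2)


# Example 8.12: sigma = 3.5 inches, margin of error 0.5 inches, 90% certainty.
n_boys = required_sample_size(sigma=3.5, d=0.5, confidence=0.90)
print(n_boys)
```

With $z_{0.05} \approx 1.645$ the formula gives about 132.6, so 133 boys would be needed under these assumptions.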
Confidence Intervals for the Mean
Last day, we introduced a point estimator: a statistic that estimates a
population parameter
Often more desirable to present a plausible range for the
parameter, based on the data
We will call this a confidence interval
Ideally, the interval contains the true parameter value
In practice, this is not possible to guarantee because of sample-to-sample
variation
Instead, we compute the interval so that before sampling, the
interval will contain the true value with high probability
This high probability is called the confidence level of the interval
Confidence Interval for $\mu$ for a Normal
Population
Situation:
Have a random sample of size n from $N(\mu, \sigma)$
Suppose value of the standard deviation is known
Value of population mean is unknown
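Last day we saw that 100(1-$\alpha$)% of sample means will fall in the
interval:
$\left(\mu - z_{\alpha/2}\,\dfrac{\sigma}{\sqrt{n}},\ \ \mu + z_{\alpha/2}\,\dfrac{\sigma}{\sqrt{n}}\right)$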
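Therefore, before sampling, the probability of getting a sample
mean in this interval is $(1 - \alpha)$
Equivalently,
$P\left(\mu - z_{\alpha/2}\,\dfrac{\sigma}{\sqrt{n}} < \bar{X} < \mu + z_{\alpha/2}\,\dfrac{\sigma}{\sqrt{n}}\right) = 1 - \alpha$
Equivalently,
$P\left(\bar{X} - z_{\alpha/2}\,\dfrac{\sigma}{\sqrt{n}} < \mu < \bar{X} + z_{\alpha/2}\,\dfrac{\sigma}{\sqrt{n}}\right) = 1 - \alpha$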
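The interval below is called a 100(1-$\alpha$)% confidence interval for $\mu$:
$\left(\bar{X} - z_{\alpha/2}\,\dfrac{\sigma}{\sqrt{n}},\ \ \bar{X} + z_{\alpha/2}\,\dfrac{\sigma}{\sqrt{n}}\right)$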
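The "before sampling" interpretation of the confidence level can be illustrated by simulation. The sketch below is an assumption-laden illustration rather than part of the lecture: the population values ($\mu$ = 50, $\sigma$ = 4), the sample size, the number of repetitions, and the function name `coverage` are all hypothetical. It draws many samples from a normal population with known $\sigma$ and counts how often the interval $\bar{X} \pm z_{\alpha/2}\,\sigma/\sqrt{n}$ contains $\mu$.

```python
import random
from statistics import NormalDist, fmean


def coverage(mu, sigma, n, alpha, reps, seed=0):
    """Fraction of simulated known-sigma confidence intervals that contain mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * sigma / n ** 0.5
    hits = 0
    for _ in range(reps):
        xbar = fmean(rng.gauss(mu, sigma) for _ in range(n))
        if xbar - half_width < mu < xbar + half_width:
            hits += 1
    return hits / reps


# Hypothetical population and design, chosen only for the demonstration.
observed = coverage(mu=50.0, sigma=4.0, n=10, alpha=0.10, reps=4000)
print(f"observed coverage of 90% intervals: {observed:.3f}")
```

The observed fraction should be close to 0.90, although any single interval either contains $\mu$ or does not.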
Example
To assess the accuracy of a laboratory scale, a standard weight
known to be 10 grams is weighed 5 times
The readings are normally distributed with unknown mean $\mu$ and a
standard deviation of 0.0002 grams
Mean result is 10.0023 grams
Find a 90% confidence interval for the mean
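A sketch of this calculation, assuming the known-$\sigma$ interval $\bar{x} \pm z_{\alpha/2}\,\sigma/\sqrt{n}$ with n = 5 and $\alpha$ = 0.10; the function name `z_confidence_interval` is introduced for illustration.

```python
from statistics import NormalDist


def z_confidence_interval(xbar, sigma, n, confidence):
    """Known-sigma confidence interval for the mean of a normal population."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * sigma / n ** 0.5
    return xbar - half_width, xbar + half_width


# Scale example: 5 readings, mean 10.0023 g, sigma = 0.0002 g, 90% confidence.
lower, upper = z_confidence_interval(xbar=10.0023, sigma=0.0002, n=5, confidence=0.90)
print(f"90% CI for the mean: ({lower:.6f}, {upper:.6f})")
```

Under these assumptions the interval is roughly (10.00215, 10.00245) grams. It does not contain 10 grams, which suggests the scale reads slightly high.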