Lecture 8 - Statistics

Statistics 400 - Lecture 8
- Completed so far (any material discussed in these sections is fair game):
  - 2.1-2.5
  - 4.1-4.5
  - 5.1-5.8 (READ 5.7)
  - 6.1-6.4; 6.6
  - 7.1-7.2
- Today: finish 7.3, 8.1-8.3
- READ 7.4!!!
- Assignment #3: 6.2, 6.6, 6.34, 6.78 (interpret the plot in terms of Normality), 7.20, 7.28, 8.14, 8.22, 8.36
- Due: Tuesday, Oct 16
Central Limit Theorem
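- In a random sample (iid sample) from any population with mean $\mu$ and standard deviation $\sigma$, when n is large, the distribution of the sample mean $\bar{x}$ is approximately normal.
- That is, $\bar{x}$ is approximately $N(\mu, \sigma/\sqrt{n})$
- Thus, $Z = \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}}$ is approximately $N(0, 1)$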
Implications
- So, for random samples, if we have enough data, the sample mean is approximately normally distributed, even if the data are not normally distributed
- If we have enough data, we can use the normal distribution to make probability statements about $\bar{x}$
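A quick simulation can make this concrete. The sketch below is not from the lecture: it draws many samples from a deliberately skewed population (an exponential distribution with mean 1 and standard deviation 1, chosen only for illustration) and checks that the sample means center on $\mu$ with spread close to $\sigma/\sqrt{n}$. The sample size, number of repetitions and seed are arbitrary assumptions.

```python
import random
import statistics

# Sketch: sample means from a skewed, non-normal population.
# Exponential(rate=1) has mu = 1 and sigma = 1 (assumed example population).
random.seed(400)  # arbitrary seed, for reproducibility only

MU, SIGMA = 1.0, 1.0   # mean and SD of Exponential(rate=1)
N = 50                 # observations per sample
REPS = 10_000          # number of repeated samples

sample_means = [
    statistics.fmean(random.expovariate(1.0) for _ in range(N))
    for _ in range(REPS)
]

clt_se = SIGMA / N ** 0.5                  # CLT prediction for SD of x-bar
mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)

# Fraction of sample means within 1.96 standard errors of mu;
# the normal approximation predicts about 0.95.
within = sum(abs(m - MU) <= 1.96 * clt_se for m in sample_means) / REPS

print(f"mean of x-bar:  {mean_of_means:.3f} (CLT: {MU})")
print(f"SD of x-bar:    {sd_of_means:.4f} (CLT: {clt_se:.4f})")
print(f"within 1.96 SE: {within:.3f} (normal: 0.950)")
```

Even though individual observations are far from normal, the empirical spread and coverage of the sample means should land close to the normal-theory values.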
Example
- A busy intersection has an average of 2.2 accidents per week with a standard deviation of 1.4 accidents
- Suppose you monitor this intersection for a given year, recording the number of accidents per week.
- The data take on integer values (0, 1, 2, ...), so the distribution of the number of accidents is not normal.
- What is the distribution of the mean number of accidents per week based on a sample of 52 weeks of data?
- What is the approximate probability that $\bar{x}$ is less than 2?
- What is the approximate probability that there are fewer than 100 accidents in a given year?
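A minimal sketch of the normal-approximation arithmetic, using only the numbers stated in the example and Python's standard `statistics.NormalDist`. Treating "fewer than 100 accidents" as $\bar{x} < 100/52$ without a continuity correction is an assumption; the lecture may handle the integer total differently.

```python
from statistics import NormalDist

# Values from the example: weekly accidents have mean 2.2 and SD 1.4,
# observed over n = 52 weeks. By the CLT (an approximation, since counts
# are not normal), x-bar is roughly N(mu, sigma / sqrt(n)).
mu, sigma, n = 2.2, 1.4, 52
se = sigma / n ** 0.5
xbar_dist = NormalDist(mu, se)

# P(x-bar < 2)
p_mean_below_2 = xbar_dist.cdf(2)

# Fewer than 100 accidents in the year means total < 100,
# i.e. x-bar < 100 / 52 (no continuity correction applied here).
p_total_below_100 = xbar_dist.cdf(100 / n)

print(f"SE of x-bar    = {se:.4f}")
print(f"P(x-bar < 2)   = {p_mean_below_2:.4f}")
print(f"P(total < 100) = {p_total_below_100:.4f}")
```

The same answers follow by hand from $z = (2 - 2.2)/(1.4/\sqrt{52})$ and $z = (100/52 - 2.2)/(1.4/\sqrt{52})$ with a standard normal table.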
Statistical Inference (Chapter 8)
- We would like to make inferences about a population based on samples
- The fatality rate for a disease is 50%. In a controlled study, 100 patients with the disease are given a new drug. Would you conclude that the drug is successful if:
  - 100% of the patients survived
  - 75% of the patients survived
  - 55% of the patients survived
  - 52% of the patients survived
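One way to make the drug question concrete, offered as a sketch rather than the lecture's method, is to ask how often at least that many of the 100 patients would survive by chance if the drug had no effect. This assumes patients survive independently with probability 0.5 (the stated 50% fatality rate), so the number of survivors is Binomial(100, 0.5); the helper `prob_at_least` is hypothetical and sums exact binomial probabilities.

```python
from math import comb

def prob_at_least(k: int, n: int = 100, p: float = 0.5) -> float:
    """P(X >= k) for X ~ Binomial(n, p), summed exactly.

    Assumed 'no effect' model: each of n patients survives
    independently with probability p.
    """
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))

# Survivor counts corresponding to 100%, 75%, 55% and 52% of 100 patients.
tail = {k: prob_at_least(k) for k in (100, 75, 55, 52)}
for k, prob in tail.items():
    print(f"P(at least {k} survive | no effect) = {prob:.3g}")
```

Under these assumptions, 52 or 55 survivors would occur fairly often by chance alone, while 75 or 100 survivors would be extremely unlikely.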
- Statistical inference deals with drawing conclusions about population parameters from the analysis of sample data
- Estimation of parameters
  - Estimate a single value for a parameter (point estimation)
  - Estimate a plausible range of values for a parameter (interval estimation)
- Testing of hypotheses
  - Procedure for testing whether the data support a hypothesis or theory
Point Estimation
- Objective: to estimate a population parameter based on sample data
- A point estimator is a statistic that estimates a population parameter
- The standard deviation of the statistic is called its standard error (most of the time; the term is also used for an estimate of this standard deviation)
Example
- Sample mean: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$, a point estimator of $\mu$ with standard error $\sigma/\sqrt{n}$
- How do you estimate the standard error?
- If we have a random sample of size n from a normal population, what is the distribution of the sample mean?
- If the sampling procedure is done repeatedly, what proportion of sample means lie in the interval $(\mu - 2\sigma/\sqrt{n},\ \mu + 2\sigma/\sqrt{n})$?
- If the sampling procedure is done repeatedly, what proportion of sample means lie in the interval $(\mu - 3\sigma/\sqrt{n},\ \mu + 3\sigma/\sqrt{n})$?
- When estimating $\mu$ with $\bar{x}$, the $100(1-\alpha)\%$ margin of error, d, is the value such that $100(1-\alpha)\%$ of the sample means fall in the interval $(\mu - d,\ \mu + d)$
- For large samples, $d = z_{\alpha/2}\,\sigma/\sqrt{n}$
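The quantities on this slide can be computed with Python's standard `statistics` module. This is a sketch under the slide's normal-theory assumptions; the helper names `estimated_se`, `z_crit` and `margin_of_error` are hypothetical, and using $s/\sqrt{n}$ as the estimated standard error is the usual plug-in choice rather than something the slide spells out.

```python
from statistics import NormalDist, stdev

std_normal = NormalDist()  # Z ~ N(0, 1)

# Proportion of sample means within k standard errors of mu, assuming
# x-bar ~ N(mu, sigma / sqrt(n)); standardizing reduces this to P(-k < Z < k).
within = {k: std_normal.cdf(k) - std_normal.cdf(-k) for k in (2, 3)}
for k, p in within.items():
    print(f"within {k} SE: {p:.4f}")

def estimated_se(sample: list[float]) -> float:
    """Plug-in standard error of the mean: s / sqrt(n)."""
    return stdev(sample) / len(sample) ** 0.5

def z_crit(alpha: float) -> float:
    """Upper alpha/2 critical value z_{alpha/2} of N(0, 1)."""
    return std_normal.inv_cdf(1 - alpha / 2)

def margin_of_error(sigma: float, n: int, alpha: float) -> float:
    """d = z_{alpha/2} * sigma / sqrt(n), with sigma known."""
    return z_crit(alpha) * sigma / n ** 0.5
```

For example, `margin_of_error(sigma, n, 0.05)` gives the 95% margin of error, with `z_crit(0.05)` close to the familiar 1.96.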
Sample Size Calculation
- Before collecting data, we should choose a desired margin of error, d, and an associated probability (confidence level) $1-\alpha$
- Based on these, we can determine the appropriate sample size from
- $d = z_{\alpha/2}\,\sigma/\sqrt{n}$
- What does this sample size guarantee?
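Solving the margin-of-error relation $d = z_{\alpha/2}\,\sigma/\sqrt{n}$ for n gives the required sample size. Since n must be a whole number, round the result up so that the margin of error is at most d:

```latex
d = z_{\alpha/2}\frac{\sigma}{\sqrt{n}}
\;\Longrightarrow\;
\sqrt{n} = \frac{z_{\alpha/2}\,\sigma}{d}
\;\Longrightarrow\;
n = \left(\frac{z_{\alpha/2}\,\sigma}{d}\right)^{2}
```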
Example (8.12)
- The standard deviation of the heights of 5-year-old boys is 3.5 inches
- How many boys must be sampled if we want to be 90% certain that the sample mean is within 0.5 inches of the population mean height?
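A sketch of the calculation for this example, assuming $\sigma = 3.5$ is known and applying $n = (z_{\alpha/2}\sigma/d)^2$ rounded up. The function `required_sample_size` is a hypothetical helper; it uses the exact normal quantile, which here gives the same answer as the table value $z_{0.05} \approx 1.645$.

```python
import math
from statistics import NormalDist

def required_sample_size(sigma: float, d: float, confidence: float) -> int:
    """Smallest n with z_{alpha/2} * sigma / sqrt(n) <= d (sigma known)."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)   # z_{alpha/2}
    return math.ceil((z * sigma / d) ** 2)    # round up, never down

# Example 8.12: sigma = 3.5 inches, d = 0.5 inches, 90% confidence
n_boys = required_sample_size(sigma=3.5, d=0.5, confidence=0.90)
print(n_boys)  # 133
```

By hand: $(1.645 \times 3.5 / 0.5)^2 \approx 132.6$, which rounds up to 133 boys.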
Confidence Intervals for the Mean
- Last day, we introduced a point estimator: a statistic that estimates a population parameter
- It is often more desirable to present a plausible range for the parameter, based on the data
- We will call this a confidence interval
- Ideally, the interval contains the true parameter value
- In practice, this cannot be guaranteed because of sample-to-sample variation
- Instead, we compute the interval so that, before sampling, the interval will contain the true value with high probability
- This high probability is called the confidence level of the interval
Confidence Interval for $\mu$ for a Normal Population
- Situation:
  - We have a random sample of size n from $N(\mu, \sigma)$
  - Suppose the value of the standard deviation, $\sigma$, is known
  - The value of the population mean, $\mu$, is unknown
- Last day we saw that $100(1-\alpha)\%$ of sample means will fall in the interval:

$$\left(\mu - z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\ \mu + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right)$$
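- Therefore, before sampling, the probability of getting a sample mean in this interval is $1-\alpha$
- Equivalently,

$$P\left(\mu - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \bar{X} \le \mu + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right) = 1-\alpha$$

- Equivalently,

$$P\left(\bar{X} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right) = 1-\alpha$$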
- The interval below is called a $100(1-\alpha)\%$ confidence interval for $\mu$:

$$\left(\bar{X} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\ \bar{X} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right)$$
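The phrase "before sampling, the interval will contain the true value with high probability" can be checked by simulation. The sketch below repeatedly draws a normal sample with known $\sigma$, builds the interval $\bar{X} \pm z_{\alpha/2}\sigma/\sqrt{n}$, and counts how often it covers $\mu$. The values of mu, sigma, n, alpha and the seed are arbitrary assumptions chosen only for illustration.

```python
import random
from statistics import NormalDist, fmean

# Assumed illustration values (not from the lecture).
random.seed(8)
mu, sigma, n, alpha = 50.0, 4.0, 10, 0.10

z = NormalDist().inv_cdf(1 - alpha / 2)   # z_{alpha/2}
half_width = z * sigma / n ** 0.5         # known-sigma interval half-width

reps = 20_000
covered = 0
for _ in range(reps):
    # Draw one sample of size n from N(mu, sigma) and compute x-bar.
    xbar = fmean(random.gauss(mu, sigma) for _ in range(n))
    if xbar - half_width <= mu <= xbar + half_width:
        covered += 1

coverage = covered / reps
print(f"empirical coverage: {coverage:.3f} (nominal: {1 - alpha:.2f})")
```

Any single interval either contains $\mu$ or it does not; the confidence level describes the long-run fraction of intervals that do.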

Example
- To assess the accuracy of a laboratory scale, a standard weight known to be 10 grams is weighed 5 times
- The readings are normally distributed with unknown mean and a standard deviation of 0.0002 grams
- The mean result is 10.0023 grams
- Find a 90% confidence interval for the mean
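A sketch of the known-$\sigma$ interval applied to this example, using only the stated values ($n = 5$, $\sigma = 0.0002$ g, $\bar{x} = 10.0023$ g, 90% confidence) and the exact normal quantile $z_{0.05} \approx 1.645$.

```python
from statistics import NormalDist

# Values from the example.
xbar, sigma, n, confidence = 10.0023, 0.0002, 5, 0.90

z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # z_{0.05}, about 1.645
half_width = z * sigma / n ** 0.5                    # z * sigma / sqrt(n)
lower, upper = xbar - half_width, xbar + half_width

print(f"90% CI for mu: ({lower:.5f}, {upper:.5f}) g")  # (10.00215, 10.00245) g
```

By hand: $10.0023 \pm 1.645 \times 0.0002/\sqrt{5} \approx 10.0023 \pm 0.00015$. The interval does not contain the known value of 10 grams, which would suggest that the scale reads slightly high.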