#### Transcript: Lecture 5

Sampling and estimation. Petter Mostad, 2005.09.26

**The normal distribution**
• The most used continuous probability distribution:
– Many observations tend to approximately follow this distribution
– It is easy and nice to do computations with
– BUT: Using it can result in wrong conclusions when it is not appropriate

[Figure: the normal density curve, with the mean μ and the points μ−2σ and μ+2σ marked]

**The normal distribution**
• The probability density function is
$$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-(x-\mu)^2 / 2\sigma^2}$$
where $E(X) = \mu$ and $\mathrm{Var}(X) = \sigma^2$
• Notation: $N(\mu, \sigma^2)$
• Standard normal distribution: $N(0, 1)$
• Using the normal density is often OK unless the actual distribution is very skewed

**Normal probability plots**
• Plotting the quantiles of the data versus the quantiles of the distribution.
• If the data is approximately normally distributed, the plot will approximately show a straight line

[Figure: Normal Q-Q plot of household income in thousands, expected normal quantiles versus observed values]

**The Normal versus the Binomial distribution**
• When n is large and π is not too close to 0 or 1, the Binomial distribution becomes very similar to the Normal distribution with the same expectation and variance.
• This is a phenomenon that happens for all distributions that can be seen as a sum of independent observations.
• It can be used to make approximate computations for the Binomial distribution.

**The Exponential distribution**
• The exponential distribution is a distribution for positive numbers (parameter λ):
$$f(t) = \lambda e^{-\lambda t}$$
• It can be used to model the time until an event, when events arrive randomly at a constant rate
• $E(T) = 1/\lambda$ and $\mathrm{Var}(T) = 1/\lambda^2$

**Sampling**
• We need to start connecting the probability models we have introduced with the data we want to analyze.
• We (usually) want to regard our data as a simple random sample from a probability model:
– Each observation is sampled independently of the others
– Each observation is sampled from the probability model
• Thus we go on to study the properties of simple random samples.
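The normal approximation to the Binomial mentioned above is easy to check numerically. Below is a minimal Python sketch (the helper functions and the example values n = 100, π = 0.4 are my own, not from the lecture) comparing exact Binomial probabilities with the density of the normal distribution that has the same expectation and variance:

```python
import math

def binomial_pmf(k, n, p):
    """Exact Binomial(n, p) probability of k successes."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    return math.exp(-(x - mu)**2 / (2 * sigma**2)) / (sigma * math.sqrt(2 * math.pi))

# Bin(100, 0.4) compared with N(40, 24): same expectation and variance
n, p = 100, 0.4
mu, sigma = n * p, math.sqrt(n * p * (1 - p))

for k in (30, 40, 50):
    print(f"k={k}: exact={binomial_pmf(k, n, p):.5f}, "
          f"normal approx={normal_pdf(k, mu, sigma):.5f}")
```

The two columns agree to about three decimal places near the mean, which is the kind of agreement the slide refers to.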
**Example: The mean of a random sample**
• If X1, X2, …, Xn is a random sample, then their sample mean is defined as
$$\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$$
• As it is a function of random variables, it is itself a random variable.
• If $E(X_i) = \mu$, then $E(\bar{X}) = \mu$
• If $\mathrm{Var}(X_i) = \sigma^2$, then $\mathrm{Var}(\bar{X}) = \sigma^2 / n$

**Example**
• Assume X1, X2, …, X10 is a random sample from the binomial distribution Bin(20, 0.2)
• We get
$$E(\bar{X}) = E(X_i) = 20 \cdot 0.2 = 4$$
$$\mathrm{Var}(\bar{X}) = \frac{\mathrm{Var}(X_i)}{10} = \frac{20 \cdot 0.2 \cdot (1 - 0.2)}{10} = \frac{3.2}{10} = 0.32$$

[Figure: the probability distribution of Bin(20, 0.2)]

**Simulation**
• Simulation: To generate outcomes by computer, on the basis of pseudo-random numbers
• Pseudo-random number: Generated by an algorithm completely unrelated to the way the numbers are used, so that they appear random. Usually generated to be uniformly distributed between 0 and 1.
• There is a correspondence between random variables and algorithms to simulate outcomes.

**Examples**
• To simulate outcomes 1, 2, …, 6, each with probability 1/6: Simulate a pseudo-random u in [0,1), and let the outcome be i if u is between (i−1)/6 and i/6.
• To simulate an exponentially distributed X with parameter λ: Simulate a pseudo-random u in [0,1), and compute x = −log(u)/λ

**Stochastic variables and simulation of outcomes**
• The histogram of n simulated values will approach the probability distribution simulated from, as n increases

[Figure: histograms of n = 100, n = 1000, and n = 100000 simulated values, approaching the underlying density as n grows]

**Using simulation to study properties of samples**
• We saw how we can find theoretically the expectation and variance of some functions of a sample
• Instead, we can simulate the function of the sample a large number of times, and study the distribution of these numbers: This gives approximate results.
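The two simulation recipes above translate directly into code. A minimal Python sketch (function names are my own; note that −log(u) needs u strictly above 0, so the code draws u from (0, 1]):

```python
import math
import random

def simulate_die(u):
    """Outcome i in 1..6 when u falls in the interval [(i-1)/6, i/6)."""
    return int(6 * u) + 1

def simulate_exponential(u, lam):
    """Inverse-transform sampling: -log(u)/lam is Exponential(lam).
    u must lie in (0, 1]; u = 0 would make log fail."""
    return -math.log(u) / lam

# Check the exponential recipe against the theory E(T) = 1/lambda
random.seed(1)
lam = 2.0
# 1 - random.random() lies in (0, 1], avoiding log(0)
samples = [simulate_exponential(1.0 - random.random(), lam) for _ in range(100_000)]
print(sum(samples) / len(samples))  # close to 1/lam = 0.5
```

With 100000 draws the simulated mean matches the theoretical value 1/λ to two decimal places, illustrating the histogram convergence described above.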
**Example**
• X1, X2, …, X10 is a random sample from the binomial distribution Bin(20, 0.2)
• Simulating these 100000 times, and computing $\bar{X}$ each time, we get:
• The average of these 100000 numbers is 4.001, the variance is 0.3229

[Figure: histogram of the 100000 simulated sample means]

**Studying the properties of averages**
• If X1, X2, …, Xn is a random sample from some distribution, it is very common to want to study the mean
• In the following example, we have sampled from the Exponential distribution with parameter λ = 1:
– First (done 10000 times) taken the average of 3 samples
– Then (done 10000 times) taken the average of 30 samples
– Then (done 10000 times) taken the average of 300 samples

[Figure: the Exp(1) density, and histograms of the averages of 3, 30, and 300 samples; the histograms become increasingly narrow and bell-shaped]

**The Central Limit Theorem**
• It is a very important fact that the above happens no matter what distribution you start with.
• The theorem states: If X1, X2, …, Xn is a random sample from a distribution with expectation μ and variance σ², then
$$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$$
approaches a standard normal distribution when n gets large.

**Example**
• Let X be from Bin(n, π):
• X/n can be seen as the average over n Bernoulli variables, so we can apply the theory
• We get that when n grows, the expression
$$\frac{X/n - \pi}{\sqrt{\pi(1 - \pi)/n}}$$
gets an approximate standard normal distribution N(0, 1).
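The standardization in the Binomial example above can be checked by simulation. A Python sketch (the values n = 200 and π = 0.3, and the helper function, are my own illustrative choices):

```python
import math
import random
import statistics

def binomial_draw(n, p):
    """One Binomial(n, p) outcome as a sum of n Bernoulli(p) trials."""
    return sum(random.random() < p for _ in range(n))

random.seed(3)
n, pi = 200, 0.3

# Standardize X/n as in the example above, many times
z_values = []
for _ in range(20_000):
    x = binomial_draw(n, pi)
    z_values.append((x / n - pi) / math.sqrt(pi * (1 - pi) / n))

print(statistics.mean(z_values))   # close to 0
print(statistics.stdev(z_values))  # close to 1
```

The simulated mean and standard deviation are close to 0 and 1, as the approximate N(0, 1) distribution predicts.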
• A rule for when to accept the approximation: $n\pi(1-\pi) > 9$

**The sampling distribution of the sample variance**
• Recall: the sample variance is
$$S^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2$$
• We can show theoretically that its expectation is equal to the variance of the original distribution
• We know that its distribution is approximately normal if the sample is large
• If the underlying distribution is normal N(μ, σ²):
– $\mathrm{Var}(S^2) = \frac{2\sigma^4}{n-1}$
– $\frac{(n-1)S^2}{\sigma^2}$ is distributed as the $\chi^2_{n-1}$ distribution

**The Chi-square distribution**
• The Chi-square distribution with n degrees of freedom is denoted $\chi^2_n$
• It is the distribution of the sum of the squares of n independent random variables with standard normal distributions.

[Figure: Chi-square densities]

**Estimation**
• We have previously looked at
– Probability models (with parameters)
– Properties of samples from such probability models
• We now turn this around: we start with a dataset, and try to find a probability model fitting the data.
• A (point) estimator is a function of the data, meant to estimate a parameter of the model
• A (point) estimate is a value of the estimator, computed from the data

**Properties of estimators**
• An estimator is unbiased if its expectation is equal to the parameter it is estimating
• The bias of an estimator is its expectation minus the parameter it is estimating
• The efficiency of an unbiased estimator is measured by its variance: One would like to have estimators with high efficiency (i.e., low variance)

**Confidence intervals: Example**
• Assume μ and σ² are some real numbers, and assume the data X1, X2, …, Xn are a random sample from N(μ, σ²).
– Then
$$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} \sim N(0, 1)$$
– and $P(-1.96 \le Z \le 1.96) = 95\%$
– thus
$$P\left(\bar{X} - 1.96\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + 1.96\frac{\sigma}{\sqrt{n}}\right) = 95\%$$
– so we say that
$$\left(\bar{X} - 1.96\frac{\sigma}{\sqrt{n}},\; \bar{X} + 1.96\frac{\sigma}{\sqrt{n}}\right)$$
is a confidence interval for μ with 95% confidence, based on the statistic $\bar{X}$

**Confidence intervals: interpretation**
• Interpretation: If we do the following a large number of times:
– We pick μ (and σ²)
– We generate data and the statistic $\bar{X}$
– We compute the confidence interval
then the confidence interval will contain μ roughly 95% of the time.
• Note: The confidence interval pertains to μ (and σ²), and to the particular statistic. If a different statistic is used, a different confidence interval could result.

**Example: a different statistic**
• Assume in the example above we use $X_1$ instead of $\bar{X}$, i.e., the statistic
$$Z_0 = \frac{X_1 - \mu}{\sigma}$$
• We then get $Z_0 \sim N(0, 1)$ as before, and the confidence interval
$$(X_1 - 1.96\sigma,\; X_1 + 1.96\sigma)$$
• Note how this is different from before, as we have used a different statistic.

**Alternative concept: Credibility interval**
• The knowledge about μ can be formulated as a probability distribution
• If an interval I has 95% probability under this distribution, then I is called a credibility interval for μ, with credibility 95%
• It is very common, but wrong, to interpret confidence intervals as credibility intervals

**Example: Finding credibility intervals**
• We must always start with a probability distribution π(μ) describing our knowledge about μ before looking at data
• As above, the probability distribution g for $\bar{X} \mid \mu$ is the normal distribution N(μ, σ²/n)
• Using Bayes' formula, we get a probability distribution f for $\mu \mid \bar{X}$:
$$f(\mu \mid \bar{X}) = \frac{g(\bar{X} \mid \mu)\,\pi(\mu)}{P(\bar{X})}$$

**Finding credibility intervals (cont.)**
• IF we assume "flat" knowledge about μ before observing data, i.e., that π(μ) = 1, then
$$\mu \mid \bar{X} \sim N(\bar{X},\; \sigma^2/n)$$
and a credibility interval becomes
$$\left(\bar{X} - 1.96\frac{\sigma}{\sqrt{n}},\; \bar{X} + 1.96\frac{\sigma}{\sqrt{n}}\right)$$
• Similarly, if we assume π(μ) = 1 and only observe X1, then a credibility interval becomes
$$(X_1 - 1.96\sigma,\; X_1 + 1.96\sigma)$$

**Summary on confidence and credibility intervals**
• Confidence and credibility intervals are NOT the same.
• A confidence interval says something about a parameter AND a random variable (or statistic) based on it.
• A credibility interval describes the knowledge about the parameter; it must always be based on a specification of the knowledge before making the observations, as well as on the observations themselves.
• In many cases, computed confidence intervals correspond to credibility intervals with a certain prior knowledge assumed.
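The repeated-sampling interpretation of a confidence interval can be checked directly by simulation: fix μ and σ², generate data many times, and count how often the interval covers μ. A Python sketch (the values μ = 10, σ = 3, n = 25 are my own illustrative choices):

```python
import math
import random

random.seed(7)
mu, sigma, n = 10.0, 3.0, 25  # "true" parameters, known to the simulator
half_width = 1.96 * sigma / math.sqrt(n)

trials = 20_000
covered = 0
for _ in range(trials):
    # One dataset of size n from N(mu, sigma^2), and its sample mean
    xbar = sum(random.gauss(mu, sigma) for _ in range(n)) / n
    # Does the interval (xbar - 1.96*sigma/sqrt(n), xbar + 1.96*sigma/sqrt(n)) cover mu?
    if xbar - half_width <= mu <= xbar + half_width:
        covered += 1

coverage = covered / trials
print(coverage)  # close to 0.95
```

Note that under the flat prior π(μ) = 1 the credibility interval above is numerically the same interval; only its interpretation differs, as the summary emphasizes.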