15 - Rice University


Statistics: Statistical Inference
Krishna V. Palem
Kenneth and Audrey Kennedy Professor of Computing
Department of Computer Science, Rice University
Contents
 Summary of Statistics Learnt so Far
 Statistical Inference
 Central Limit Theorem and its implications
 Estimation theory
 Interval Estimation
 What is Confidence Interval?
 Tutorial
Statistical Inference
 The process of making guesses about the truth from a sample
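Sample statistics (the sample is what we observe):
$\hat{\mu} = \bar{X}_n = \frac{\sum_{i=1}^{n} x_i}{n}$
$\hat{\sigma}^2 = s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{X}_n)^2}{n - 1}$
*hat notation ^ is often used to indicate "estimate"
Population parameters (the truth, not observable):
$\mu = \frac{\sum_{i=1}^{N} x_i}{N}$
$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$
The sample statistics are used to make guesses about the whole population.
Source: K. Cobb, Stanford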
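The four formulas above can be checked on a small data set. The sketch below is only an illustration: the ten "population" values and the four-value sample are made up, and the helper names (`mean`, `sample_variance`, `population_variance`) are not from the slides. Note that the sample variance divides by n - 1 while the population variance divides by N.

```python
import statistics

# Hypothetical data: pretend these ten values are the whole population.
population = [2, 4, 4, 5, 7, 8, 9, 10, 11, 12]
# A sample of four observations from it (picked by hand for illustration).
sample = [4, 7, 9, 12]

def mean(xs):
    """Arithmetic mean: sum of the values divided by their count."""
    return sum(xs) / len(xs)

def sample_variance(xs):
    """s^2: squared deviations from the sample mean, divided by n - 1."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def population_variance(xs):
    """sigma^2: squared deviations from the population mean, divided by N."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

mu = mean(population)                    # population parameter (the "truth")
sigma2 = population_variance(population)
mu_hat = mean(sample)                    # sample statistic (the estimate)
s2 = sample_variance(sample)

print(f"mu = {mu:.2f}, sigma^2 = {sigma2:.2f}")
print(f"mu_hat = {mu_hat:.2f}, s^2 = {s2:.2f}")
```

The standard library functions `statistics.variance` (divides by n - 1) and `statistics.pvariance` (divides by N) compute the same two quantities.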
Statistical Inference
Population (parameters, e.g., $\mu$ and $\sigma$)
-> select sample at random
Sample
-> collect data from individuals in sample
Data
-> analyse data (e.g. estimate $\bar{x}$, $s$) to make inferences
How close is the Sample Statistic to the Population Parameter?
 Population parameters, e.g. $\mu$ and $\sigma$, are fixed
 Sample statistics, e.g. $\bar{x}$ and $s$, vary from sample to sample
 How close is $\bar{x}$ to $\mu$?
 Cannot answer the question for a particular sample
 Can answer if we can find out about the distribution that describes the variability in the random variable $\bar{x}$
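To see the point about variability, one can draw several samples from the same population and compare their means. This is a rough sketch under made-up assumptions: the population is 10,000 synthetic values drawn uniformly from 0 to 100, the sample size is 25, and the seed is arbitrary. The population mean stays fixed while each sample mean lands somewhere different.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable (arbitrary choice)

# Hypothetical population: 10,000 values spread uniformly between 0 and 100.
population = [rng.uniform(0, 100) for _ in range(10_000)]
mu = statistics.fmean(population)  # fixed population parameter

def draw_sample_mean(n):
    """Draw n individuals at random (without replacement) and return their mean."""
    return statistics.fmean(rng.sample(population, n))

# Five independent samples of size 25 give five different values of x-bar.
sample_means = [draw_sample_mean(25) for _ in range(5)]
errors = [xbar - mu for xbar in sample_means]

print(f"population mean mu = {mu:.2f}")
for xbar, err in zip(sample_means, errors):
    print(f"sample mean = {xbar:.2f}   (x-bar minus mu = {err:+.2f})")
```

For any one sample the error is unknown in practice, because $\mu$ is not observable; what can be described is how the errors are distributed across samples.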
The Central Limit Theorem:
If all possible random samples, each of size n, are taken from any
population with a mean $\mu$ and a standard deviation $\sigma$, the
sampling distribution of the sample means (averages) will:
1. have mean: $\mu_{\bar{x}} = \mu$
2. have standard deviation: $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$
3. be approximately normally distributed regardless of the shape
of the parent population (normality improves with larger n).
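Properties 1 and 2 can be checked numerically. The sketch below assumes a strongly skewed parent population (exponential with rate 1, so $\mu = 1$ and $\sigma = 1$); the sample size of 30, the 5,000 repetitions and the seed are arbitrary choices, not values from the slides. Drawing a finite number of samples only approximates "all possible samples", so the agreement is close but not exact.

```python
import math
import random
import statistics

rng = random.Random(1)  # fixed seed, arbitrary
n = 30                  # size of each sample (assumed)
repeats = 5_000         # number of samples drawn (assumed)

# Parent population: exponential with rate 1, so mu = 1 and sigma = 1.
mu, sigma = 1.0, 1.0

# One sample mean per repetition.
sample_means = [
    statistics.fmean(rng.expovariate(1.0) for _ in range(n))
    for _ in range(repeats)
]

mean_of_means = statistics.fmean(sample_means)  # should be close to mu
sd_of_means = statistics.stdev(sample_means)    # should be close to sigma / sqrt(n)
predicted_sd = sigma / math.sqrt(n)

print(f"mean of sample means = {mean_of_means:.3f} (theory: {mu:.3f})")
print(f"sd of sample means   = {sd_of_means:.3f} (theory: {predicted_sd:.3f})")
```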
What is it really saying?
(1) It gives a relationship between the sample mean and
population mean
 This gives us a framework to extrapolate our sample results to the
population (statistical inference);
(2) Whatever the distribution of the original data, the sample
mean will be approximately Normally distributed when n is large.
 This is why the Normal is so central to statistics
Example: Toss 1, 2 or 10 dice (10,000 times)
[Figure: Toss 1 die, histogram of data. The distribution of the data is far from Normal.]
[Figure: Toss 2 dice, histogram of averages.]
[Figure: Toss 10 dice, histogram of averages.]
Distribution of averages approaches Normal as sample size (no. of dice) increases
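The dice experiment is easy to reproduce. The following is a minimal sketch, not the code behind the original figures: the function names, the seed and the text-based histogram are all choices made here for illustration. It tosses k dice 10,000 times, records the average each time, and prints a crude bar chart.

```python
import random
from collections import Counter

rng = random.Random(42)  # fixed seed, arbitrary
TOSSES = 10_000          # number of repetitions, as in the example

def average_of_dice(k):
    """Toss k fair six-sided dice once and return the average face value."""
    return sum(rng.randint(1, 6) for _ in range(k)) / k

def histogram(k, width=50):
    """Return (counts, text): counts per average value and a text bar chart."""
    counts = Counter(average_of_dice(k) for _ in range(TOSSES))
    peak = max(counts.values())
    lines = []
    for value in sorted(counts):
        bar = "#" * round(width * counts[value] / peak)
        lines.append(f"{value:5.2f} {bar}")
    return counts, "\n".join(lines)

for k in (1, 2, 10):
    counts, text = histogram(k)
    print(f"Toss {k} dice: histogram of averages")
    print(text)
    print()
```

With one die the bars should come out roughly level, with two dice roughly triangular, and with ten dice bell-shaped around 3.5.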
Central Limit Theorem
(3) It describes the distribution of the sample mean
 The values of $\bar{x}$ obtained from repeatedly taking samples of size n
describe a separate population
 The distribution of any statistic is often called the sampling
distribution