15 - Rice University
Statistics : Statistical Inference
Krishna V. Palem
Kenneth and Audrey Kennedy Professor of Computing
Department of Computer Science, Rice University
Contents
Summary of Statistics Learnt so Far
Statistical Inference
Central Limit Theorem and its implications
Estimation theory
Interval Estimation
What is Confidence Interval?
Tutorial
Statistical Inference
The process of making guesses about the truth from a sample
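Sample statistics (computed from the sample, i.e. the observation):
$\hat{\mu} = \bar{X}_n = \frac{\sum_{i=1}^{n} x_i}{n}$
$\hat{\sigma}^2 = s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{X}_n)^2}{n - 1}$
*The hat notation ^ is often used to indicate "estimate".
Population parameters (the truth, not observable):
$\mu = \frac{\sum_{i=1}^{N} x_i}{N}$
$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$
The sample statistics are used to make guesses about the whole population.
Source: K. Cobb, Stanford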
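The difference between the two sets of formulas can be illustrated with a minimal sketch. The numbers below are made up for illustration and do not come from the slides; the point is only that the population variance divides by N while the sample variance divides by n - 1.

```python
import statistics

# Hypothetical population of N = 8 values (invented for this example).
population = [2, 4, 4, 4, 5, 5, 7, 9]
mu = statistics.mean(population)           # population mean
sigma2 = statistics.pvariance(population)  # population variance, divides by N

# Hypothetical sample of n = 4 values taken from that population.
sample = [4, 5, 7, 9]
x_bar = statistics.mean(sample)            # sample mean, an estimate of mu
s2 = statistics.variance(sample)           # sample variance, divides by n - 1

print("population:", mu, sigma2)
print("sample:    ", x_bar, s2)
```

In practice the population values are not observable, so only `x_bar` and `s2` would be available; the population is listed here purely so the estimates can be compared with the truth.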
Statistical Inference
Population (parameters, e.g., $\mu$ and $\sigma$)
Select a sample at random
Sample
Collect data from individuals in the sample
Data
Analyse the data (e.g. estimate $\bar{x}$, $s$) to make inferences
How close is Sample Statistic to Population Parameter?
Population parameters, e.g. $\mu$ and $\sigma$, are fixed
Sample statistics, e.g. $\bar{x}$, $s$, vary from sample to sample
How close is $\bar{x}$ to $\mu$?
Cannot answer the question for a particular sample
Can answer if we can find out about the distribution that describes the variability in the random variable $\bar{x}$
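A small simulation can make this point concrete. The sketch below assumes a hypothetical population (the integers 1 to 100) whose mean is known, draws many random samples, and looks at how far each sample mean lands from the true mean. The population, the sample size and the number of repeats are all arbitrary choices for illustration.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical population: the integers 1..100, so mu is known exactly.
population = list(range(1, 101))
mu = statistics.mean(population)

n = 10          # sample size (arbitrary choice)
repeats = 1000  # number of independent samples

# Each entry is the mean x_bar of one random sample of size n.
sample_means = [statistics.mean(rng.sample(population, n)) for _ in range(repeats)]

# A single x_bar may land close to mu or far from it.
print("first five sample means:", sample_means[:5])

# Across many samples, the spread of x_bar around mu can be described.
errors = [abs(x_bar - mu) for x_bar in sample_means]
print("average distance |x_bar - mu|:", statistics.mean(errors))
print("largest distance:", max(errors))
```

No single run says how close one particular sample mean is to the truth, but the collection of sample means shows how large the error tends to be.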
The Central Limit Theorem:
If all possible random samples, each of size n, are taken from any population with a mean $\mu$ and a standard deviation $\sigma$, the sampling distribution of the sample means (averages) will:
1. have mean: $\mu_{\bar{x}} = \mu$
2. have standard deviation: $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$
3. be approximately normally distributed regardless of the shape of the parent population (normality improves with larger n).
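The first two statements can be checked numerically. The sketch below is an illustration under assumptions, not part of the original slides: it uses an exponential parent population with rate 1 (mean 1, standard deviation 1, clearly non-Normal), a sample size of 25, and 5000 repeated samples, and compares the mean and standard deviation of the sample means with the predicted values.

```python
import math
import random
import statistics

rng = random.Random(1)  # fixed seed so the run is repeatable

# Assumed parent population: exponential with rate 1, so mu = 1 and sigma = 1.
mu, sigma = 1.0, 1.0
n = 25          # size of each sample
repeats = 5000  # number of samples drawn

def sample_mean(size):
    """Mean of one random sample of the given size from the parent population."""
    return statistics.mean([rng.expovariate(1.0) for _ in range(size)])

means = [sample_mean(n) for _ in range(repeats)]

mean_of_means = statistics.mean(means)  # should be close to mu
sd_of_means = statistics.stdev(means)   # should be close to sigma / sqrt(n)
predicted_sd = sigma / math.sqrt(n)

print("mean of sample means:", mean_of_means, "predicted:", mu)
print("sd of sample means:  ", sd_of_means, "predicted:", predicted_sd)
```

The simulated values only approximate the predictions because a finite number of samples is drawn rather than all possible samples.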
What is it really saying?
(1) It gives a relationship between the sample mean and the population mean.
This gives us a framework to extrapolate our sample results to the population (statistical inference).
(2) It doesn't matter what the distribution of the original data is: the sample mean will be approximately Normally distributed when n is large.
This is why the Normal is so central to statistics.
Example: Toss 1, 2 or 10 dice (10,000 times)
[Figure: Toss 1 die. Histogram of data. The distribution of the data is far from Normal.]
[Figure: Toss 2 dice. Histogram of averages.]
[Figure: Toss 10 dice. Histogram of averages.]
The distribution of averages approaches Normal as the sample size (no. of dice) increases.
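The dice experiment can be reproduced with a short simulation. This is a minimal sketch assuming fair six-sided dice and a fixed random seed; instead of drawing histograms it summarises each set of 10,000 averages by its mean and standard deviation, which for a fair die should be near 3.5 and $\sqrt{35/12}/\sqrt{k}$ for k dice.

```python
import math
import random
import statistics

rng = random.Random(2)  # fixed seed so the run is repeatable
TOSSES = 10_000         # number of repetitions, as in the example

def average_of_dice(k):
    """Average face value of k fair six-sided dice tossed once."""
    return sum(rng.randint(1, 6) for _ in range(k)) / k

die_sd = math.sqrt(35 / 12)  # standard deviation of a single fair die

results = {}
for k in (1, 2, 10):
    averages = [average_of_dice(k) for _ in range(TOSSES)]
    results[k] = (statistics.mean(averages), statistics.stdev(averages))
    print(f"{k:2d} dice: mean={results[k][0]:.3f} "
          f"sd={results[k][1]:.3f} predicted sd={die_sd / math.sqrt(k):.3f}")
```

The spread of the averages shrinks as the number of dice grows. Plotting a histogram of `averages` for each k would show the shape moving from flat (1 die) to triangular (2 dice) to roughly bell-shaped (10 dice).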
Central Limit Theorem
(3) It describes the distribution of the sample mean
The values of $\bar{x}$ obtained from repeatedly taking samples of size n describe a separate population.
The distribution of any statistic is often called the sampling distribution.