Lec Notes on Sampling

Introduction to
Biostatistics
(PUBHLTH 540)
Sampling
Sampling Distributions
Sampling is a fundamental idea underlying
much of statistics. Statistical inference
commonly involves making statements about
population parameters based on sample
estimates.
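[Figure: a sample of size n, with statistics $\bar{x}$ and $s^2$, is drawn from a population of size N, with parameters $\mu$ and $\sigma^2$; inference runs from the sample back to the population.]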
Sampling Distributions
Suppose we take all possible samples
of size n from a population (e.g.
samples of size n = 10)
- For each sample, compute the sample
mean, $\bar{x}$, and variance, $s^2$
- We then have a population of
sample means.
Sampling Distributions
By examining the distribution of
possible sample means,
• we can study their properties,
such as what we would expect
the sample mean to be, and how
spread out the sample means
are.
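The idea of listing every possible sample and looking at the resulting sample means can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the four population values are the hospitalized-day counts from the example in these notes, the sample size n = 2 is chosen arbitrarily, and samples are taken without replacement with order ignored (the notes do not specify either).

```python
from itertools import combinations
from statistics import fmean, pvariance

# Illustrative population: hospitalized days for the four subjects in the notes.
population = [11, 16, 12, 17]
n = 2  # sample size; an assumption chosen for illustration

# Assumption: sampling without replacement, order ignored.
samples = list(combinations(population, n))
sample_means = [fmean(s) for s in samples]

for s, m in zip(samples, sample_means):
    print(s, m)

# The sample means form their own "population", which we can summarize.
mean_of_means = fmean(sample_means)
spread_of_means = pvariance(sample_means)
print("number of samples:", len(samples))
print("mean of sample means:", mean_of_means)
print("variance of sample means:", spread_of_means)
```

Under these assumptions the mean of the sample means equals the population mean, which is the kind of property the sampling distribution lets us study.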
Simplest Example:
Simple random sample of
size n=1
Example
Suppose a population consists of 4 people
with AIDS. We know the response for only a
single randomly selected subject, but want
to guess the average in the population. The
number of hospitalized days for each
person last year was:
ID     1    2    3    4
Days   11   16   12   17
First, what is the population mean and
variance?
$$\mu = \frac{1}{N}\sum_{i=1}^{4} x_i = 14$$
$$\sigma^2 = \frac{1}{N}\sum_{i=1}^{4} (x_i - \mu)^2 = 6.5$$
How many different samples of size n = 1
are possible?
Number of possible samples: 4
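As a quick check of the population mean and variance above, the calculation can be reproduced with Python's standard `statistics` module. This is a minimal sketch; the variable names are illustrative and not from the notes. Note that `pvariance` divides by N, matching the population formula used here.

```python
from statistics import fmean, pvariance

days = [11, 16, 12, 17]  # hospitalized days for IDs 1-4

mu = fmean(days)          # population mean
sigma2 = pvariance(days)  # population variance (divides by N, not N - 1)
n_samples = len(days)     # samples of size 1: one per subject

print(mu, sigma2, n_samples)
```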
Random Variables
How do we represent a single
random selection from the
population? We need a notation.
Define a random variable: X
X represents the value that we
could see (realize) upon selection.
A random variable is typically
represented by a capital letter.
Definition of a Random Variable
Random Variable: X
Event       Realized Value (x)   Probability
Pick ID=1   11                   1/4
Pick ID=2   16                   1/4
Pick ID=3   12                   1/4
Pick ID=4   17                   1/4
• Ingredients:
– List of possible events (mutually
exclusive and exhaustive)
– Value and probability for each
event
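The two ingredients of a random variable (a list of events, plus a value and probability for each) map naturally onto a small lookup table. The sketch below is one possible representation, not a standard one; the dictionary layout and names are assumptions made for illustration.

```python
from fractions import Fraction

# Each event (picking one ID) maps to a realized value and its probability.
events = {
    "Pick ID=1": (11, Fraction(1, 4)),
    "Pick ID=2": (16, Fraction(1, 4)),
    "Pick ID=3": (12, Fraction(1, 4)),
    "Pick ID=4": (17, Fraction(1, 4)),
}

# The events are mutually exclusive and exhaustive, so probabilities sum to 1.
total_probability = sum(p for _, p in events.values())
print(total_probability)

for event, (value, prob) in events.items():
    print(f"{event}: x = {value}, P = {prob}")
```

Using `Fraction` keeps the probabilities exact, so the check that they sum to 1 does not depend on floating-point rounding.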
Properties of Probabilities
• A probability is the long-run relative
frequency of an event occurring.
– the probability of an event is
between 0 and 1
– the sum of the probabilities of all mutually
exclusive and exhaustive events is 1.
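The long-run relative frequency interpretation can be illustrated by simulation. The sketch below repeatedly selects one of the four subjects' values at random and tabulates how often each appears; the seed and the number of draws are arbitrary choices made for this illustration.

```python
import random
from collections import Counter

rng = random.Random(540)  # fixed seed (arbitrary) so the run is repeatable
values = [11, 16, 12, 17]  # hospitalized days for IDs 1-4
n_draws = 100_000

# Each draw is one simple random selection of a single subject.
counts = Counter(rng.choice(values) for _ in range(n_draws))
relative_freq = {v: counts[v] / n_draws for v in values}

print(relative_freq)                # each should be close to 1/4
print(sum(relative_freq.values()))  # relative frequencies sum to 1
```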
Definition of a Random Variable
• Common Terminology: $X = x$ means
the realized value of X
is x
Example: Suppose that the selection
of a subject is ID=3 (where x=12).
Then the realized value of X is 12.
Note: This doesn’t mean the random
variable, X, is 12. The realized value
of X is 12.
Expected Value: Mean
• What do we expect X to be?
– i.e., what value do you expect X to
have?
– E(X)=?
$$E(X) = \sum_{\text{all possibilities}} P(X = x)\, x$$
$$E(X) = \frac{1}{4}(11) + \frac{1}{4}(16) + \frac{1}{4}(12) + \frac{1}{4}(17) = 14 = \mu_X$$
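The expected value is a probability-weighted sum, which is straightforward to compute directly. This is a minimal sketch; the function name `expected_value` and the dictionary `pmf` (value mapped to probability) are illustrative names, not notation from the notes.

```python
from fractions import Fraction

# Probability of each realized value when one subject is selected at random.
pmf = {
    11: Fraction(1, 4),
    16: Fraction(1, 4),
    12: Fraction(1, 4),
    17: Fraction(1, 4),
}

def expected_value(pmf):
    """Sum of P(X = x) * x over all possible values x."""
    return sum(p * x for x, p in pmf.items())

mean_x = expected_value(pmf)
print(mean_x)  # 14
```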
Expected Value: Variance
• What is the variance of X?
• i.e., what value do you expect
$(X - E(X))^2$ to have?
$$\sigma_X^2 = E\left[(X - E(X))^2\right] = \sum_{\text{all possibilities}} P(X = x)\,(x - E(X))^2$$
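The variance is the expected squared deviation from E(X), so it can be computed with the same probability-weighted sum. The sketch below assumes the four equally likely values from the hospitalized-days example; the function and variable names are illustrative.

```python
from fractions import Fraction

# Four equally likely realized values (hospitalized days for IDs 1-4).
pmf = {x: Fraction(1, 4) for x in (11, 16, 12, 17)}

def variance(pmf):
    """Sum of P(X = x) * (x - E(X))^2 over all possible values x."""
    mean = sum(p * x for x, p in pmf.items())
    return sum(p * (x - mean) ** 2 for x, p in pmf.items())

var_x = variance(pmf)
print(var_x)         # 13/2
print(float(var_x))  # 6.5
```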
Example of Variance of X
Suppose a population consists of 4 people
with AIDS.
The number of hospitalized days for each
person last year was:
ID     1    2    3    4
Days   11   16   12   17
Suppose we take a simple random
sample (SRS) of n=1. What is the
expected value of X? Var(X)?
Computing Expected Values

$$E(X) = \sum_{\text{all possibilities}} P(X = x)\, x$$
$$\sigma_X^2 = E\left[(X - E(X))^2\right] = \sum_{\text{all possibilities}} P(X = x)\,(x - E(X))^2$$
$$E(X) = \frac{1}{4}(11) + \frac{1}{4}(16) + \frac{1}{4}(12) + \frac{1}{4}(17) = 14$$
Variance of X
$$\sigma_X^2 = \frac{1}{4}(11 - 14)^2 + \frac{1}{4}(16 - 14)^2 + \frac{1}{4}(12 - 14)^2 + \frac{1}{4}(17 - 14)^2 = 6.5$$
Stochastic Model
A stochastic model is an equation that
includes random variables. There is a
deterministic equation for each realization
of the random variables.
Example: $X = \mu + E$
Event       Realized Value (x)   Deterministic Equation
Pick ID=1   11                   11 = 14 - 3
Pick ID=2   16                   16 = 14 + 2
Pick ID=3   12                   12 = 14 - 2
Pick ID=4   17                   17 = 14 + 3
Stochastic Model
• Note that E is also a random variable. We
can define it by
Random Variable: E
Event       Realized Value (e)   Probability
Pick ID=1   -3                   1/4
Pick ID=2   2                    1/4
Pick ID=3   -2                   1/4
Pick ID=4   3                    1/4
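The realized values of E follow from the realized values of X by subtracting the population mean of 14. The sketch below derives them and checks that each subject's deterministic equation holds. It also computes the mean and variance of E; those two summaries are not stated in the notes and are included only as a consequence of the definition. All names are illustrative.

```python
from fractions import Fraction

mu = 14                                  # population mean (a constant)
x_values = {1: 11, 2: 16, 3: 12, 4: 17}  # realized value of X for each ID

# Realized value of E for each ID: e = x - mu.
e_values = {i: x - mu for i, x in x_values.items()}
print(e_values)

# Each realization gives one deterministic equation x = mu + e.
for i, x in x_values.items():
    assert x == mu + e_values[i]

p = Fraction(1, 4)  # each subject is equally likely to be picked
mean_e = sum(p * e for e in e_values.values())
var_e = sum(p * (e - mean_e) ** 2 for e in e_values.values())
print(mean_e, var_e)
```

Because E is X shifted by a constant, it has the same spread as X but is centered at zero.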
Stochastic Model (additive)
$$X = \mu + E$$
where X and E are random variables, $\mu$ is a
constant, and $E(X) = \mu$
• This is called an additive model since the
additional term, E, is added to the
expected value