
Al-Imam Mohammad Ibn Saud University
CS433
Modeling and Simulation
Lecture 12
Output Analysis
Large-Sample Estimation Theory
http://10.2.230.10:4040/akoubaa/cs433/
10 Jan 2009
Dr. Anis Koubâa
Goals of Today
• Understand the problem of confidence in simulation results
• Learn how to determine a range of values for a stochastic simulation result with a certain level of confidence
• Understand the concepts of:
  - Margin of error
  - Confidence interval with a certain level of confidence
Reading

Required
• Leemis & Park, Discrete-Event Simulation: A First Course, Chapter 8: Output Analysis

Optional
• Harry Perros, Computer Simulation Techniques: The Definitive Introduction, 2007, Chapter 5
Problem Statement


• For a deterministic simulation model, one run is sufficient to determine the output.
• A stochastic simulation model will not give the same result when run repeatedly with independent random seeds.
  - One run (a single sample) is not sufficient to obtain confident simulation results.
• Statistical analysis of simulation results: use multiple runs to estimate the metric of interest with a certain confidence.

Example:
• The estimate of the mean response time of an M/M/1 queue (from a simulation or experiments)
  - may vary from one run to another (depending on the seed)
  - depends on the number of samples (the sample size)

Objective
• For a given large sample of output, determine the mean value with a certain level of confidence in the result.
Stochastic Process Simulation
M. Peter Jurkat
UNM/MPJ CS452/Mgt532 V. Output Analysis

• Each stochastic variable (e.g., time between arrivals) is generated by a stream of random numbers from an RNG, beginning with a particular seed value.
• The same RNG with the same seed, x0, will always generate the same sequence of pseudo-random numbers.
  - Repeated simulation with the same inputs (e.g., time, parameters) and the same seed will result in identical outputs.
• Stochastic models therefore need multiple runs and statistical analysis (i.e., sampling).
  - For independent replications, we need a different seed for each replication.
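The seed behavior described above can be illustrated with a minimal Python sketch (not from the course material). It assumes an M/M/1 queue simulated with Lindley's recursion for waiting times; the function name `mm1_mean_response_time` and the parameter values are hypothetical choices for illustration. Each replication gets its own `random.Random(seed)` stream, so reusing a seed reproduces the output exactly, while different seeds give different estimates.

```python
import random


def mm1_mean_response_time(arrival_rate, service_rate, num_customers, seed):
    """Estimate the mean response time (queue wait + service) of an M/M/1 queue.

    Waiting times follow Lindley's recursion:
        W[k+1] = max(0, W[k] + S[k] - A[k+1])
    where S[k] is the service time of customer k and A[k+1] is the
    inter-arrival time between customers k and k+1.
    """
    rng = random.Random(seed)  # one independent RNG stream per replication
    wait = 0.0                 # the first customer finds an empty system
    total_response = 0.0
    for _ in range(num_customers):
        service = rng.expovariate(service_rate)
        total_response += wait + service
        interarrival = rng.expovariate(arrival_rate)
        wait = max(0.0, wait + service - interarrival)
    return total_response / num_customers


if __name__ == "__main__":
    # Same seed -> identical output; different seeds -> different estimates.
    for seed in (1, 1, 2, 3):
        estimate = mm1_mean_response_time(0.5, 1.0, 10_000, seed)
        print(f"seed={seed}: mean response time = {estimate:.3f}")
```

With arrival rate 0.5 and service rate 1.0, the estimates from different seeds should scatter around the standard M/M/1 value 1/(1.0 − 0.5) = 2; that run-to-run scatter is exactly what the confidence intervals in this lecture quantify.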
Simulation Sampling
• Each simulation run may yield only one value of each simulation output (e.g., average waiting time, proportion of time the server is idle).
• Replications are needed to gain statistical stability and significance (confidence) in the output distribution.

Simulation Termination
(Simulation Time)

• Terminating Simulation: runs for some duration of time T_E, where E is a specified event that stops the simulation.
  - Bank example: opens at 8:30 am (time 0) with no customers present and 8 of the 11 tellers working (initial conditions), and closes at 4:30 pm (time T_E = 480 minutes).
• Non-Terminating Simulation: runs continuously, or at least over a very long period of time.
  - Examples: simulating telephone systems or a computer network.
  - Main objective: study the steady-state (long-run) properties of the system, i.e., properties that are not influenced by the initial conditions of the model ⇨ collecting a large sample.

Experimental Design Issues
What types of parameters do we estimate?
• In general, a stochastic variable is described by its probability distribution and parameters.
  - For quantitative random variables, the distribution is described by the mean μ and the variance σ².
  - For binomial random variables, the location and shape are determined by p.
• If the values of the parameters are unknown, we make inferences about them using sample information.

Types of Inference
Estimation:
• Estimating or predicting the value of the parameter from simulation results.
• "What is (are) the most likely value(s) of μ or p?"
Types of Inference - Example

Examples:
• A consumer wants to estimate the average price of similar homes in his city before putting his home on the market.
  Estimation: estimate μ, the average home price.
• An engineer wants to estimate the average waiting time in a queue obtained by a simulation.
  Estimation: estimate μ, the average waiting time in the queue.
Estimators
http://en.wikipedia.org/wiki/Estimator
Definitions
In statistics:
• an estimator is a function of the observable sample data that is used to estimate an unknown population parameter (which is called the estimand);
• an estimate is the result of actually applying that function to a particular sample of data.
Many different estimators are possible for any given parameter.
• Some criterion is used to choose between the estimators, although it is often the case that no criterion clearly picks one estimator over another.
Estimation Procedure

To estimate a parameter of interest (e.g., a population mean, a binomial proportion, a difference between two population means, or a ratio of two population standard deviations), the usual procedure is as follows:
1. Select a random sample from the population of interest (simulation output, experimental measures, random variables, etc.).
2. Calculate the point estimate of the parameter (e.g., the mean value of the sample).
3. Calculate a measure of its variability, often a confidence interval.
4. Associate with this estimate a measure of reliability (e.g., the confidence level).
Definitions

There are two types of estimators:
• Point estimator: a single number calculated to estimate the parameter.
  - Example: the average value of a sample.
• Interval estimator: two numbers are calculated to create an interval within which the parameter is expected to lie.
  - An interval estimation uses the sample data to calculate an interval of possible values of an unknown parameter.
  - The best-known forms of interval estimation are:
    - Confidence intervals (a frequentist method)
    - Credible intervals (a Bayesian method)
Point Estimators
Properties of Point Estimators

• The estimator depends on the sample: since an estimator is calculated from sample values, it varies from sample to sample according to its sampling distribution.
• An unbiased estimator is an estimator whose sampling distribution has a mean equal to the true value of the parameter of interest.
  - It does not systematically overestimate or underestimate the target parameter.

Among all the unbiased estimators, we prefer the
estimator whose sampling distribution has the
smallest spread (or variability).
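To make these two properties concrete, here is a small Monte Carlo sketch (an illustration, not part of the lecture). It assumes a normal population with mean 10 and standard deviation 2 and compares two estimators of the mean: the sample mean and the sample median. For symmetric data both are centered on the true mean, but the sample mean should show the smaller spread, so it is the preferred estimator here. The helper name `sampling_distribution` is hypothetical.

```python
import random
import statistics


def sampling_distribution(estimator, n, reps, seed):
    """Apply `estimator` to `reps` samples of size n from N(10, 2^2).

    Returns the mean and variance of the resulting estimates, i.e. the
    center and spread of the estimator's sampling distribution.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        sample = [rng.gauss(10.0, 2.0) for _ in range(n)]
        estimates.append(estimator(sample))
    return statistics.fmean(estimates), statistics.variance(estimates)


# Same seed, so both estimators are applied to exactly the same samples.
mean_center, mean_var = sampling_distribution(statistics.fmean, n=25, reps=4000, seed=42)
median_center, median_var = sampling_distribution(statistics.median, n=25, reps=4000, seed=42)

print(f"sample mean:   center {mean_center:.3f}, variance {mean_var:.4f}")
print(f"sample median: center {median_center:.3f}, variance {median_var:.4f}")
```

For normal data, large-sample theory puts the variance of the median at roughly π/2 ≈ 1.57 times that of the mean, so the printed variances should be in that neighborhood (about 0.16 for the mean).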
Measuring the Goodness of an Estimator

• The error of estimation is the distance between an estimate and the true value of the parameter.
  - Analogy: the distance between the bullet and the bull's-eye.
• In this chapter, the sample sizes are large, so our unbiased estimators will have approximately normal sampling distributions, because of the Central Limit Theorem.
The Margin of Error

FACT. For unbiased estimators with normal sampling
distributions, 95% of all point estimates will lie within
1.96 standard deviations of the parameter of
interest.

Margin of error: the maximum likely error of estimation (at the 95% level), calculated as
1.96 × (std error of the estimator)
Estimating Means and Proportions
• For a quantitative population:
Point estimator of population mean μ: $\bar{x}$
Margin of error (n ≥ 30): $$\pm 1.96\,\frac{s}{\sqrt{n}}$$
• For a binomial population:
Point estimator of population proportion p: $\hat{p} = x/n$
Margin of error (n ≥ 30): $$\pm 1.96\sqrt{\frac{\hat{p}\hat{q}}{n}}, \qquad \hat{q} = 1 - \hat{p}$$
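The two margin-of-error formulas translate directly into code. This is a minimal Python sketch assuming the large-sample (n ≥ 30) normal approximation; the function names are hypothetical, and the default z = 1.96 corresponds to 95% confidence.

```python
import math

Z_95 = 1.96  # z value leaving 2.5% in each tail of N(0, 1)


def margin_of_error_mean(s, n, z=Z_95):
    """Margin of error for a population mean: z * s / sqrt(n)."""
    return z * s / math.sqrt(n)


def margin_of_error_proportion(p_hat, n, z=Z_95):
    """Margin of error for a binomial proportion: z * sqrt(p_hat * q_hat / n)."""
    q_hat = 1.0 - p_hat
    return z * math.sqrt(p_hat * q_hat / n)


if __name__ == "__main__":
    # Numbers from the two examples that follow (home prices, soda bottles).
    print(margin_of_error_mean(15_000, 64))           # about 3675 SAR
    print(margin_of_error_proportion(10 / 200, 200))  # about 0.03
```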
Example 1

• A homeowner randomly samples 64 homes similar to his own and finds that the average selling price is 252,000 SAR with a standard deviation of 15,000 SAR.
Question: Estimate the average selling price for all similar homes in the city.
Point estimator of μ: $\bar{x} = 252{,}000$ SAR
$$\text{Margin of error: } \pm 1.96\,\frac{s}{\sqrt{n}} = \pm 1.96\,\frac{15{,}000}{\sqrt{64}} = \pm 3{,}675 \text{ SAR}$$
Example 2
A quality control technician wants to estimate the proportion of soda bottles that are under-filled. He randomly samples 200 bottles of soda and finds 10 under-filled bottles.
What is the estimate of the proportion of under-filled bottles?
n = 200, p = proportion of under-filled bottles
Point estimator of p: $\hat{p} = x/n = 10/200 = .05$
$$\text{Margin of error: } \pm 1.96\sqrt{\frac{\hat{p}\hat{q}}{n}} = \pm 1.96\sqrt{\frac{(.05)(.95)}{200}} = \pm .03$$
Interval Estimators
Confidence Interval
Interval Estimation
• Create an interval (a, b) so that you are fairly sure that the parameter lies between these two values.
• "Fairly sure" means "with high probability", measured using the confidence coefficient, 1 − α. Usually, 1 − α = 0.90, 0.95, 0.98, or 0.99.
• Suppose 1 − α = 0.95 and that the estimator has a normal distribution. Then 95% of the estimates fall within Parameter ± 1.96 SE.
Interval Estimation
• Since we don't know the value of the parameter, consider
  Estimator ± 1.96 SE,
  which has a variable center.
[Figure (applet): repeated intervals Estimator ± 1.96 SE; intervals that enclose the parameter are marked "Worked", and one that misses it is marked "Failed".]
• Only if the estimator falls in the tail areas will the interval fail to enclose the parameter. This happens only 5% of the time.
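The "5% of the time" claim can be checked by simulation. This sketch is an illustration under stated assumptions, not the applet from the slides: it draws repeated samples of size n = 50 from an exponential population with mean 2 (a skewed distribution chosen to resemble waiting times), builds x̄ ± 1.96 s/√n each time, and counts how often the interval encloses the true mean. Because the population is not normal and s replaces σ, the observed coverage is expected to be close to, and typically slightly below, 95%. The function name `coverage` is hypothetical.

```python
import math
import random
import statistics


def coverage(true_mean, n, reps, z=1.96, seed=0):
    """Fraction of intervals x_bar +/- z * s / sqrt(n) that contain true_mean.

    Samples come from an exponential population (skewed, not normal), so any
    approximate normality of x_bar here is due to the Central Limit Theorem.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sample = [rng.expovariate(1.0 / true_mean) for _ in range(n)]
        x_bar = statistics.fmean(sample)
        std_error = statistics.stdev(sample) / math.sqrt(n)
        if x_bar - z * std_error <= true_mean <= x_bar + z * std_error:
            hits += 1  # this interval "worked"
    return hits / reps


if __name__ == "__main__":
    print(f"observed coverage: {coverage(2.0, n=50, reps=4000):.3f}")
```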
To Change the Confidence Level
• To change to a general confidence level, 1 − α, pick a value of z that puts area 1 − α in the center of the z-distribution (i.e., the standard normal distribution N(0,1)), leaving a tail area of α/2 on each side.

Confidence level   α      Tail area α/2   $z_{\alpha/2}$
90%                0.10   0.05            1.645
95%                0.05   0.025           1.96
98%                0.02   0.01            2.33
99%                0.01   0.005           2.58

100(1 − α)% confidence interval:
$$\text{Estimator} \pm z_{\alpha/2}\,\mathrm{SE}$$
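Rather than reading $z_{\alpha/2}$ from the table, it can be computed from the inverse standard normal CDF. A minimal sketch using Python's standard-library `statistics.NormalDist` (available since Python 3.8); `z_critical` is a hypothetical helper name.

```python
from statistics import NormalDist


def z_critical(alpha):
    """Return z_{alpha/2}: the point with area alpha/2 to its right under N(0, 1)."""
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


for level in (0.90, 0.95, 0.98, 0.99):
    alpha = 1.0 - level
    print(f"{level:.0%}  z = {z_critical(alpha):.3f}")
```

This prints 1.645, 1.960, 2.326 and 2.576; the table's 2.33 and 2.58 are the last two values rounded to two decimals.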
Confidence Intervals for Means and Proportions
• For a quantitative population, the confidence interval for a population mean μ is:
$$\bar{x} \pm z_{\alpha/2}\,\frac{s}{\sqrt{n}}$$
• For a binomial population, the confidence interval for a population proportion p is:
$$\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}}$$
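Both large-sample intervals can be sketched in Python as follows, assuming the normal approximation and computing $z_{\alpha/2}$ with `statistics.NormalDist` instead of the rounded table values. The function names are hypothetical.

```python
import math
from statistics import NormalDist


def z_critical(confidence):
    """z_{alpha/2} for a two-sided interval at the given confidence level 1 - alpha."""
    alpha = 1.0 - confidence
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


def ci_mean(x_bar, s, n, confidence=0.95):
    """Large-sample CI for a population mean: x_bar +/- z * s / sqrt(n)."""
    half_width = z_critical(confidence) * s / math.sqrt(n)
    return x_bar - half_width, x_bar + half_width


def ci_proportion(x, n, confidence=0.95):
    """Large-sample CI for a binomial proportion: p_hat +/- z * sqrt(p_hat * q_hat / n)."""
    p_hat = x / n
    q_hat = 1.0 - p_hat
    half_width = z_critical(confidence) * math.sqrt(p_hat * q_hat / n)
    return p_hat - half_width, p_hat + half_width


if __name__ == "__main__":
    print(ci_mean(756, 35, 50))                      # dairy example, 95%
    print(ci_mean(756, 35, 50, confidence=0.99))     # dairy example, 99%
    print(ci_proportion(104, 150, confidence=0.98))  # soccer example, 98%
```

For the dairy example that follows, `ci_mean(756, 35, 50)` returns roughly (746.30, 765.70). At 99% the exact z = 2.576 gives about (743.25, 768.75), slightly narrower than the slide's 743.23 to 768.77 because the slide uses the rounded value 2.58.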
Example 1
A random sample of n = 50 males showed a mean daily intake of dairy products equal to 756 grams with a standard deviation of 35 grams.
Find a 95% confidence interval for the population average μ.
$$\bar{x} \pm 1.96\,\frac{s}{\sqrt{n}} = 756 \pm 1.96\,\frac{35}{\sqrt{50}} = 756 \pm 9.70$$
or 746.30 < μ < 765.70 grams.
Example 1 (continued)
• Find a 99% confidence interval for μ, the population average daily intake of dairy products for men.
$$\bar{x} \pm 2.58\,\frac{s}{\sqrt{n}} = 756 \pm 2.58\,\frac{35}{\sqrt{50}} = 756 \pm 12.77$$
or 743.23 < μ < 768.77 grams.
The interval must be wider to provide for the increased confidence that it does indeed enclose the true value of μ.
Example 2
• Of a random sample of n = 150 college students, 104 of the students said that they had played on a soccer team during their K-12 years.
Estimate the proportion of college students who played soccer in their youth with a 98% confidence interval.
$$\hat{p} \pm 2.33\sqrt{\frac{\hat{p}\hat{q}}{n}} = \frac{104}{150} \pm 2.33\sqrt{\frac{.69(.31)}{150}} = .69 \pm .09$$
or .60 < p < .78.
Estimate the Difference between two means
Estimating the Difference between Two Means
• Sometimes we are interested in comparing the means of two populations, for example:
  - The average growth of plants fed using two different nutrients.
  - The average scores for students taught with two different teaching methods.
• To make this comparison, we draw two independent samples:
  - A random sample of size n1 drawn from population 1 with mean μ1 and variance σ1².
  - A random sample of size n2 drawn from population 2 with mean μ2 and variance σ2².
• We compare the two averages by making inferences about μ1 − μ2, the difference in the two population averages.
  - If the two population averages are the same, then μ1 − μ2 = 0.
• The best estimate of μ1 − μ2 is the difference in the two sample means, $\bar{x}_1 - \bar{x}_2$.
The Sampling Distribution of $\bar{x}_1 - \bar{x}_2$
1. The mean of $\bar{x}_1 - \bar{x}_2$ is μ1 − μ2, the difference in the population means.
2. The standard deviation of $\bar{x}_1 - \bar{x}_2$ is
$$SE = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}.$$
3. If the sample sizes are large, the sampling distribution of $\bar{x}_1 - \bar{x}_2$ is approximately normal, and SE can be estimated as
$$SE = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}.$$
Estimating μ1 − μ2
• For large samples, point estimates and their margin of error as well as confidence intervals are based on the standard normal distribution (z-distribution).
Point estimate for μ1 − μ2: $\bar{x}_1 - \bar{x}_2$
Margin of error:
$$\pm 1.96\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$
Confidence interval for μ1 − μ2:
$$(\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$
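The same pattern extends to μ1 − μ2. A minimal sketch, assuming two independent large samples; `ci_diff_means` is a hypothetical name, and z is computed exactly rather than taken from the table.

```python
import math
from statistics import NormalDist


def ci_diff_means(x1, s1, n1, x2, s2, n2, confidence=0.95):
    """Large-sample CI for mu1 - mu2 from two independent samples.

    SE is estimated as sqrt(s1^2/n1 + s2^2/n2).
    """
    alpha = 1.0 - confidence
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # z_{alpha/2}
    se = math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
    diff = x1 - x2
    return diff - z * se, diff + z * se


if __name__ == "__main__":
    low, high = ci_diff_means(756, 35, 50, 762, 30, 50)  # men vs women, 95%
    print(f"({low:.2f}, {high:.2f})  contains 0: {low <= 0 <= high}")
```

With the dairy intake data in the example that follows (men: 756, 35, 50; women: 762, 30, 50), this returns approximately (−18.78, 6.78); since the interval contains 0, it does not rule out μ1 = μ2.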
Example
• Compare the average daily intake of dairy products of men and women using a 95% confidence interval.

Avg daily intakes      Men   Women
Sample size            50    50
Sample mean (g)        756   762
Sample std dev (g)     35    30

$$(\bar{x}_1 - \bar{x}_2) \pm 1.96\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = (756 - 762) \pm 1.96\sqrt{\frac{35^2}{50} + \frac{30^2}{50}} = -6 \pm 12.78$$
or −18.78 < μ1 − μ2 < 6.78.
Example, continued
−18.78 < μ1 − μ2 < 6.78
• Could you conclude, based on this confidence interval, that there is a difference in the average daily intake of dairy products for men and women?
• The confidence interval contains the value μ1 − μ2 = 0. Therefore, it is possible that μ1 = μ2. You would not want to conclude that there is a difference in average daily intake of dairy products for men and women.
Estimating the Difference between Two Proportions
• Sometimes we are interested in comparing the proportion of "successes" in two binomial populations.
  - Example: the proportion of male and female voters who favor a particular candidate.
• To make this comparison, we draw two independent samples:
  - A random sample of size n1 drawn from binomial population 1 with parameter p1.
  - A random sample of size n2 drawn from binomial population 2 with parameter p2.
• We compare the two proportions by making inferences about p1 − p2, the difference in the two population proportions.
  - If the two population proportions are the same, then p1 − p2 = 0.
• The best estimate of p1 − p2 is the difference in the two sample proportions,
$$\hat{p}_1 - \hat{p}_2 = \frac{x_1}{n_1} - \frac{x_2}{n_2}$$
The Sampling Distribution of $\hat{p}_1 - \hat{p}_2$
1. The mean of $\hat{p}_1 - \hat{p}_2$ is p1 − p2, the difference in the population proportions.
2. The standard deviation of $\hat{p}_1 - \hat{p}_2$ is
$$SE = \sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}}.$$
3. If the sample sizes are large, the sampling distribution of $\hat{p}_1 - \hat{p}_2$ is approximately normal, and SE can be estimated as
$$SE = \sqrt{\frac{\hat{p}_1 \hat{q}_1}{n_1} + \frac{\hat{p}_2 \hat{q}_2}{n_2}}.$$
Estimating p1 − p2
• For large samples, point estimates and their margin of error as well as confidence intervals are based on the standard normal distribution (z-distribution).
Point estimate for p1 − p2: $\hat{p}_1 - \hat{p}_2$
Margin of error:
$$\pm 1.96\sqrt{\frac{\hat{p}_1 \hat{q}_1}{n_1} + \frac{\hat{p}_2 \hat{q}_2}{n_2}}$$
Confidence interval for p1 − p2:
$$(\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2}\sqrt{\frac{\hat{p}_1 \hat{q}_1}{n_1} + \frac{\hat{p}_2 \hat{q}_2}{n_2}}$$
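A matching sketch for p1 − p2, assuming two independent large binomial samples and the unpooled SE above; `ci_diff_proportions` is a hypothetical name.

```python
import math
from statistics import NormalDist


def ci_diff_proportions(x1, n1, x2, n2, confidence=0.95):
    """Large-sample CI for p1 - p2 using the unpooled standard error
    sqrt(p1_hat*q1_hat/n1 + p2_hat*q2_hat/n2)."""
    alpha = 1.0 - confidence
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # z_{alpha/2}
    p1_hat, p2_hat = x1 / n1, x2 / n2
    se = math.sqrt(p1_hat * (1.0 - p1_hat) / n1 + p2_hat * (1.0 - p2_hat) / n2)
    diff = p1_hat - p2_hat
    return diff - z * se, diff + z * se


if __name__ == "__main__":
    low, high = ci_diff_proportions(65, 80, 39, 70, confidence=0.99)  # youth soccer
    print(f"({low:.3f}, {high:.3f})  contains 0: {low <= 0 <= high}")
```

For the youth soccer example that follows (65 of 80 males, 39 of 70 females, 99% confidence), the unrounded computation gives roughly (0.066, 0.445), which matches the slide's .06 to .44 up to rounding of the sample proportions and z; 0 lies outside the interval.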
Example
Compare the proportion of male and female college students who said that they had played on a soccer team during their K-12 years using a 99% confidence interval.

Youth soccer     Male   Female
Sample size      80     70
Played soccer    65     39

$$(\hat{p}_1 - \hat{p}_2) \pm 2.58\sqrt{\frac{\hat{p}_1 \hat{q}_1}{n_1} + \frac{\hat{p}_2 \hat{q}_2}{n_2}} = \left(\frac{65}{80} - \frac{39}{70}\right) \pm 2.58\sqrt{\frac{.81(.19)}{80} + \frac{.56(.44)}{70}} = .25 \pm .19$$
or .06 < p1 − p2 < .44.
Example, continued
.06 < p1 − p2 < .44
• Could you conclude, based on this confidence interval, that there is a difference in the proportion of male and female college students who said that they had played on a soccer team during their K-12 years?
• The confidence interval does not contain the value p1 − p2 = 0. Therefore, it is not likely that p1 = p2. You would conclude that there is a difference in the proportions for males and females.
  - A higher proportion of males than females played soccer in their youth.
One-Sided Confidence Bounds
• Confidence intervals are by their nature two-sided since they produce upper and lower bounds for the parameter.
• One-sided bounds can be constructed simply by using a value of z that puts α rather than α/2 in the tail of the z distribution.
LCB: Estimator − $z_{\alpha}$ × (Std Error of Estimator)
UCB: Estimator + $z_{\alpha}$ × (Std Error of Estimator)
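A minimal sketch of the two bounds, assuming a normal sampling distribution and computing $z_{\alpha}$ with `statistics.NormalDist`; `one_sided_bounds` is a hypothetical name.

```python
from statistics import NormalDist


def one_sided_bounds(estimate, std_error, confidence=0.95):
    """Return (LCB, UCB), each a separate one-sided 100*confidence% bound.

    Each bound uses z_alpha, which puts all of alpha in a single tail,
    instead of z_{alpha/2}. Use one bound or the other, not both at once.
    """
    alpha = 1.0 - confidence
    z_alpha = NormalDist().inv_cdf(1.0 - alpha)
    return estimate - z_alpha * std_error, estimate + z_alpha * std_error


if __name__ == "__main__":
    lcb, ucb = one_sided_bounds(756, 35 / 50 ** 0.5)  # dairy data from Example 1
    print(f"95% LCB = {lcb:.2f}, 95% UCB = {ucb:.2f}")
```

Applied to the dairy data (x̄ = 756, SE = 35/√50), this gives a 95% lower bound of about 747.86 grams. LCB and UCB are each a separate one-sided statement at level 1 − α; reporting both together would amount to a 100(1 − 2α)% two-sided interval.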
How to Choose the Sample Size?
Choosing the Sample Size
The total amount of relevant information in a
sample is controlled by two factors:
- The sampling plan or experimental design:
the procedure for collecting the information
- The sample size n: the amount of information
you collect.
• In a statistical estimation problem, the accuracy of the estimation is measured by the margin of error or the width of the confidence interval.

1. Determine the size of the margin of error, B, that you are willing to tolerate.
2. Choose the sample size by solving for n, or n = n1 = n2, in the inequality 1.96 SE ≤ B, where SE is a function of the sample size n.
3. For quantitative populations, estimate the population standard deviation using a previously calculated value of s or the range approximation σ ≈ Range/4.
4. For binomial populations, use the conservative approach and approximate p using the value p = .5.
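Steps 2 to 4 can be sketched as two small functions (hypothetical names) that solve 1.96 SE ≤ B for a single sample and round n up to the next integer. This illustration assumes the one-sample formulas SE = σ/√n and SE = √(pq/n); the two-sample case with n = n1 = n2 would need the combined SE instead. The range of 100 in the usage lines is made-up data for illustration.

```python
import math


def sample_size_mean(sigma, bound, z=1.96):
    """Smallest n with z * sigma / sqrt(n) <= bound."""
    return math.ceil((z * sigma / bound) ** 2)


def sample_size_proportion(bound, p=0.5, z=1.96):
    """Smallest n with z * sqrt(p * q / n) <= bound.

    p = 0.5 maximizes p * q, so it is the conservative choice when p is unknown.
    """
    return math.ceil(z ** 2 * p * (1.0 - p) / bound ** 2)


if __name__ == "__main__":
    # Quantitative case: sigma estimated from a (hypothetical) range of 100 via Range / 4.
    print(sample_size_mean(sigma=100 / 4, bound=5))  # (1.96*25/5)^2 = 96.04 -> 97
    # Binomial case: the PVC pipe survey in the next example.
    print(sample_size_proportion(bound=0.04))        # 600.25 -> 601
```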
Example
A producer of PVC pipe wants to survey wholesalers who buy his product in order to estimate the proportion of wholesalers who plan to increase their purchases next year.
What sample size is required if he wants his estimate to be within .04 of the actual proportion with probability equal to .95?
$$1.96\sqrt{\frac{pq}{n}} \le .04 \;\Rightarrow\; 1.96\sqrt{\frac{.5(.5)}{n}} \le .04 \;\Rightarrow\; \sqrt{n} \ge \frac{1.96\sqrt{.5(.5)}}{.04} = 24.5 \;\Rightarrow\; n \ge 24.5^2 = 600.25$$
He should survey at least 601 wholesalers.
Key Concepts
I. Types of Estimators
1. Point estimator: a single number is calculated to estimate the
population parameter.
2. Interval estimator: two numbers are calculated to form an
interval that contains the parameter.
II. Properties of Good Estimators
1. Unbiased: the average value of the estimator equals the
parameter to be estimated.
2. Minimum variance: of all the unbiased estimators, the best
estimator has a sampling distribution with the smallest standard
error.
3. The margin of error measures the maximum distance between
the estimator and the true value of the parameter.
III. Large-Sample Point Estimators
To estimate one of the four population parameters μ, p, μ1 − μ2, or p1 − p2 when the sample sizes are large, use the corresponding point estimator ($\bar{x}$, $\hat{p}$, $\bar{x}_1 - \bar{x}_2$, or $\hat{p}_1 - \hat{p}_2$) with the appropriate margin of error, ±1.96 SE.
IV. Large-Sample Interval Estimators
To estimate one of these four population parameters when the sample sizes are large, use the interval estimator Estimator ± $z_{\alpha/2}$ SE, with the appropriate SE.
1. All values in the interval are possible values for the unknown population parameter.
2. Any values outside the interval are unlikely to be the value of the unknown parameter.
3. To compare two population means or proportions, look for the value 0 in the confidence interval. If 0 is in the interval, it is possible that the two population means or proportions are equal, and you should not declare a difference. If 0 is not in the interval, it is unlikely that the two means or proportions are equal, and you can confidently declare a difference.
V. One-Sided Confidence Bounds
Use either the upper (+) or lower (−) two-sided bound, with the critical value of z changed from $z_{\alpha/2}$ to $z_{\alpha}$.