Review session
Download
Report
Transcript Review session
Review Session
Jehan-François Pâris
Agenda
Statistical Analysis of Outputs
Operational Analysis
Case Studies
Linear Regression
How to use this presentation
Most problems have
One
slide stating the problem
One slide explaining how to solve the problem
One slide allowing you to check your answer
You will learn more by trying first to do the
problems on your own than by reading
their solutions
Do not forget either to review the problems
in the original notes
Statistical Analysis
of Outputs
The big picture
The problems
Constructing
confidence intervals
Handling auto correlated data
The tools
Central-Limit
Theorem
Wilson’s formula
Batch means (and regeneration)
RNG tricks
Confidence Intervals
Distinguish between
CIs
for means
CSIM does it for you
CIs for proportions
We are on our own
Major issue is independence of data points
CSIM
uses batch means
Central Limit Theorem
If the n mutually independent random
variables x1, x2, …, xn have the same
distribution, and if their mean m and their
variance s2 exist then …
Central Limit Theorem
The random variable
1 n
xi m
n i 1
s
n
is distributed according to the standard normal
distribution (zero mean and unit variance).
CI for means (I)
For large values of n, the (1-)%
confidence interval for m is given by
z / 2s
z / 2s
,x
x
n
n
with
F ( z / 2 ) 1
2
CI for means (II)
F(z) is taken from a table of the normal
distribution
F(0.025)
For smaller values of n, we have to use
Student’s t random variable
Wider
= 1.96
CIs
We replace s by the sample standard
deviation s
Example
We have
100
observations for the waiting time
xbar = 4.25 minutes
s2 = 25
Example
We have
100
observations for the waiting time
xbar = 4.25 minutes
s2 = 25
Answer is
4.25
± 1.96 sqrt(25/100) = 4.25 ± 0.98
CI for proportions
A proportion represents the probability
P(X) for some fixed threshold
97%
of our customers have to wait less than
one minute
Distributed according to a binomial law
Use
Wilson’s formula
Wilson’s formula
When n > 29, we can use the Wilson’s
interval
z2 / 2
z2 / 2
qˆ (1 qˆ ) z2 / 2
qˆ (1 qˆ ) z2 / 2
qˆ
z / 2
qˆ
z / 2
2
2
2
n
n
2
n
n
4
n
4
n
1
P
q
z2 / 2
z2 / 2
1
1
n
n
where z/2 = 1.96 for a 95% C.I.
Example
We have want to estimate the proportion
of packets that wait more than four slots
400
observations
40 packets waited more than four slots
Answer
Divisor:
1
+ 1.962/400 1.01 (instead of 1.0096)
Central term
0.1
+ 1.962/(2×400) 0.105 (instead of 1.048)
Half width
(0.1×0.9)/400 + 1.962/(2×4002) )
sqrt (0.09/400 + (4/800)/400)
1/20 sqrt (0.09 +0.0025) 0.3/20 = 0.015
sqrt(
Result is
(0.105 ± 0.015)/ 1.01 = 0.104 ± 0.015
Batch means (I)
Simulation data are often autocorrelated
Packet
delays in ALOHA
Waiting times in queues
…
Batch means reduce (but do not
completely eliminate) that effect
Batch means (II)
Group measurements into fixed-size
batches of consecutive data
Compute mean of each batch
If batches are large enough, these means
will be independent
Can
use standard-limit theorem, …
In case of doubt, compute autocorrelation
function for successive batch means
Regeneration (I)
The idea
Partition
simulation data into intervals such
that
Data measured inside the same interval
might be correlated
Data measured in different intervals are
independent
Regeneration (II)
How?
System
goes to a regeneration point each
time
Its queues become empty
All the disk drives are operational
…
Criterion is system specific
Streams
When you want to evaluate two different
configurations of a system, it is often good
idea to use separate random number
streams for arrivals and service times
Arrival
times remain unchanged when we
change other parameters of the system
Operational Analysis
Single server (I)
We can measure
T
the length of the observation period
A the number of arrivals during the
observation period
B the total amount of busy times during the
observation period
C the number of completions during the
observation period
Single server (II)
We can compute
l
= A/T
X = C/T
U = B/T
S = B/C
There are two ways to compute U
U
the arrival rate
the output rate
the utilization
the mean service time
= B/T = (C/T )(B/C) = XS
In general A C and l X
Little’s law
If W is the total time spent by all tasks
inside the system over the observation
period, then
N
= W/T
R
= W/C
Since W/T = (C/T)(W/C) = XR, N = XR
This is important
A problem
An ice-cream parlor
Observed
during 6 hours
Visited by 120 customers
Spend an average of 24 minutes inside
What is the average number of customers
inside the parlor?
Answer
We compute X and apply Little’s Law
Answer
We compute X and apply Little’s Law
X
= 120/6 = 20 customers/hour
R = 24 minutes = 0.4 hours
N = XR = 8 customers
If you did not get it
The 120 customers sent a total of 120×24
customer×minutes or 48 customer×hours
in the parlor
48
customer×hours/6 hours = 8 customers
Same as having 8 customers spending six
hours each inside the parlor
Network of servers (I)
Open network
Arrivals
Departures
Network of servers (II)
Closed network
Arrivals
Departures
Operational Quantities
Keep same quantities as before but add
indices
0
for whole system
k for individual servers
Two changes
We
never care about the utilization of the
whole system
We add number of visits Vk of each server
Operational quantities
Over the observation period, we measure
C
= the number of job completions
Ck = the number of tasks completed by
device k
We define
X0
= C/T = the system throughput
Xk = Ck/T = the output rate at server k
Vk = Ck/C = the visit count at server k
Important relationships
Ck = VkC
Since
each job requires Vk visits, there are Vk
more server completions than job completions
Xk = Vk X0
Same
property applies to throughputs
ni
System response time (I)
We define
Nbar = average number of jobs in the
system
nbari = average number of jobs at device i
Nbar = Σi nbari
System response time (II)
Applying Little’s law, we have
R = Nbar/X0
and
nbari = RiXi = RiViX0
Hence
R = Σi ViRi
Note
This result is trivial
The
total time spent by a job in the system is
the sum of the times spent at each server
This includes the time spent waiting in the
server queues
Problem 1
A job requires
100
ms of CPU time
9 disk accesses
Each disk access takes 7 ms
We want
VCPU
and SCPU
Answer
We now that jobs get CPU first and last
VCPU
= 10
Then
SCPU
= 100/10 =10s
Bottleneck analysis (I)
A system has one CPU and one disk drive
It processes transactions such that
VCPU =
12 and SCPU = 5ms
VDisk =
11 and SDISK = 8ms
What is the maximum system throughput?
Bottleneck analysis (II)
We compute first the maximum device
throughputs
Maximum XCPU = 1/0.005 = 200 requests/s
Maximum Xdisk = 1/0.008 = 125 requests/s
Since Xi = Vi X0
Maximum
throughput compatible with CPU
workload is 200/12 = 16.7 transactions/s
Maximum throughput compatible with disk
workload is 125/11 = 11.4 transactions/s
Bottleneck analysis (III)
The disk is this the bottleneck
It
has highest ViSi product
Identifying feature of any bottleneck device
Increasing the system throughput might
require
Sharing
disk requests with a second disk
Increasing the efficiency of the system I/O
buffer
Problem 2
In the previous example, which device was
the bottleneck?
What would be the throughput of the
system if the bottleneck utilization was
80%?
Answer
We compare
VCPUSCPU
VdiskSdisk
Answer
We compare
VCPUSCPU
= 100ms
VdiskSdisk = 9×7 = 63 ms
The CPU is the bottleneck
Answer
If the bottleneck was operating at 100%
utilization,
It
could process one job each VCPUSCPU time
units
Or 1/(VCPUSCPU) job per time unit
At UCPU utilization,
It
will process UCPU/(VCPUSCPU) job per time
unit
Answer
X0 = UCPU/(VCPUSCPU) = 0.80/0.10 seconds
8
jobs/second
Systems with terminals
Whole
system
M Terminals
Interactive response time
formula
We have
M terminals
Think time Z between the completion of a
job and the submission of the next job
Applying Little’s law to the whole system
M = (R + Z ) X0
then
R = M/X0 – Z
Very
Important
Problem 3
We have
M
= 50 users
Z = 20 s
X0 = 2 transactions/s
What is the system response time?
Answer
We apply R = M/X0 – Z
Answer
We apply R = M/X0 – Z and obtain
R = 50/2 – 20 = 5 seconds
Problem 4
A system
Processes
5 transactions/seconds
Has 60 users
Achieves a response time of 4 seconds
What is the think time?
Answer
We apply R = M/X0 – Z,
Z = M/X0 – R
Answer
We apply R = M/X0 – Z,
Z = M/X0 – R = 60/5 – 4 = 8 seconds
Problem 5
We have
M
= 50 users
Z = 20 s
R = 4 s
What is the system throughput?
Answer
From R = M/X0 – Z, we have
X0 = (R + Z)/M
Hence X0 = (20 + 4)/50 = 0.48 tasks/s
Problem 6
A system
Can
process up to 4 transactions/second
Has 60 users
User think time is 12 seconds
Can the system achieve a response time
of 2 seconds?
Answer
Applying R = M/X0 – Z, we compute a
lower bound for the response time
Rmin = M/X0,max – Z
Answer
Applying R = M/X0 – Z, we compute a
lower bound for the response time
Rmin = M/X0,max – Z = 60/4 – 12 = 3 seconds
Answer is no
Problem 7
Compute the response time of a system
knowing the following parameters
M
= 50 users
Z
= 15 s
VCPU
SCPU = 200ms
UCPU
= 50%
Answer
Since Xk = Uk /Sk and Xk = VkX0,
X0 = Uk /(VkSk)
The response time is then given by
R = M/X0 – Z
Answer
Let us compute first the throughput X0
Applying X0 = Uk/(VkSk)
X0 = 0.50/0.200 = 2.5 interactions/s
The response time is then
R = M/X0 – Z = 50/2.5 – 15 = 5 s
Simulation
Case Studies
A simple reminder
If interarrival times are
Independent identically distributed
(i. i. d.)
According to an exponential law
then the probability of having exactly n
arrivals during a fixed interval is distributed
according to a Poisson law
Explanation (II)
Assume that
The
probability of one arrival during a small
interval Dt is lDt
The probability of two arrivals during the same
small time interval is negligible
lDt
lDt
lDt
lDt
lDt
lDt
Explanation (I)
The probability of having exactly k arrivals
during n slots is
n
k
nk
(lDt ) (1 lDt )
k
What would happen if the number of time
intervals goes to infinity while their total
duration T = nDt remains constant
Explanation (III)
We rewrite the previous expression as
n!
lT k
lT n k
( ) (1
)
k!(n k )! n
n
n!
(lT )
lT n
lT k
(1
) (1
)
k
n (n k )! k!
n
n
k
and compute separately the limits of its
four factors
Explanation (IV)
lim n
n!
n(n 1)...( n k 1)
1
k
k
n (n k )!
n
(lT )
remains unchanged
k!
lT n
lT
lim n (1
) e
n
lT k
lim n (1
) 1
n
k
Explanation (V)
We obtain the Poisson distribution
( l T ) lT
e
k!
k
The probability that there are no arrivals
in the same time interval T (or in any time
interval T) is
e
lT
Explanation (VI)
This last expression is the probability that the
time interval between two consecutive arrivals
is greater than T
The probability that the time interval between
two consecutive arrivals is equal or lesser
than T is
1 e
lT
which is the cdf of the exponential distribution
A final observation
Use the Poisson distribution to generate
number of arrivals during a time interval
Use the exponential distribution to
generate interarrival times
Linear Regression
Most important point
Compute a regression line
Compute regression coefficient
Example
x[i]
y[i]
1.5
3
2.4
4.8
3.5
9
4.4
11
5.1 13.7
Linear Regression
We have
one
independent variable
One dependent variable
We must find
Y = + bX
minimizing the sum of squares of errors
Si (yi - - bxi)2
Formulas
b
x y
n x x
ni xi yi
i
i
2
2
i
y bx
i
i
i
i
i
Calculations (I)
Sum
Average
Squared Sum
x[i]
y[i]
x[i]^2 x[i]y[i]
1.5
3 2.25
4.5
2.4
4.8 5.76 11.52
3.5
9 12.25 31.5
4.4
11 19.36 48.4
5.1 13.7 26.01 69.87
16.9 41.5 65.63 165.8
3.38
8.3
285.6
Calculations (II)
Sum
Average
Squared Sum
n
beta
alpha
x[i]
y[i]
x[i]^2 x[i]y[i]
16.9 41.5 65.63 165.8
3.38
8.3
285.6
5
3
-1.84
Outcome
14
12
10
8
6
4
2
1
2
3
4
5
6
More notations
S xx
(x x)
1
x n x
( x x ) ( y y )
2
i
i
2
2
i
i
i
i
S xy
S yy
1
i xi yi i x i i y i
n
2
i ( yi y )
1
2
2
i yi i y i
n
i
i
i
i
More notations (II)
Solution can be rewritten
b
S xy
S xx
y bx
Coefficient of correlation
r
S xy
S xx S yy
bS xx
S xx
b
S yy
S xx S yy
r = 1 would indicate a perfect fit
r = 0 would indicate no linear dependency
Calculations
Sum
Squared Sum
n
Sxx
Syy
Sxy
r
x[i]
y[i]
x[i]^2 y[i]^2 x[i]y[i]
16.9 41.5 65.63 421.7 165.8
285.6 1722
5
8.508
77.28
25.52
0.995