Performance Engineering

MEASUREMENT AND STATISTICS
Prof. Jerry Breecher
In order to get you in the mood for doing some measuring, statistics, and estimating,
here are some quotations with the right flavour:
"Figures don't lie, but liars figure." Mark Twain
"There are three kinds of untruths; lies, damn lies, and statistics." Mark Twain
The following are from "Policy Paradox and Political Reason" by Deborah Stone.
"Numerals hide all the difficult choices that go into a measurement."
"Certain kinds of numbers, big ones, numbers with decimal points, ones not multiples of ten,
seemingly advertise the prowess of the measurer."
"How accurate a number is depends on the cost of acquiring it and on how important it is."
"Numbers are a form of poetry. Symbols are another."
"No number is innocent, for it is impossible to count without making categorization."
"Every number is a political statement about where to draw the line."
"The first number you measure becomes the status quo."
Purpose:
This section is about the methodology of measurement: what goes into designing an experiment,
gathering some numbers, interpreting the results, and presenting those results to management in
a way that allows them to make the necessary decisions.
Warm-up Experiment:
Divide into teams and measure the length of an object in the classroom. To do so you will need to
make team decisions about tools, techniques, and reporting metrics.
Upon completion, discuss what can be learned from this experiment.
FUNDAMENTAL QUESTIONS ABOUT MEASUREMENT:
What kind of accuracy can you expect from a computer (or any other)
measurement?
When you make a measurement, can you believe the result? How sure are
you of the result?
How should you state the result of an experiment? How do you reflect your
belief in its accuracy?
Can one number represent the performance of a product?
When have you measured enough?
Figures don't lie, but liars figure. How do you extrapolate from what you
know to what you'd like to know?
How do you know what tools to use?
Is everything in a computer measurable?
How do you know what to measure?
Should you always know the result of a measurement before you make it?
How do you figure out dependencies; how does one variable depend on
another?
So after all this talk about the details of measurement, how do you actually
design an experiment?
1. What kind of accuracy can you expect from a computer (or any other)
measurement?
Associated Questions are:
• What are some sources of uncertainty when measuring a computer and its software?
• Is a computer deterministic? (What is the meaning of deterministic? Do a detour on predictable, deterministic, stochastic and chaotic.)
• What are the pros and cons of taking all the variation out of an environment? Repeatability vs. believability.
Here are some factors that lead to experimental variation:
• System/Component/Molecule/Atom – how granular is the measurement?
• Background activity.
• End effects and incomplete cycle effects. Measurement error.
• Randomness doesn't mean equality (stochastic process).
  Example: Travelling around a Monopoly board.
• Randomness from resource contention (stochastic process).
  Example: Six processes do nothing but read randomly from a single disk. Do they each make
  approximately the same number of accesses after 1 second? 1 minute? 1 hour? 1 day?
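One way to build intuition for the six-process example is a toy simulation. This is only a sketch under a loud assumption: the disk serves one access at a time and each access goes to a uniformly random process. The names `simulate_accesses` and `relative_spread` are hypothetical helpers, not part of any real tool.

```python
import random

def simulate_accesses(total_accesses, processes=6, seed=1):
    """Give each disk access to a randomly chosen process; return per-process counts.

    Assumption: every access is equally likely to belong to any process.
    """
    rng = random.Random(seed)
    counts = [0] * processes
    for _ in range(total_accesses):
        counts[rng.randrange(processes)] += 1
    return counts

def relative_spread(counts):
    """(max - min) / mean: how unequal the per-process counts are."""
    mean = sum(counts) / len(counts)
    return (max(counts) - min(counts)) / mean

# Longer runs stand in for "1 second, 1 minute, 1 hour" of disk activity.
for total in (100, 10_000, 100_000):
    counts = simulate_accesses(total)
    print(total, counts, round(relative_spread(counts), 4))
```

Under this model the absolute differences between processes grow with run length while the relative differences shrink, which is one reason the duration of a measurement matters.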
• Changing hardware.
  Example: Variations in fullness of a disk, CPU boards, interrupt traffic.
• Tool granularity.
  Example: Our experiment in class.
  Example: You write a program that measures time in seconds. What percentage accuracy can
  you get from your experiment?
  Example: You want to measure the time required to execute a routine and have available a system
  call named get_time_of_day. get_time_of_day returns time in units of 1/65535 seconds (roughly 15
  microseconds). The time required to execute the get_time_of_day routine itself is 100
  microseconds. What is the shortest routine that can be measured with this tool? How would
  you do it?
Bottom Line: Never believe a real system number to better than 5 - 10%. Artificial numbers can
sometimes be repeated to 1 - 2%, but are susceptible to spurious factors.
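A common way around coarse tool granularity is to time many repetitions of the routine with a single pair of clock readings and divide. The sketch below assumes Python's `time.perf_counter` as the clock; `time_per_call` and `short_routine` are hypothetical names used only for illustration, and the loop overhead is included in the result.

```python
import time

def time_per_call(routine, repetitions=100_000):
    """Average seconds per call, using one timer read before and one after the loop.

    The cost of reading the clock is paid twice in total, not twice per call,
    so it is spread over all repetitions.
    """
    start = time.perf_counter()
    for _ in range(repetitions):
        routine()
    elapsed = time.perf_counter() - start
    return elapsed / repetitions

def short_routine():
    """Stand-in for the routine under test."""
    return sum(range(10))

print(f"{time_per_call(short_routine) * 1e6:.3f} microseconds per call")
```

The number of repetitions should be large enough that the total elapsed time is many times the clock's granularity and the cost of the timing call itself.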
2. When you make a measurement, can you believe the result? How
sure are you of the result?
Suppose you make several determinations of some measure. If you can answer yes to
the following questions, then you can have some faith in your measurement:
• Can you explain why the numbers vary? ("Handwaving" isn't allowed here, but "statistics"
  may be a valid answer.)
• If variations are greater than 10%, can you figure out what's causing the variation and could
  you eliminate it if time allowed?
• If the granularity of your tool is greater than the measurement variations, is that acceptable?
  (Your granularity then becomes your uncertainty.)
But How Much Do You Trust It?
To answer this we need a brief digression into some math.
Suppose we've taken a number of measurements $m_1, m_2, \ldots, m_n$.
Then the mean and standard deviation are:
$$M = E[m] = \frac{1}{n}\sum_{i=1}^{n} m_i$$
$$SD = \left(E[m^2] - E[m]^2\right)^{0.5} \approx \left(\frac{\sum_{i}(M - m_i)^2}{n - 1}\right)^{0.5}$$
$$s = \sigma^2 = \text{variance} = SD^2$$
The first form of the Standard Deviation is the form of the underlying data. The second form is that of
the measured data. They are the same for an infinite amount of data and close enough for a large set of
numbers.
NOTE: Use of these equations assumes that the measurements are independent of each other.
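Both forms of the standard deviation can be checked numerically. This is a minimal sketch that borrows the five readings of data set "f" from the regression-test example as sample data; the variable names are illustrative only.

```python
import math
import statistics

measurements = [14.40, 14.29, 14.43, 14.36, 14.44]  # data set "f" from the Student-T example
n = len(measurements)

# Mean: M = (1/n) * sum(m_i)
mean = sum(measurements) / n

# First form (underlying data): SD = sqrt(E[m^2] - E[m]^2)
mean_of_squares = sum(m * m for m in measurements) / n
sd_underlying = math.sqrt(mean_of_squares - mean ** 2)

# Second form (measured data): SD = sqrt(sum((M - m_i)^2) / (n - 1))
sd_measured = math.sqrt(sum((mean - m) ** 2 for m in measurements) / (n - 1))

variance = sd_measured ** 2

print(f"mean={mean:.3f} sd_underlying={sd_underlying:.4f} sd_measured={sd_measured:.4f}")
```

With only five samples the two forms differ noticeably; the gap closes as the number of samples grows.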
Confidence Intervals:
We'd like to say "I'm p% sure that with n samples the actual value is within d of the mean of the
measurements." In this section, we develop simple ways to be able to make that statement.
Example of Standard Deviations using Normal Distributions:
By quoting the standard deviation of a measurement, we say we're 68% sure the true mean is within a
standard deviation of the measured mean. Unfortunately, that 68% depends on having a large number
of samples. For smaller numbers, the percentage will change.
[Figure: Normal distribution showing mean and variance.]
Distributions: Student-T
Both the normal and Student-T distributions describe how random data is expected to be distributed.
The difference lies in how many samples are taken; the Normal Distribution assumes a very large (like
infinite) number of samples, while the Student-T is for n (less than infinite) samples. As you see in
the examples on subsequent pages, n is used as part of the confidence calculation.
[Figure: T distribution showing dependence only on number of samples.]
The derivation of the t-distribution was first published in 1908 by William Sealy Gosset, while he worked at a
Guinness Brewery in Dublin. He was not allowed to publish under his own name, so the paper was written
under the pseudonym Student. The t-test and the associated theory became well-known through the work of
R.A. Fisher, who called the distribution "Student's distribution".
The Burns Co. is now making laptop computers in its Shelbyville plant. Mr. Burns is too cheap
to wreck too many computers in a test, so he's letting his QA guru, Homer, smash five of
them. Homer is to record from how high in the air he can drop each laptop on the floor
before it won't work anymore.
Mr. Burns wants laptops that can survive a fall from his height of five feet, two inches. The t-test will
tell us if we can accept that the average breaking point for a Burns laptop is greater than
5'2", given what we know about the sample.
Let's say the five computers broke at drops of:
– 4 feet, 8 inches
– 5 feet, 1 inch
– 2 feet, 3 inches
– 6 feet, 10 inches
– 7 feet, 1 inch
Using the formula:
$$t = \frac{(\text{avg. of sample}) - (\text{presumed avg. of larger pop.})}{(\text{st. dev. of sample}) / (\text{sq. root of sample size})}$$
• We get an average breaking height of 62.2 inches, a standard deviation of 23.4 inches, and a t-score of
  0.0191.
• Let's go to the t-score table. There we find the t-value for four degrees of freedom and a
  90-percent confidence interval (that's p=.05, since taking .05 off each side of the bell
  curve leaves us with .90 in the middle). That value is 2.13.
• Since the value we calculated is less than the table's t-value, that means we cannot
  accept the assumption that all Burns laptops together have an average breaking drop of
  over 62 inches. Even though our sample's average came in (just) over that.
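The laptop arithmetic can be reproduced in a few lines. This is a sketch only: the drop heights are converted to inches by hand, and the critical value 2.13 is copied from the t-table quoted in the text, not computed.

```python
import math
import statistics

drops_in_inches = [56, 61, 27, 82, 85]  # 4'8", 5'1", 2'3", 6'10", 7'1"
presumed_mean = 62                      # Mr. Burns' height, 5'2", in inches

sample_mean = statistics.mean(drops_in_inches)
sample_sd = statistics.stdev(drops_in_inches)  # divides by n - 1
t_score = (sample_mean - presumed_mean) / (sample_sd / math.sqrt(len(drops_in_inches)))

T_CRITICAL = 2.13  # table value for 4 degrees of freedom, p = .05 per tail (from the text)

print(f"mean={sample_mean:.1f} sd={sample_sd:.1f} t={t_score:.4f}")
print("accept" if t_score > T_CRITICAL else "cannot accept")
```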
Example – Use of Student-T:
As part of our ongoing regression test package, monitoring the performance of PRODUCT X, we
run tests that tickle a number of code paths. In this table, higher numbers are better - they
represent the number of transactions completed – they are throughput.
RESULTS
Model -->    Product X, Version A    Product X, Version B
110          3.25                    3.20
120          6.34                    6.30
130          9.37 (a)                9.22 (d)
140          11.8 (b)                11.8 (e)
150          14.3                    14.4 (f)
160          16.6 (c)                16.8
Here are the raw numbers which went into making up the averages indicated above:
a    9.36     9.37     9.38     9.35     9.38
b    11.76    11.80    11.79    11.77    11.85
c    16.59    16.59    16.58    16.63    16.66
d    9.21     9.22     9.20     9.22     9.23
e    11.83    11.82    11.85    11.82    11.88
f    14.40    14.29    14.43    14.36    14.44
Let's work through in detail the numbers in "f". We find the
mean = (14.40 + 14.29 + 14.43 + 14.36 + 14.44 )/5 = 14.38
SD = SQRT( (.02^2 + .09^2 + .05^2 + .02^2 + .06^2) / 4 ) = SQRT( 0.00375 ) = 0.061
s = variance = SD^2 = 0.00375
Suppose we want to find the confidence interval for 95% confidence. With 5 samples, we have n - 1 =
4 degrees of freedom. Read the table for t(0.975) (there's 2.5% UNconfidence on each side of
the curve), giving 2.78.
d = t * SQRT( s / n ) = 2.78 * SQRT( 0.00375 / 5 ) = 2.78 * 0.027 = 0.075
The number is 14.38 +- 0.075 with 95% confidence. (How should you round off this number to
accurately reflect your confidence?)
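The same confidence-interval calculation, written as a small function. This is a sketch under one assumption: the t-value is looked up in a table by hand and passed in (2.776 is the tabulated t(0.975) for 4 degrees of freedom); `confidence_interval` is a hypothetical helper name.

```python
import math
import statistics

def confidence_interval(samples, t_value):
    """Return (mean, d) where d = t * sqrt(variance / n)."""
    n = len(samples)
    mean = statistics.mean(samples)
    variance = statistics.variance(samples)  # sample variance, divides by n - 1
    return mean, t_value * math.sqrt(variance / n)

T_975_DF4 = 2.776  # t(0.975) with 4 degrees of freedom, taken from a t-table

set_f = [14.40, 14.29, 14.43, 14.36, 14.44]
mean, d = confidence_interval(set_f, T_975_DF4)
print(f"{mean:.2f} +- {d:.2f} with 95% confidence")
```

Because the code carries the unrounded mean through the calculation, its half-width can differ from the hand-worked figure in the third decimal place.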
Example of Normal Distribution:
Suppose we've been making measurements as shown in the list below. By inserting those numbers in
Excel, the spreadsheet will calculate all kinds of things for us automatically.

Measurements (sorted): 1.9, 2.7, 2.8, 2.8, 2.8, 2.9, 3.1, 3.1, 3.2, 3.2, 3.3, 3.4, 3.6, 3.7, 3.8, 3.9, 4.1, 4.1, etc.

From Excel's functions:
Mean                  3.90
Standard Deviation    0.95

From Tools->Data_Analysis->Descriptive Statistics (Note: Excel has eliminated the outlying value.):
Mean                  3.961290323
Standard Error        0.159749631
Median                3.9
Mode                  2.8
Standard Deviation    0.889448301
Sample Variance       0.79111828
Kurtosis              -0.674508941
Skewness              0.419790238
Range                 3.2
Minimum               2.7
Maximum               5.9
95% Confidence        0.32
COMPARING TWO SETS OF MEASUREMENTS:
You've just measured the performance of the latest release of your product. The numbers are better
than they were when you measured them on the last release. But what does "better" mean?
How do you show that, of two sets of numbers with lots of uncertainty in each, one set really
is better than the other?
First of all, here's the easy way. With your two sets, calculate their means and their confidence
intervals (the % confidence you use is up to you). Visually plot these results as shown in the
three examples below:
A. Here the confidence intervals don't overlap. The results are different from each other.
B. The results are such that the mean of one set is within the confidence interval of the other set.
   The two sets are NOT different.
C. The confidence intervals overlap but the means are not inside the CI of the other set.
   Need to do a more complex test.
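The three visual cases A, B and C can be expressed as a simple check on two means and their confidence-interval half-widths. This is an illustrative sketch: `compare_intervals` is a hypothetical helper, the t-value 2.776 (95%, 4 degrees of freedom) is a table value, and the sample data are sets "a" and "d" from the regression-test example.

```python
import math
import statistics

def half_width(samples, t_value):
    """Confidence-interval half-width: t * sqrt(variance / n)."""
    return t_value * math.sqrt(statistics.variance(samples) / len(samples))

def compare_intervals(mean_1, d_1, mean_2, d_2):
    """Classify two results as case A, B or C from the text."""
    low_1, high_1 = mean_1 - d_1, mean_1 + d_1
    low_2, high_2 = mean_2 - d_2, mean_2 + d_2
    if high_1 < low_2 or high_2 < low_1:
        return "A: different"
    if low_2 <= mean_1 <= high_2 or low_1 <= mean_2 <= high_1:
        return "B: not different"
    return "C: needs a t-test"

T_VALUE = 2.776  # 95% confidence, 4 degrees of freedom (table value)
set_a = [9.36, 9.37, 9.38, 9.35, 9.38]
set_d = [9.21, 9.22, 9.20, 9.22, 9.23]

verdict = compare_intervals(
    statistics.mean(set_a), half_width(set_a, T_VALUE),
    statistics.mean(set_d), half_width(set_d, T_VALUE),
)
print(verdict)
```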
The more complex test is, in essence, a way to combine the confidences for the two data sets
so as to determine the confidence in the difference between the two
sets. This is called a t-test.
Excel can do a t-test as shown in the data below:
                      Data Set 1    Data Set 2
                      5.36          19.12
                      16.57         3.52
                      0.62          3.38
                      1.41          2.50
                      0.64          3.60
                      7.26          1.74
Average               5.31          5.64        =AVERAGE(A3:A8)
Standard Deviation    6.16          6.64        =STDEV(A3:A8)
t-test                0.465703                  =TTEST(A3:A8,B3:B8,1,1)
The result of the t-test says there is about a 47% chance these are from the same distribution.
So for these sets of data, the answer is inconclusive. We can't tell if there's a significant difference between the data sets.
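The averages, standard deviations and the t statistic behind the spreadsheet result can be recomputed by hand. This sketch assumes the pairing implied by the last argument of the TTEST call (type 1, a paired test); it stops at the t statistic, since turning it into a probability needs the t-distribution, which the Python standard library does not provide.

```python
import math
import statistics

data_set_1 = [5.36, 16.57, 0.62, 1.41, 0.64, 7.26]
data_set_2 = [19.12, 3.52, 3.38, 2.50, 3.60, 1.74]

average_1, average_2 = statistics.mean(data_set_1), statistics.mean(data_set_2)
sd_1, sd_2 = statistics.stdev(data_set_1), statistics.stdev(data_set_2)

# Paired test: work on the row-by-row differences between the two sets.
differences = [a - b for a, b in zip(data_set_1, data_set_2)]
t_statistic = statistics.mean(differences) / (
    statistics.stdev(differences) / math.sqrt(len(differences))
)

print(f"averages: {average_1:.2f}, {average_2:.2f}")
print(f"standard deviations: {sd_1:.2f}, {sd_2:.2f}")
print(f"paired t statistic: {t_statistic:.4f}")
```

A t statistic this close to zero, with standard deviations larger than the means, is what makes the comparison inconclusive.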
CHECKING A SERIES OF VALUES:
We'd like to know if a series of values matches a predicted distribution. In other words, we have a theory of what an
experiment should give - do the results in fact match the theory? Chi-Squared tables are available for this
purpose.
Calculate Chi-Squared:
$$\chi^2 = \sum_{n} \frac{(O_n - E_n)^2}{E_n}$$
where O = Observed and E = Expected.
Example:
Suppose a random number generator is invoked 200 times and produces values shown in this table:
Range        Number of Values
0.0 - 0.1    23
0.1 - 0.2    22
0.2 - 0.3    19
0.3 - 0.4    15
0.4 - 0.5    22
0.5 - 0.6    21
0.6 - 0.7    20
0.7 - 0.8    16
0.8 - 0.9    21
0.9 - 1.0    21
Plugging this into the equation gives:
$$\chi^2 = (3^2 + 2^2 + 1^2 + 5^2 + 2^2 + 1^2 + 0^2 + 4^2 + 1^2 + 1^2) / 20$$
$$= (9 + 4 + 1 + 25 + 4 + 1 + 0 + 16 + 1 + 1) / 20$$
$$= 62 / 20 = 3.1$$
There are nine degrees of freedom. Using a chi-squared distribution table,
look along the 9-degree row and find that 3.1 is between 3.325 (0.050) and
2.700 (0.025), interpolated as approximately 0.040.
We can reject the hypothesis that the results are the same with a probability of
about 4%. Conversely, we can be 96% sure the distribution is uniform.
Exercise:
Do this same calculation using the Chi Squared Function in Excel.
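For checking the spreadsheet answer, the chi-squared statistic itself is easy to compute directly. A minimal sketch, assuming a uniform generator so that every range expects the same count; the table lookup for the probability is still done by hand.

```python
observed = [23, 22, 19, 15, 22, 21, 20, 16, 21, 21]  # counts per range, from the table

# Uniform hypothesis: each of the 10 ranges expects total / 10 values.
expected = sum(observed) / len(observed)

chi_squared = sum((o - expected) ** 2 / expected for o in observed)
degrees_of_freedom = len(observed) - 1

print(f"chi-squared = {chi_squared:.2f} with {degrees_of_freedom} degrees of freedom")
```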
3. How should you state the result of an experiment? How do you
reflect your belief in its accuracy?
Pat has developed a new product, "rabbit" about which she wishes to determine performance.
There is special interest in comparing the new product, rabbit to the old product, turtle,
since the product was rewritten for performance reasons. (Pat had used Performance
Engineering techniques and thus knew that rabbit was "about twice as fast" as turtle.) The
measurements showed:
Performance Comparisons
Product                           Turtle    Rabbit
Transactions / second             30        60
Seconds / transaction             0.0333    0.0166
Seconds to process transaction    3         1
Which of the following statements reflect the performance comparison of rabbit and turtle?
o Rabbit is 100% faster than turtle.
o Rabbit is twice as fast as turtle.
o Rabbit takes 1/2 as long as turtle.
o Rabbit takes 1/3 as long as turtle.
o Rabbit takes 100% less time than turtle.
o Rabbit takes 200% less time than turtle.
o Turtle is 50% as fast as rabbit.
o Turtle is 50% slower than rabbit.
o Turtle takes 200% longer than rabbit.
o Turtle takes 300% longer than rabbit.
• The guiding principle in stating a result is to keep it simple.
• State the accuracy using the same methods we've just discussed. Use Means, Standard
  Deviations, and Confidence Intervals.
• Include the number of decimal places that reflects the accuracy of your answer. Avoid things
  like 7.365 with a standard deviation of 2.
• It goes without saying that reflecting your belief in the accuracy presupposes you've done the
  experiment correctly. Some simple guidelines:
  A. In my experience, you always do the experiment wrong the first five times. Through experience you
     learn to look critically at your result to see if it makes sense. If not, then you go figure out what went
     wrong. Usually it's some parameter that wasn't controlled.
  B. Only vary one parameter at a time.
  C. Watch out for interactions between parameters. Changing one parameter may result in some
     other parameter changing as well.
  D. Don't do too many or too few experiments.
  E. Get someone else to check your results – by the time you finish a measurement you have too much
     invested in it and are very likely to miss something obvious.
4. Can one number represent the performance of a product?
Answer: No, but you'll be asked to do it anyway.
Preparation For This Section – some definitions:
Mean or Expected Value:
$$\text{Mean} = \mu = E(x) = \sum_{i=1}^{n} p_i x_i = \int x f(x)\,dx$$
Median: That value for which there's an equal probability of being above it and below it.
Mode: The most likely value. The value with the highest probability.
[Figure: a distribution with the Mode, Median, and Mean marked.]
Example:
The Performance Group at the XYZ Corporation has developed a synthetic workload
that they feel reflects the kind of computer work done by XYZ's "typical"
customer. This workload is composed of various programs driven by a remote
terminal emulator ( RTE ). The RTE can both initiate programs and log when
the programs complete.
This workload was run last week with results shown in the table:
Results of XYZ Corp Performance Benchmark
Transaction Type            Time to complete transaction
Edit a file                 14 sec
Compile and link a file     143 sec
Run compiled program        17 sec
200 disk reads              6 sec
1000 process reschedules    3 sec
100 physical page faults    10 sec
Send and receive mail       57 sec
TOTAL TIME                  250 sec
NOTE: Because all these programs are started simultaneously, there is contention for resources.
The time reported to management was 250 seconds.
Questions:
• Is this a good performance indicator?
• If yes, then sit and relax a few minutes.
• If no, how would you express the results of these tests? How might you revamp the tests?
What guidelines can be derived for producing one-number performance metrics?
5. When have you measured enough?
This is really two questions:
a) When have you measured enough to get the accuracy of answer that management expects at this time?
   This is a matter of setting the correct expectations before you start. Many times the answer is in
   response to a "what if" question – you can get the appropriate accuracy in one hour. Other times
   you'll need weeks of design/setup/measurement/analysis to get the expected accuracy.
   NOTE: Only a small amount of the total experimental time is in the measurement.
   Most time goes for design and elimination of unwanted factors. So this
   question could be stated as "How complicated should an experiment be?"
b) When have you measured enough to get the degree of accuracy you expected for the experiment?
   You can use the confidence measures we discussed before. In essence, the confidence interval scales as
   $$\text{Confidence} \propto \frac{1}{\sqrt{n}}$$
The relationship between the number of required samples and experimental parameters is:
$$n = \left(\frac{100\,z\,s}{r\,x_{mean}}\right)^2$$
Here
n     = number of samples required
z     = the number of standard deviations corresponding to the desired confidence
s     = standard deviation
r     = the desired accuracy in percent
xmean = the mean of the measurement
NOTE: See that the more accuracy you want (the smaller r is), the more measurements you need.
NOTE: If your numbers all come out the same, stop. Measurement uncertainty is not the largest part
of the error in your metric.
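The sample-size relationship is easy to try out with numbers. In this sketch the inputs (mean 20, standard deviation 5, 95% confidence so z = 1.96, accuracy of 5%) are hypothetical values chosen for illustration, and `required_samples` is an assumed helper name.

```python
import math

def required_samples(z, s, r, x_mean):
    """n = (100 * z * s / (r * x_mean))^2, rounded up to a whole measurement.

    z: number of standard deviations for the desired confidence
    s: standard deviation of the measurements
    r: desired accuracy in percent
    x_mean: mean of the measurements
    """
    return math.ceil((100 * z * s / (r * x_mean)) ** 2)

# Hypothetical pilot run: mean 20, standard deviation 5, 95% confidence.
print(required_samples(z=1.96, s=5.0, r=5.0, x_mean=20.0))   # 5% accuracy
print(required_samples(z=1.96, s=5.0, r=2.5, x_mean=20.0))   # 2.5% accuracy
```

Halving r roughly quadruples the number of measurements, which is the square in the formula at work.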
6. Figures don't lie, but liars figure. How do you extrapolate from
what you know to what you'd like to know?
Often we need a result that is unmeasurable, or would require eons to determine. Is it legal to guess?
Answer: Sure - as long as you also estimate the uncertainty of your guess.
Here are a few practice situations that will help you improve your powers of estimation. Remember,
there is no RIGHT answer.
1. Estimate how many people will come to this class next week. More important than the answer are
   the assumptions you use for your answer.
2. Approximately how many cars were in the parking lot outside this building when you came in
   tonight? How many are there now?
3. What is the probability that you will be killed in a car accident?
4. I recently saw a lawn service truck that had printed on its side "Over 7 trillion blades cut." Is this a
   reasonable claim for them to make?
5. Here is a comic strip version of an approximation
problem. It contains a model, and then an
estimation of the required parameters in the model.
6. But be careful; sometimes the model doesn’t
work.
7. How do you know what tools to use?
• We'll do a lot more on tools later, but for right now, the best answer is to measure the simplest
  way possible.
• Usually tools are easier to come by than environments.
• Make sure the granularity of the tool is finer than the required uncertainty.
8. Is everything in a computer measurable?
• Some electrical signals may not be available.
• The place to make a measurement may be in code not under your control.
• We have a very poor sense of typical/normal. We don't know what our users typically do
  with the machine.
• The measurement may perturb the system and destroy what we wanted to know.
• Available measurements may not relate to what I want to know. For instance, which disk
  blocks are being accessed by each of the processes on a system?
9. How do you know what to measure?
• This is the hardest question of all. To know what to measure you must have a picture or
  model of your product. Most of the rest of this course will deal with various kinds of
  pictures.
• Often an adequate model is a causal one: first procedure A executes; this causes hardware
  B to produce an effect; then interrupt code handles the hardware result; etc.
• Things to keep in mind include:
  • Interaction between variables – do you expect a change in X to produce a change in
    Y? You should have a guess as to the result before you make the measurement.
  • Changing one variable at a time, and measuring it at 10 different values, can be
    extremely wasteful and time consuming.
  • Change only the variables that matter. If you don't know, try changing something,
    just once, and see what happens.
• Example: You wish to design an experiment that will measure the time required to execute
  a program on various Intel processors. What parameters would you need to vary to try
  different processors and configurations? DESIGN THE TESTS TO BE RUN.
10. Should you always know the results of a measurement before you make it?
• You should always have a guess so you can tell if your result is way off. That guess
  should be the result of a model/theory of how the mechanism you are measuring is
  working.
11. How do you figure out dependencies; how does one variable
depend on another?
This whole topic is something called linear regression. It says that if you can plot two variables,
x and y, and there’s a simple relationship between the variables, then you can define the
dependency between them.
[Figure: three example fits, labelled "Good SIMPLE Model", "Good Complicated Model", and "BAD Model".]
A linear regression means that we can fit a curve of the form y = a + bx. The quality of the fit
(error) can be defined as the sum of the y distances between the fitting-curve and the
experimental data.
So the "best fit" is defined to be the curve that minimizes the sum of errors squared,
$$\sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - a - b x_i)^2$$
with the constraint that
$$\sum_{i=1}^{n} e_i = \sum_{i=1}^{n} (y_i - a - b x_i) = 0$$
When you solve this, you can immediately determine the values of a and b from the experimental data:
$$b = \frac{\sum xy - n\,\bar{x}\,\bar{y}}{\sum x^2 - n\,\bar{x}^2} \quad \text{and} \quad a = \bar{y} - b\,\bar{x}$$
with
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i \quad \text{and} \quad \bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$$
Let's use as an example the following pairs of data:
(14,2), (16,5), (27,7), (42,9), (39,10), (50,13), (83,20).
We COULD use the equation above to determine a and b.
Or, Excel can be used in the same way and gives the same results.
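For readers who want to see the equations for a and b applied by hand, here is a minimal sketch over the seven data pairs listed above. The function name `fit_line` is an assumption made for illustration; a and b follow the y = a + bx convention used in the text.

```python
def fit_line(points):
    """Least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(points)
    x_mean = sum(x for x, _ in points) / n
    y_mean = sum(y for _, y in points) / n
    sum_xy = sum(x * y for x, y in points)
    sum_xx = sum(x * x for x, _ in points)
    # b = (sum(xy) - n*x_mean*y_mean) / (sum(x^2) - n*x_mean^2)
    b = (sum_xy - n * x_mean * y_mean) / (sum_xx - n * x_mean ** 2)
    # a = y_mean - b*x_mean
    a = y_mean - b * x_mean
    return a, b

pairs = [(14, 2), (16, 5), (27, 7), (42, 9), (39, 10), (50, 13), (83, 20)]
a, b = fit_line(pairs)
residuals = [y - a - b * x for x, y in pairs]

print(f"Y = {a:.4f} + {b:.4f} X")
print(f"sum of residuals = {sum(residuals):.2e}")  # constraint: errors sum to zero
```

Printing the residual sum is a quick sanity check that the fit satisfies the zero-sum constraint on the errors.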
EXAMPLE OF LINEAR REGRESSION - Using LINEST(B3:B9, A3:A9, True, True)
X     Y
14    2
16    5
27    7
42    9
39    10
50    13
83    20
Slope          0.2438
Y-Intercept    -0.0083
The equation in this case is
Y = -0.0083 + 0.2438 X.
Also, if you know what you're doing, you can use "Tools->Data Analysis->Regression" and Excel will
give you all kinds of statistics evaluating the goodness of fit of the straight line. (Note that you may need
to use Tools->Options to bring in the analysis tools.)
If the model you're expecting isn't a straight line, then you'll need to do more sophisticated
analysis, but the method follows in the same way as we've just done.
12. So after all this talk about the details of measurement, how do you
actually design an experiment?
We’re going to follow through these steps and recommend that you use them in your
experiments. (These are originally due to Jain.)
1. State Goals and Define The System
   a. What is it you hope to accomplish? Why is it worth doing?
   b. What is the hardware and software (the system) that you will use to achieve these goals?
2. List Services and Outcomes
   a. For the system you've chosen, what are the services provided? For instance, if you're studying
      a disk subsystem, it can absorb data (write) or present you with data (read) or give an error.
   b. By outcomes here are meant very high level statements. The outcome of a disk read is DATA.
      It's not a performance or quantifiable answer expected here.
3. Select Metrics
   a. What are the criteria you want to use to compare performance? This is still not a quantifiable
      value, but simply what it is you will measure. This could be a speed metric, or an accuracy
      metric.
4. List Parameters
   a. What parameters affect performance? If you're measuring disks, then the model of disk
      determines its seek time, its rotational latency, etc. This is a system parameter.
   b. The kind of test you use, determined by the workload you use, can also define parameters.
      These might be requested IOs per second, random or sequential blocks, etc.
5. Select Factors to Study
   a. A factor is a parameter that you vary.
   b. So, for the parameters you've just listed – all of which you COULD vary – which ones will you
      actually modify during the course of the experiment?
6. Select Evaluation Technique
   a. You could do this experiment by modeling. You would mathematically represent the system
      under study and modify parameters in this model.
   b. You could do this experiment by simulation. You would write a program that represented
      the system. Again you could modify parameters and look at results.
   c. You could do this experiment by measurement. Here you have a real system, drive it with
      some kind of workload, and get the results.
   d. In practice, in industry, only measurements are valued. It's generally cheaper to use the real
      system than it is to build a mathematical or simulated system.
7. Select Workload
   a. How will you drive the system under test?
   b. It depends on the Evaluation Technique. With a simulation you may have collected some
      data that you can feed into your program.
   c. For a measurement evaluation, you will have some kind of software that drives the system
      you're testing. You will need to find a workload that tickles the parameter of interest to
      you.
8. Design Experiments
   a. What experiments will you do to collect the data you want?
   b. This means selecting the actual values to be used as factors. If one of your factors is the
      type/model of disk, then how many different disks will you use?
9. Make A Guess What The Result Will Be
   a. Many people take a measurement and say "Oh, that must be right." The best way to be able
      to make that statement is to have understood what should happen and then either get what
      you expected or not.
   b. If you get what's expected, then you can be confident that:
      - You understand a picture of how the system is working.
      - You did your measurements correctly.
   c. If you DON'T get what's expected, then you can be confident that:
      - You didn't understand the system and so you need to form a new picture.
      - You did the measurement wrong – there's some experimental error.
10. Conduct the Measurement, Analyze and Interpret Data
   a. Now actually do the measurement, simulation, or whatever you've designed.
   b. It's rare that you just get a number and you're all done.
   c. There is always interpretation to be done:
      - What does the data mean?
      - Is this the result I would expect?
   d. There are always statistics to be done:
      - Is the data valid?
      - What is the uncertainty in the measurements?
11. Figure Out What You Want To Talk About
   a. Know your audience. Are they management types (who want only an overview) or are they
      technical people (who want all the details)? Proper targeting is important!
   b. Choose from all the data you have, those pieces that are most relevant. Don't forget to make
      it interesting!
12. Present Final Results
   a. As you know, in the real world, it's not what you do, it's what others think you do.
   b. Presentation is everything.
BONUS: There are various terms and definitions we never got around
to formally defining. Here they are.
Definitions of Measured Data
These are some basic terms to define so we have a common lingo.
Independent Events: Two events are independent if there's no way that the occurrence of the first
event can have anything to do with the second event.
Random Variate: A variable that can take on one of a particular set of values with a specified
probability.
Cumulative Distribution Function: The CDF maps a given value a to the probability that the
variable has a value equal to or less than a.
$$F_x(a) = P(x \le a)$$
Probability Density Function: The derivative of the CDF,
$$f(x) = \frac{dF(x)}{dx}$$
Integrating it gives the probability of x being in the interval (x1, x2):
$$P(x_1 < x \le x_2) = \int_{x_1}^{x_2} f(x)\,dx$$
Probability Mass Function: The equivalent of the PDF but used for discrete variables.
Mean or Expected Value:
$$\text{Mean} = \mu = E(x) = \sum_{i=1}^{n} p_i x_i = \int x f(x)\,dx$$
Variance: A measure of the deviation of the values from the mean.
$$Var(x) = \sigma^2 = E[(x - \mu)^2] = \sum_{i=1}^{n} p_i (x_i - \mu)^2 = \int (x - \mu)^2 f(x)\,dx$$
Standard Deviation: This is another measure of the deviation of values. Represented
by $\sigma$, the square root of the variance.
Covariance: Given two random variables x and y with means $\mu_x$ and $\mu_y$,
their covariance is
$$Cov(x, y) = \sigma_{xy} = E[(x - \mu_x)(y - \mu_y)] = E(xy) - E(x)E(y)$$
For independent variables, the covariance is 0.
Correlation Coefficient: Another measure of how two variables are interdependent.
$$Correlation(x, y) = \rho_{xy} = \frac{\sigma_{xy}}{\sigma_x \sigma_y}$$
Median: That value for which there's an equal probability of being above
it and below it.
Mode: The most likely value. The value with the highest probability.
Normal Distribution: The most commonly used distribution. The sum of a large
number of independent observations from any distribution tends toward a
normal distribution.
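The covariance and correlation definitions can be tried on the seven data pairs from the linear regression example. This is a sketch that treats the pairs as the whole population (each with probability 1/n), which is an assumption; `covariance` and `correlation` are illustrative helper names.

```python
import math

def covariance(xs, ys):
    """Cov(x, y) = E(xy) - E(x)E(y), with every pair equally likely."""
    n = len(xs)
    mean_xy = sum(x * y for x, y in zip(xs, ys)) / n
    return mean_xy - (sum(xs) / n) * (sum(ys) / n)

def correlation(xs, ys):
    """Covariance divided by the product of the two standard deviations."""
    sd_x = math.sqrt(covariance(xs, xs))  # Var(x) = Cov(x, x)
    sd_y = math.sqrt(covariance(ys, ys))
    return covariance(xs, ys) / (sd_x * sd_y)

xs = [14, 16, 27, 42, 39, 50, 83]
ys = [2, 5, 7, 9, 10, 13, 20]

print(f"covariance  = {covariance(xs, ys):.2f}")
print(f"correlation = {correlation(xs, ys):.4f}")
```

A correlation close to 1 is consistent with the straight-line model fitting these data well.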