
Random Sampling
Approximations of E(X), p.m.f,
and p.d.f
Important
• Read through simulation slides for Thursday
• Homework #8 is due on Thursday
• Check web-page on Wednesday night -- print off
any worksheets for simulation that might be there
for Thursday
• There were major mistakes on the study guide that was put on-line; there is a
new one there now
– It gives a different definition of the p.d.f and c.d.f for the
uniform random variable than what was given in class, but
they mean the same thing.
• P.M.F versus P.D.F – needs clarification because I
misspoke
P.M.F versus P.D.F
• Either graph can be a histogram
– I was assuming that the bin width will always be 1 for a
finite random variable but that is not necessarily the case
• Take X = 0, ½, 1, etc.
• Probability Mass Function
– The values along the y-axis of a histogram represent
probabilities
• If you sum up the probabilities, they should add up to 1 every
time (regardless of the bin width)
• Thus, to determine if a graph is a p.m.f, you need to add up the
heights of the rectangles – if they add up to 1, then it is a p.m.f.
P.M.F versus P.D.F
• A probability density function can also be a
histogram
• The values along the y-axis do not represent
probabilities for a continuous random variable
• However, the area under graph must be equal to 1
– How can you check if a histogram represents a p.d.f?
The heights of the rectangles do not add up to 1, but
the areas of the rectangles do sum to 1.
In conclusion
• Both a p.m.f and p.d.f graph can be histograms
– To tell if a histogram represents a p.m.f, the sum of the
HEIGHTS of the rectangles must equal 1 (because the
heights represent probabilities)
– To tell if a histogram represents a p.d.f, the sum of the
heights of the rectangles does not equal 1 but the sum of the
AREAS of the rectangles does.
• Let’s look at number 4b from the homework just
turned in ….
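The two checks summarized above can be sketched in a few lines of Python. This is only an illustration: the function name `classify_histogram` and the example bar heights are assumptions made up for this sketch, not part of the course materials. A small tolerance is used because floating-point sums are rarely exactly 1.

```python
import math

def classify_histogram(heights, bin_width, tol=1e-9):
    """Apply the rule from the slides: heights sum to 1 -> p.m.f; areas sum to 1 -> p.d.f."""
    total_height = sum(heights)
    total_area = sum(h * bin_width for h in heights)
    # Note: when bin_width is 1 both sums agree, and this sketch reports "p.m.f".
    if math.isclose(total_height, 1.0, abs_tol=tol):
        return "p.m.f"
    if math.isclose(total_area, 1.0, abs_tol=tol):
        return "p.d.f"
    return "neither"

# Hypothetical histograms with bin width 1/2 (as in X = 0, 1/2, 1, ...)
pmf_heights = [0.2, 0.5, 0.3]  # heights sum to 1
pdf_heights = [0.4, 1.0, 0.6]  # heights sum to 2, areas sum to 2 * 0.5 = 1

print(classify_histogram(pmf_heights, 0.5))
print(classify_histogram(pdf_heights, 0.5))
```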
Why do we use Random
Sampling?
• In business, we identify a random variable
• We want its probability information
• Problem: We do not know its distribution
OR expected value
• Solution: Estimate E(X) and estimate FX(x)
and fX(x) using random sampling
Definitions
• A number x that results from a trial of the
process is called an observation of X
• A set {x1, x2, ……, xn} of n independent
observations of the same random variable X
is called a random sample of size n.
Example #1
• Suppose that X is the number of assembly
line stoppages that occur during an 8-hour
shift in a manufacturing plant. We could
obtain a random sample of size 10 by
watching the line for 10 different shifts and
recording the number of stoppages during
each 8-hour shift
Example #1 (continued)
Shift observed    Number of stoppages
      1                    2
      2                   11
      3                    6
      4                    8
      5                    6
      6                    5
      7                   10
      8                    4
      9                    8
     10                    3
• Looking above, we see the information recorded
during the 10 different shift observations
• We can compute the sample mean of the
observations
• The sample mean is denoted by $\bar{x}$
Statistics and Probability
• There is a difference between probabilities and
statistics even though people use the terms
interchangeably
• A number that describes a sample
is called a statistic
• THEOREM: The statistic $\bar{x}$ can be used as an
estimate of E(X).
• In general, the larger the sample size n, the better
the estimate will be
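A small simulation can illustrate why a larger sample tends to give a better estimate. The fair six-sided die used here is an assumption chosen only because its expected value, 3.5, is known exactly; it is not one of the course examples. The error usually shrinks as n grows, though any single run can deviate from that trend.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

def sample_mean(n):
    """Mean of n independent rolls of a fair six-sided die."""
    rolls = [random.randint(1, 6) for _ in range(n)]
    return sum(rolls) / n

true_mean = 3.5  # E(X) for a fair die: (1 + 2 + ... + 6) / 6

# Absolute error of the sample mean for increasing sample sizes
errors = {n: abs(sample_mean(n) - true_mean) for n in (10, 1000, 100000)}
for n, err in errors.items():
    print(f"n = {n:6d}: |sample mean - E(X)| = {err:.4f}")
```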
Sample Mean
• We can find the mean of example #1
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$
$$\bar{x} = \frac{1}{10}(2 + 11 + 6 + 8 + 6 + 5 + 10 + 4 + 8 + 3) = \frac{63}{10} = 6.3$$
$$\bar{x} \approx E(X)$$
So, 6.3 is the approximate number of expected
line stoppages during an 8-hour shift
However, 10 observations is a little small to
base an approximation on
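The same arithmetic can be checked with a few lines of Python. The variable names are chosen only for this sketch; the data are the ten observations from Example #1.

```python
# Number of stoppages observed in shifts 1 through 10 (Example #1)
stoppages = [2, 11, 6, 8, 6, 5, 10, 4, 8, 3]

n = len(stoppages)          # sample size
x_bar = sum(stoppages) / n  # sample mean: (1/n) * sum of the observations

print(x_bar)  # 6.3
```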
Approximating Probability Mass
and Density Functions
• If we have a large enough sample, we can
approximate functions
• I.e., we can approximate a p.m.f or a p.d.f
depending on the random variable
• If we approximate a p.m.f or p.d.f, we can
also look at the corresponding graphs
Example #3
Suppose that the assembly line discussed in
Example #1 runs 24 hours a day, with
workers in three shifts. Observations of the
number of stoppages during an 8-hour shift
were recorded for a nine-month period. I.e.,
819 different shifts were observed and
recorded in the file Stoppages.xls.
Relative Frequencies
• Relative frequencies were plotted to obtain
the histogram seen in Stoppages.xls
• The relative frequency of each value x in the
sample gives an estimate for the probability
that X will assume that value. WHY?
• How did we obtain the relative frequencies?
• A histogram will give a good approximation
for the graph of fX
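As a rough sketch of how relative frequencies are obtained, each value's count is divided by the sample size. The contents of Stoppages.xls are not reproduced here, so the ten observations from Example #1 stand in for the 819-shift sample; with so few observations the estimates are crude.

```python
from collections import Counter

# Example #1 data, used only for illustration (not the Stoppages.xls sample)
sample = [2, 11, 6, 8, 6, 5, 10, 4, 8, 3]

counts = Counter(sample)  # frequency of each observed value
n = len(sample)

# Relative frequency = frequency / sample size; this estimates P(X = value)
rel_freq = {value: counts[value] / n for value in sorted(counts)}

for value, rf in rel_freq.items():
    print(f"P(X = {value}) is approximately {rf:.1f}")
```

Because every observation is counted exactly once, the relative frequencies add up to 1, as the heights of a p.m.f histogram must.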
Continuous Random Variables
• A large random sample can also be used to
approximate the p.d.f of a continuous random
variable
• One way we can obtain our p.d.f is by looking at
smaller and smaller bin widths of our data
• Use the HISTOGRAM function in Excel to find the
approximation of the graph of the p.d.f
Example #4
• The manager of the plant that was described
in the previous examples wants to get a
better understanding of the delays caused
by stoppages of the assembly line. So, in
addition to knowing how many stoppages
there are, the manager wants to know how
long they last.
Example #4 (continued)
• Let T be the (exact) length of time, in minutes,
that a randomly selected stoppage will last
• QUESTION: Is T a continuous random
variable?
• In Stoppages.xls, the duration of each
stoppage was also recorded for all 819 shifts
• Therefore, we have a random sample of
observations for T
Example #4 (continued)
• We used the function HISTOGRAM in Excel to plot an
approximation of the p.d.f., fT
• In Stoppages.xls, bin widths of 2 minutes are used
• Since our bin width is 2, to make the area under the graph be
1, we had to divide each relative frequency by 2 and then
plot those “new” relative frequencies
– Note: Here you are dividing the relative frequency by the
bin width – not the frequency by the bin width as stated
in class
• Thus, you find the relative frequency as you did before
and then divide it by the bin width
• Connecting the midpoints of the tops of the rectangles
gives us an approximate curve
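The steps on this slide (relative frequency, divide by the bin width, take the bar midpoints) can be sketched outside Excel as follows. The durations below are synthetic values drawn from an exponential distribution purely as a stand-in; they are NOT the Stoppages.xls data, and the sample size of 1000 is likewise an assumption.

```python
import random

random.seed(1)  # fixed seed so the run is repeatable

# Synthetic stand-in for stoppage durations in minutes (hypothetical data)
durations = [random.expovariate(1 / 5) for _ in range(1000)]

bin_width = 2
n_bins = int(max(durations) // bin_width) + 1

# Count how many durations fall in each bin [0, 2), [2, 4), ...
# (for continuous data, which endpoint is included does not matter in practice)
counts = [0] * n_bins
for t in durations:
    counts[int(t // bin_width)] += 1

rel_freq = [c / len(durations) for c in counts]  # relative frequencies
density = [rf / bin_width for rf in rel_freq]    # bar heights to plot
midpoints = [(i + 0.5) * bin_width for i in range(n_bins)]  # x-values for the line graph

# The total area of the rectangles should be 1
area = sum(h * bin_width for h in density)
print(round(area, 6))
```

Plotting `density` against `midpoints` as a line graph gives the approximate curve described above.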
Using the approximated p.d.f
• We can use our plot to calculate probabilities
• For example, if we wanted to know
P(2 < T ≤ 4), we could look at the
corresponding area under the graph
• Note: P(2 < T ≤ 4) corresponds to an area
under the graph between (2,4] which is a
rectangle
• So, to find our probability, find the area of
the rectangle
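A minimal sketch of that area calculation follows. The bar heights are hypothetical numbers invented for illustration, not values read from Stoppages.xls.

```python
bin_width = 2

# Hypothetical bar heights of the approximated p.d.f for the bins (0,2], (2,4], (4,6]
density = {(0, 2): 0.15, (2, 4): 0.12, (4, 6): 0.09}

# P(2 < T <= 4) is the area of the rectangle over (2, 4]: height times width
p = density[(2, 4)] * bin_width
print(p)
```

For an interval that spans several whole bins, the areas of the corresponding rectangles would be added together.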
Focus on the Project
• We have a continuous random variable Rnorm which
gives the normalized ratio of weekly closing prices
on Disney stock (class project)
• Option Focus.xls contains 417 values of Rnorm from
417 weekly closing ratios
• They are considered to be independent observations
• Thus, they make up a random sample of size 417 for
Rnorm
Focus on the Project
• We can calculate the sample mean, which we know
should be equal to what?
• We can create a plot using the relative frequencies
• Note: If your bin width is not equal to 1, you will
have to divide the relative frequency by your bin
width to make the area under the curve be 1
• Graphing the midpoints at the tops of the bars will
produce a line graph approximation for fnorm
What should you do?
• Plot an approximation of the probability density
function for the normalized ratios of weekly closing
prices
• The plot should be a line graph, where you are
connecting the midpoints of the tops of the bars
• Remember, if your bin width is not equal to 1 you will have
to divide the relative frequencies by that width before
you plot
• Find the sample mean of the normalized ratios –
you already know what it should equal