Discrete Probability Distribution
Henry Mesa
In this slide show we will look at creating a probability distribution. A probability
distribution consists of
1. The sample space
2. The probability of the measurements in that sample space.
Visualizing, or creating, the sample space can be the part that takes the most
time, and I will show you two different examples. One will be used again in chapter
17, and belongs to a very specific discrete distribution called Binomial. The Binomial
distribution is basic, very important, and used often in practice. It is also used to
derive other distribution types. We will create this distribution so you can see the
structure and then understand what this type of distribution can represent.
I will first start with a situation in which every member of the population is equally
likely to be chosen.
Suppose that I have two dice - or I have one die and I am throwing it twice. Notice
this is a situation in which I am sampling twice; a sampling distribution.
This die is fair, with six sides, and the sides are marked with the values of
{1, 2, 3, 4, 5, 6}. And my random variable, D, will be the sum of the two dice.
So, how do I create the sample space here?
Organization is essential, so I am going to create a grid. Since I only have two dice,
and both dice are fair, this will be the easiest method.
                 Die 1
           1   2   3   4   5   6
       1
       2
Die 2  3
       4
       5
       6
Inside this grid will go the sum of the two dice. Notice that while we cannot tell
which die produced a “four”, for example (if both dice are identical), the two dice
are physically separate and must be recognized as such as we create the sample
space.
So, now I will add the row value to the column value and put that sum in the
corresponding cell of the grid.
                 Die 1
           1   2   3   4   5   6
       1   2   3   4   5   6   7
       2   3   4   5   6   7   8
Die 2  3   4   5   6   7   8   9
       4   5   6   7   8   9  10
       5   6   7   8   9  10  11
       6   7   8   9  10  11  12
Using the results here I can now create the sample space of the random variable D,
which is the sum of the two dice, and I can also find the probability of each outcome.
The sample space of D is {2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12}. Do not confuse this with
the probability of getting those events.
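The grid can also be built by a short program. The following is only a sketch of the idea in Python; the variable names (faces, outcomes, sample_space) are my own and are not part of the slides.

```python
from itertools import product

faces = range(1, 7)  # the six sides of a fair die

# Every ordered pair (die 1, die 2) is one cell of the 6-by-6 grid.
outcomes = list(product(faces, repeat=2))

# The random variable D maps each pair to the sum of its two values.
sample_space = sorted({d1 + d2 for d1, d2 in outcomes})

print(len(outcomes))   # number of cells in the grid
print(sample_space)    # distinct values D can take
```

Keeping the pairs ordered is how the code respects the point made above: the two dice are separate, so (2, 4) and (4, 2) are different cells even though both give a sum of six.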
I want to make an important distinction at this moment, and the main issue you will
see in chapters 17 and 18. Again, notice that I am “sampling” twice from the
population, which consists of the values of a die. Furthermore, what am I doing with
the values I sampled? Adding them.
I know that what I am trying to convey with the “sampling and then adding”
statement above may be difficult to grasp at the moment, but keep in mind that it
is at the heart of Statistics!
Before continuing with the original problem, I want to emphasize what I am trying
to allude to. In Statistics we measure so we can understand some characteristic of a
population. But then once you collect the data, you must summarize this
information (chapters 4 and 5 provide examples of those summaries). The
purpose of the summaries is to condense all the data to values that can be
interpreted and somehow represent the characteristic that we want to learn more
about. But what we are interested in is the “chance” of getting the values of those
summaries.
Gosh, can you tell that I am pulling my hair trying to pass some important
information to you? We will revisit this in chapter 18 and in the modules after exam
3. Back to the problem.
Notice that every single value on that grid is equally likely because the dice are fair.
So, because there are 36 squares, all equally likely, the sum of “six” (color coded
yellow ) is equally likely compared to the other “six” (color coded blue). All the sixes
in that grid are equally likely to occur, and all are unique (occur because of a unique
roll of the dice).
Notice that P(six) = 5/36, since there are 36 possibilities, all equally likely, and five of
the 36 meet the criterion of “sum of six”.
So, here is the probability distribution, which is created by matching the sample
space with the probability of each outcome.
X      2     3     4     5     6     7     8     9     10    11    12
P(X)   1/36  2/36  3/36  4/36  5/36  6/36  5/36  4/36  3/36  2/36  1/36
Notice that to get the probabilities, all I have to do is count how many times each
sum appears in the grid. If the dice were not fair, getting the probabilities would not
be as easy.
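The counting can be checked with a few lines of Python. This is a minimal sketch that assumes two fair six-sided dice, as in the example; the names counts, total and distribution are mine.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Count how many of the 36 equally likely cells produce each sum.
counts = Counter(d1 + d2 for d1, d2 in product(range(1, 7), repeat=2))
total = sum(counts.values())

# Probability of each sum = (cells with that sum) / (all cells).
distribution = {s: Fraction(c, total) for s, c in sorted(counts.items())}

for s in distribution:
    # Print the unreduced form so it matches the table, e.g. 6/36 rather than 1/6.
    print(s, f"{counts[s]}/{total}")
```

Fraction keeps the arithmetic exact, so the probabilities add up to exactly 1 with no rounding error.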
Here is a different problem in which things are not equally likely and counting will
not work.
At a college 59% of the student body are women. Suppose that four students are
sampled at random. I will define the random variable X, to count how many women
are in the sample of four. Notice this is a scenario where I am sampling four times.
So, my sample space, for the random variable X is… see if you can write down the
sample space on your own.
{0, 1, 2, 3, 4}
Now, I need to find the probability of each of those values. Let's look at the two
easiest values to calculate: P(X = 0) and P(X = 4).
So, P(X = 4) means P(woman AND woman AND woman AND woman). Do I
have independence during my sampling?
It is good enough to say we do have independence, since I am assuming this is a
college with a couple thousand students, and I am only sampling four people from
that population:
P(woman | woman) ≈ P(woman),
P(woman | woman AND woman AND woman) ≈ P(woman).
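To see how good that approximation is, here is a small sketch. The enrollment of 2000 students is a number I made up to stand in for “a couple thousand”; it does not come from the problem.

```python
# Hypothetical college: 2000 students, 59% of them women (the 2000 is an assumption).
population = 2000
women = 1180  # 59% of 2000

# Exact chance that all four sampled students are women, sampling without replacement.
exact = 1.0
for already_drawn in range(4):
    exact *= (women - already_drawn) / (population - already_drawn)

# The approximation that treats the four draws as independent.
approx = 0.59 ** 4

# Chance the second student is a woman given that the first one was.
second_given_first = (women - 1) / (population - 1)

print(round(exact, 4), round(approx, 4), round(second_given_first, 4))
```

With a population this large compared with the sample of four, the exact and the approximate answers differ by much less than 0.001, which is why treating the draws as independent is reasonable here.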
So, P(X = 4) means
P(woman AND woman AND woman AND woman) = (0.59)(0.59)(0.59)(0.59) = (0.59)^4
= 0.1212
Try to calculate P(X = 0) on your own.
P(Not Woman AND Not Woman AND Not Woman AND Not Woman) = (1 – 0.59)^4 = (0.41)^4
= 0.0283
Now let us get to the tougher issue of finding P(X = 1), P(X = 2) and P(X = 3). We saw
what we are after when we found P(X = 4), but the number of possibilities is going to
increase. To make the notation a bit cleaner, instead of writing
Woman AND Woman AND Woman AND Woman I will write WWWW, and for
Not Woman AND Not Woman AND Not Woman AND Not Woman I will write NNNN.
So W indicates a person is a woman, and N indicates a person is not a woman.
To see how we can get the rest of these probabilities, I will use a tree diagram.
P(X = 1) = P(WNNN OR NWNN OR NNWN OR NNNW)
So, P(W) = 0.59 and P(N) = 0.41, we have independence, and
the compound events are disjoint, thus
P(X = 1) = (0.59)(0.41)^3 + (0.59)(0.41)^3 + (0.59)(0.41)^3 + (0.59)(0.41)^3
= 4(0.59)(0.41)^3
= 0.1627
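The branches of the tree can be listed by a program as well. This is a rough sketch, assuming independence as argued earlier; the function name prob_of_count is my own invention.

```python
from itertools import product

p = {"W": 0.59, "N": 0.41}  # P(woman) and P(not woman)

def prob_of_count(k, n=4):
    """Add up the probabilities of every W/N sequence of length n with exactly k W's."""
    total = 0.0
    for seq in product("WN", repeat=n):      # every path through the tree
        if seq.count("W") == k:
            prob = 1.0
            for person in seq:               # independence: multiply along the path
                prob *= p[person]
            total += prob                    # disjoint paths: add their probabilities
    return total

# The four paths with exactly one woman.
one_woman = ["".join(s) for s in product("WN", repeat=4) if s.count("W") == 1]
print(one_woman)
print(round(prob_of_count(1), 4))
```

The same function can be called with k = 2 or k = 3 to check the next two answers.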
I can follow a similar approach to find P(X = 2) and P(X = 3).
See if you can find P(X = 2).
P(X = 2) = P(WWNN OR WNWN OR WNNW OR NWNW OR NNWW OR NWWN)
P(X = 2) = 6(0.59)^2 (0.41)^2
= 0.3511
See if you can find P(X = 3). Use the tree as a guide.
P(X = 3) = P(WWWN OR WWNW OR WNWW OR NWWW)
P(X = 3) = 4(0.59)^3 (0.41)
= 0.3368
Thus the probability distribution of X is
X      0       1       2       3       4
P(X)   0.0283  0.1627  0.3511  0.3368  0.1212
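Each probability in this table has the same shape: (number of arrangements) × (0.59)^(number of women) × (0.41)^(number of non-women). The slides have not written that as a general formula yet, so treat the sketch below as my shorthand for the pattern; it is the standard Binomial formula, and the function name binomial_pmf is my own.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k): choose which k of the n draws are 'successes', then multiply."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Four students sampled, each a woman with probability 0.59.
dist = {k: binomial_pmf(k, 4, 0.59) for k in range(5)}

for k, prob in dist.items():
    print(k, round(prob, 4))
```

Here comb(4, k) gives the counts 1, 4, 6, 4, 1 that came from listing the sequences by hand. The rounded values in the table add up to 1.0001 only because each one was rounded to four decimal places.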
Try this One
An over-the-counter AIDS test is 90% effective at providing a positive result if a person
has AIDS. A sample of three people with AIDS is randomly chosen and the AIDS test is
given to them. Let the random variable X count the number of positive results from
the sample of three. Again, notice that I am sampling three times from the population.
1. What is the sample space of X?
{0, 1, 2, 3}
2. Find the probability of each outcome.
P(X = 0) = (0.1)^3
= 0.0010
P(X = 3) = (0.9)^3
= 0.7290
P(X = 1) = 3(0.1)^2 (0.9)
= 0.0270
P(X = 2) = 3(0.1)(0.9)^2
= 0.2430
3. Write down the probability distribution of X.
X      0       1       2       3
P(X)   0.0010  0.0270  0.2430  0.7290
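If you want to convince yourself that these numbers are reasonable, you can imitate the sampling on a computer. The simulation below is only an illustration and is not part of the original exercise; the seed and the number of trials are arbitrary choices of mine, and the results will be close to, but not exactly equal to, the table.

```python
import random
from collections import Counter

random.seed(0)      # arbitrary seed so the run can be repeated
trials = 100_000    # arbitrary number of simulated samples of three people

# For each trial, test three people; each test is positive with probability 0.9.
counts = Counter(
    sum(random.random() < 0.9 for _ in range(3))
    for _ in range(trials)
)

# Relative frequency of 0, 1, 2 and 3 positive results.
estimate = {k: counts[k] / trials for k in range(4)}

for k in range(4):
    print(k, round(estimate[k], 4))
```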
View these slides as often as you need.