Probability and Statistics
June 4, 2009
Dr. Lisa Green
Main goal: Understand the difference between probability and statistics.
We will also see:
• Binomial Model
• Law of Large Numbers
• Monte Carlo Simulation
• Confidence Intervals
[Diagram: probability reasons from the model to the data; statistics reasons from the data to the model.]
Model: An idealized version of how the world works.
Data: Collected observations.
Probability: The model is known, and we use this knowledge to describe what the data will look like.
Statistics: The model is (partially) unknown, and we use the data to draw conclusions about the model.
• There are repeated trials, each of which has only two outcomes (success or failure).
• The trials are independent of each other.
• The number of trials (n) is known.
• The probability of success on each trial (p) is constant.
• Flip a coin 10 times, count the number of heads seen. n=10, p=0.50
• Test 100 newly manufactured widgets, count the number that fail to work. n=100, p=?
• Give a blood test to 35 volunteers, count the number with high cholesterol. n=35, p=?
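The coin-flip example above can be tried out directly. The following is a minimal sketch, not part of the original slides; the function name `count_successes` and the seed value are illustrative assumptions.

```python
import random

def count_successes(n, p, rng):
    """Run n independent trials, each succeeding with probability p.

    Returns the number of successes, which is the quantity a
    binomial experiment counts.
    """
    return sum(1 for _ in range(n) if rng.random() < p)

# Fixed seed (an arbitrary choice) so the run is repeatable.
rng = random.Random(2009)

# Flip a coin 10 times and count the heads: n=10, p=0.50.
heads = count_successes(10, 0.50, rng)
print("heads in 10 flips:", heads)
```

Running it several times with different seeds shows the count varying from run to run, even though n and p stay fixed.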
Pick a point at random inside the unit square. If it is also inside the arc of the unit circle, count it as a success. If not, count it as a failure.
What is the probability of a success?
[Figure: unit square, side length 1 unit, with the arc of the unit circle.]
• We know that the probability of success is π/4: the part of the unit circle inside the square has area π/4, and the square has area 1.
• If we repeat this trial n times, we have a binomial experiment.
• If n=100, we expect between 71 and 86 of the trials to end up as successes (95% of the time).
n           Lower bound   Upper bound
100         71            86
1000        760           810
10000       7774          7934
100000      78286         78794
1000000     784594        786202
10000000    7851438       7856526
For n = 10000000, the bounds correspond to estimates of π between 7851438/10000000 * 4 = 3.1406 and 7856526/10000000 * 4 = 3.1426.
This is the law of large numbers in action.
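The slides do not say how the bounds in the table above were computed. As a sketch under that caveat, a normal approximation to the binomial (mean np, standard deviation sqrt(np(1-p)), multiplier 1.96 for 95%) with the bounds rounded inward appears to reproduce them. The function name `expected_success_range` is an illustrative assumption.

```python
import math

def expected_success_range(n, p, z=1.96):
    """Approximate range holding the success count about 95% of the time.

    Assumption: normal approximation to the binomial, with the lower
    bound rounded up and the upper bound rounded down.
    """
    mean = n * p
    sd = math.sqrt(n * p * (1 - p))
    return math.ceil(mean - z * sd), math.floor(mean + z * sd)

p = math.pi / 4  # probability of success from the unit-square trial
for n in (100, 1000, 10000, 100000):
    low, high = expected_success_range(n, p)
    # The relative width of the range shrinks as n grows.
    print(n, low, high, (high - low) / n)
```

The last printed column shows the range narrowing relative to n, which is the behavior the law of large numbers describes.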
If we didn’t already know the value of pi, and we had a lot of time, we could use this
to estimate pi. Using random processes to estimate constant numbers is called
Monte Carlo Simulation.
A simulation of this is at
http://polymer.bu.edu/java/java/montepi/montepiapplet.html
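For readers who cannot run the applet, here is a minimal sketch of the same idea. It assumes the square is [0, 1] x [0, 1] and the circle is centered at the origin; the function name `estimate_pi` and the seed are illustrative choices, not taken from the slides.

```python
import random

def estimate_pi(n, rng):
    """Estimate pi from n random points in the unit square.

    A point (x, y) is a success when it lies inside the unit circle,
    that is, when x*x + y*y <= 1. The fraction of successes estimates
    pi/4, so multiplying by 4 estimates pi.
    """
    successes = 0
    for _ in range(n):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            successes += 1
    return 4 * successes / n

rng = random.Random(2009)  # fixed seed so the run is repeatable
for n in (100, 10000, 100000):
    print(n, estimate_pi(n, rng))
```

The estimates tend to settle closer to π as n grows, though any single run can still land above or below it.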
• We knew the model.
• We knew the values of all constants.
• We used that knowledge to make predictions about what was going to happen.
Ask a randomly chosen person whether they know anyone affected by layoffs at GM. If the response is yes, count this as a success. If not, count it as a failure.
What is the probability of a success?
• We don’t know the probability of success. Let’s call it p for now.
• If we repeat the trial n times, and are careful about which people we talk to, we have a binomial experiment.
• If we talk to 100 people, and 17 say they know someone affected by layoffs at GM, then the value of p is somewhere between 0.096 and 0.244 (95% confidence).
n          Observed successes   Lower bound   Upper bound
100        17                   0.096         0.244
1000       170                  0.147         0.193
10000      1700                 0.163         0.177
100000     17000                0.168         0.172
1000000    170000               0.169         0.171
Note: There are obviously logistical difficulties in asking a million people a question.
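The slides do not state which interval formula was used. As a sketch under that caveat, the common normal-approximation interval, observed proportion plus or minus 1.96 standard errors, appears to match the table above. The function name `proportion_confidence_interval` is an illustrative assumption.

```python
import math

def proportion_confidence_interval(successes, n, z=1.96):
    """Approximate 95% confidence interval for an unknown proportion p.

    Assumption: normal-approximation interval,
    p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n).
    """
    p_hat = successes / n
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

for n, successes in ((100, 17), (1000, 170), (10000, 1700)):
    low, high = proportion_confidence_interval(successes, n)
    print(n, successes, round(low, 3), round(high, 3))
```

Each tenfold increase in n narrows the interval by a factor of about sqrt(10), because the margin is proportional to 1/sqrt(n).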
Confidence intervals have confidence levels. The ones above are at the 95% confidence
level. Here is an applet that lets you explore what the confidence level means:
http://www.rossmanchance.com/applets/Confsim/Confsim.html
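The meaning of the confidence level can also be explored with a small simulation. The sketch below is an illustration rather than a reproduction of the applet: it fixes a hypothetical true value p = 0.17, repeatedly surveys 100 simulated people, and counts how often the normal-approximation interval contains the true p. The function name, seed, and repeat count are all assumptions.

```python
import math
import random

def interval_covers(p_true, n, rng, z=1.96):
    """Simulate one survey of n people and report whether the
    resulting approximate 95% interval contains p_true."""
    successes = sum(1 for _ in range(n) if rng.random() < p_true)
    p_hat = successes / n
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin <= p_true <= p_hat + margin

rng = random.Random(2009)  # fixed seed so the run is repeatable
repeats = 2000
covered = sum(interval_covers(0.17, 100, rng) for _ in range(repeats))
print("fraction of intervals containing p:", covered / repeats)
```

The printed fraction should come out in the neighborhood of 0.95. The confidence level describes how often the procedure captures p over many repetitions, not the chance that any one interval is right.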
• We knew the model, but not the value of all constants.
• We used observed data to tell us something about the model (the unknown constant).
• Buffon’s Needle
http://www.mste.uiuc.edu/reese/buffon/buffon.html
• Reese’s Pieces Applet
http://www.rossmanchance.com/applets/Reeses/ReesesPieces.html
• CAUSEweb
http://www.causeweb.org/
P(x) = \binom{n}{x} p^{x} (1-p)^{n-x}
[Plots: binomial distributions for N=10, p=0.14 and N=100, p=0.14.]
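The binomial probability formula can be evaluated directly to tabulate the distribution for the N=10, p=0.14 case. This is a minimal sketch; the function name `binomial_pmf` is an illustrative assumption, and `math.comb` requires Python 3.8 or later.

```python
import math

def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials,
    each with success probability p."""
    return math.comb(n, x) * p**x * (1 - p)**(n - x)

# Tabulate the distribution for n=10, p=0.14.
probs = [binomial_pmf(x, 10, 0.14) for x in range(11)]
for x, prob in enumerate(probs):
    print(x, round(prob, 4))

# The probabilities over all possible x add up to 1
# (up to floating-point rounding).
print("total:", sum(probs))
```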