Probability and Statistics

Download Report

Transcript Probability and Statistics

June 4, 2009
Dr. Lisa Green
 Main
goal: Understand the difference between
probability and statistics.
 Also
will see:
• Binomial Model
• Law of Large Numbers
• Monte Carlo Simulation
• Confidence Intervals
Probability
Model
Data
Statistics
Model: An idealized version of how the world works.
Data: Collected observations.
 Probability:
The model is known, and we use
this knowledge to describe what the data will
look like.
 Statistics:
The model is (partially) unknown,
and we use the data to make conclusions about
the model.
 There
are repeated trials, each of which has
only two outcomes. (Success or Failure)
 The
trials are independent of each other.
 The
number of trials (n) is known.
 The
probability of success on each trial (p) is
constant.
 Flip
a coin 10 times, count the number of heads
seen. n=10, p=0.50
 Test
100 newly manufactured widgets, count
the number that fail to work. n=100, p=?
 Give
a blood test to 35 volunteers, count the
number with high cholesterol. n=35, p=?
 Pick
a point at random
inside the unit square.
 If it is also inside the
arc of the unit circle,
count it as a success.
If not, count it as a
failure.
 What is the
probability of a
success?
1 unit
 We
know that the probability of success is π/4.
 If
we repeat this trial n times, we have a
binomial experiment.
 If
n=100, we expect between 71 and 86 of the
trials to end up successes. (95% of the time)
n
Lower bound
Upper bound
100
71
86
1000
760
810
10000
7774
7934
100000
78286
78794
1000000
784594
786202
10000000
7851438
7856526
7851438/10000000 * 4 = 3.1406 and 7856526/10000000 * 4 = 3.1426
This is the law of large numbers in action.
If we didn’t already know the value of pi, and we had a lot of time, we could use this
to estimate pi. Using random processes to estimate constant numbers is called
Monte Carlo Simulation.
A simulation of this is at
http://polymer.bu.edu/java/java/montepi/montepiapplet.html
 We
knew the model.
 We
knew the values of all constants.
 We
used that knowledge to make predictions
about what was going to happen.
 Ask
a randomly chosen person whether they
know anyone affected by layoffs at GM.
 If
the response is yes, count this as a success. If
not, count it as a failure.
 What
is the probability of a success?
 We
don’t know the probability of success. Let’s
call it p for now.
 If
we repeat the trial n times, and are careful about
which people we talk to, we have a binomial
experiment.
 If
we talk to 100 people, and 17 say they know
someone affected by layoffs at GM, then the value
of p is somewhere between 0.096 and 0.244 (95%
confidence).
n
Observed
successes
Lower Bound
Upper Bound
100
17
0.096
0.244
1000
170
0.147
0.193
10000
1700
0.163
0.177
100000
17000
0.168
0.172
1000000
170000
0.169
0.171
Note: There are obviously logistical difficulties in asking a million people a question.
Confidence intervals have confidence levels. The ones above are at the 95% confidence
level. Here is an applet that lets you explore what the confidence level means:
http://www.rossmanchance.com/applets/Confsim/Confsim.html
 We
knew the model, but not the value of all
constants.
 We
used observed data to tell us something
about the model (the unknown constant).
 Buffon’s
Needle
http://www.mste.uiuc.edu/reese/buffon/buffon.
html
 Reese’s
Pieces Applet
http://www.rossmanchance.com/applets/Reeses
/ReesesPieces.html
 CAUSEweb
http://www.causeweb.org/
n x
n x
P( x)    p (1  p)
 x
N=10, p=0.14
N=100, p=0.14