MM207 Statistics - Charles Whiffen's Math & Stats Website
MM207
Statistics
Welcome to the Unit 7
Seminar
Prof. Charles Whiffen
Statistical Significance
A set of measurements or observations in a
statistical study is said to be statistically significant
if it is unlikely to have occurred by chance.
• A detective in Detroit finds that 25 of the 62 guns used in
crimes during the past week were sold by the same gun
shop. Statistically Significant?
• The team with the worst win-loss record in basketball wins
one game against the defending league champions.
Statistically Significant?
Quantifying Statistical Significance
In general, we determine statistical
significance by using probability to quantify the
likelihood that a result may have occurred by
chance. We therefore ask a question like this
one:
Is the probability that the observed difference
occurred by chance less than or equal to 0.05 (or
1 in 20)?
Quantifying Statistical Significance
If the answer is yes (the probability is less than or
equal to 0.05), then we say that the difference is
statistically significant at the 0.05 level.
If the answer is no, the observed difference is
reasonably likely to have occurred by chance, so we
say that it is not statistically significant.
The choice of 0.05 is somewhat arbitrary, but it’s a
figure that statisticians frequently use. Other
probabilities, such as 0.1 or 0.01, are also used.
EXAMPLE 2 Polio Vaccine Significance
In the test of the Salk polio vaccine, 33 of the
200,000 children in the treatment group got
paralytic polio, while 115 of the 200,000 in the
control group got paralytic polio. Calculations
show that the probability of this difference
between the groups occurring by chance is much
less than 0.01. Describe the implications of this
result.
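One way to check the claim above is a two-proportion z-test. The seminar does not say which calculation was used, so the choice of test, the normal approximation, and the function name `two_proportion_z_test` are all assumptions of this sketch.

```python
import math

def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided z-test for a difference in proportions (assumed method)."""
    p1, p2 = x1 / n1, x2 / n2
    # Pooled proportion: the common rate if the groups did not differ.
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided tail area under the standard normal curve.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Treatment group: 33 of 200,000; control group: 115 of 200,000.
z, p_value = two_proportion_z_test(33, 200_000, 115, 200_000)
print(f"z = {z:.2f}, p-value = {p_value:.1e}")
```

Under these assumptions the p-value comes out far below 0.01, which is consistent with calling the difference statistically significant at the 0.01 level.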
Copyright © 2009 Pearson Education, Inc.
Three Approaches to Finding Probability
• A theoretical probability is based on assuming that all
outcomes are equally likely. It is determined by dividing
the number of ways an event can occur by the total
number of possible outcomes.
• A relative frequency probability is based on observations
or experiments. It is the relative frequency of the event of
interest.
• A subjective probability is an estimate based on
experience or intuition.
Theoretical Probability
• Experiment: Rolling a single die
• Sample Space: All possible outcomes from experiment
S = {1, 2, 3, 4, 5, 6}
• Outcomes are the most basic possible results of observations or
experiments
• Event: a collection of one or more outcomes (denoted by capital
letter)
Event A = {3}
Event B = {even number}
• Probability = (number of favorable outcomes) / (total number of
outcomes)
• P(A) = 1/6
• P(B) = 3/6 = ½
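The die example can be reproduced with a short sketch. The helper name `theoretical_probability` is hypothetical, and the calculation assumes every outcome in the sample space is equally likely.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}
event_a = {3}                                        # Event A = {3}
event_b = {n for n in sample_space if n % 2 == 0}    # Event B = {even number}

def theoretical_probability(event, space):
    """Favorable outcomes divided by total outcomes (equally likely outcomes)."""
    return Fraction(len(event & space), len(space))

print(theoretical_probability(event_a, sample_space))
print(theoretical_probability(event_b, sample_space))
```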
Counting Possible Outcomes
Suppose process A has a possible outcomes and process B
has b possible outcomes. Assuming the outcomes of the
processes do not affect each other, the number of different
outcomes for the two processes combined is:
a×b
This idea extends to any number of processes.
If a third process C has c possible outcomes, the number of
possible outcomes for the three processes combined is:
a × b × c.
Applying the Counting Rule
How many outcomes are there if you roll a fair die and toss
a fair coin?
The first process, rolling a fair die, has six outcomes (1, 2,
3, 4, 5, 6).
The second process, tossing a fair coin, has two outcomes
(H, T).
Therefore, there are 6 × 2 = 12 outcomes for the two
processes together (1H, 1T, 2H, 2T, . . . , 6H, 6T).
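The counting rule for the die and the coin can be checked by listing every combination. This is a minimal sketch; the variable names are illustrative only.

```python
from itertools import product

die = [1, 2, 3, 4, 5, 6]
coin = ["H", "T"]

# Pair every die result with every coin result.
outcomes = [f"{d}{c}" for d, c in product(die, coin)]

print(len(outcomes))
print(outcomes)
```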
Relative Frequency (Empirical)
Probability
Repeat or observe a process many times and count the
number of times the event of interest, A, occurs.
Estimate P(A) by
P(A) = (number of times A occurred) / (total number of observations)
Geological records indicate that a river has crested above a
particular high flood level four times in the past 2,000 years.
What is the relative frequency probability that the river will
crest above the high flood level next year?
P(Flood) = 4/2000 = 1/500 = 0.002
Probability of an Event Not Occurring
If the probability of an event A is P(A), then the probability
that event A does not occur is P(not A).
Because the event must either occur or not occur, we can
write:
P(A) + P(not A) = 1 or P(not A) = 1 – P(A)
The event not A is called the complement of the event A;
the “not” is often designated by a bar, so Ā means not A.
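As a small illustration, the complement rule can be applied to the flood example from earlier in the seminar. The variable names below are hypothetical.

```python
from fractions import Fraction

# Relative frequency probability from the flood example: 4 crests in 2,000 years.
p_flood = Fraction(4, 2000)

# Complement rule: P(not A) = 1 - P(A).
p_no_flood = 1 - p_flood

print(p_flood, float(p_flood))
print(p_no_flood, float(p_no_flood))
```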
Probability Distributions
A probability distribution represents the probabilities of all
possible events. Do the following to make a display of a
probability distribution:
1. List all possible outcomes. Use a table or figure if it is
helpful.
2. Identify outcomes that represent the same event. Find the
probability of each event.
3. Make a table in which one column lists each event and
another column lists each probability. The sum of all the
probabilities must be 1.
Creating a Probability Distribution
What is the probability distribution for the number of heads
that occurs when three coins are tossed simultaneously?
The number of different outcomes when three coins are
tossed is 2 × 2 × 2 = 8.
3 Coin Probability Distribution
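The three steps for building a probability distribution can be sketched for the three-coin case as follows. The sketch assumes fair coins, so all eight outcomes are equally likely.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Step 1: list all possible outcomes (HHH, HHT, ..., TTT).
outcomes = list(product("HT", repeat=3))

# Step 2: group outcomes that represent the same event (same number of heads).
head_counts = Counter(outcome.count("H") for outcome in outcomes)

# Step 3: tabulate each event with its probability.
distribution = {
    heads: Fraction(count, len(outcomes))
    for heads, count in sorted(head_counts.items())
}

for heads, probability in distribution.items():
    print(f"{heads} heads: {probability}")
```

The probabilities are 1/8, 3/8, 3/8, and 1/8 for 0, 1, 2, and 3 heads, and they sum to 1 as required.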
Law of Large Numbers
• The law of large numbers (or law of averages) applies
to a process for which the probability of an event A is
P(A) and the results of repeated trials do not depend on
results of earlier trials (they are independent).
• It states: If the process is repeated through many trials,
the proportion of the trials in which event A occurs will be
close to the probability P(A). The larger the number of
trials, the closer the proportion should be to P(A).
Roulette Example
• A roulette wheel has 38 numbers: 18 black numbers, 18 red numbers,
and the numbers 0 and 00 in green.
• What is the probability of getting a red number on any spin?
The theoretical probability of getting a red number on any spin is:
P(Red) = 18/38 = 0.474
• The law of large numbers tells us that as the game is played more
and more times, the proportion of times that a red number appears
should get closer to 0.474. In 100,000 tries, the wheel should come
up red close to 47.4% of the time, or about 47,400 times.
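A quick simulation can illustrate the law of large numbers for the roulette wheel. This is only a sketch: the function name `red_proportion` and the fixed seed are assumptions made so the run is repeatable, and a simulated wheel is not evidence about any real one.

```python
import random

def red_proportion(spins, seed=0):
    """Simulate `spins` spins and return the proportion that land on red."""
    rng = random.Random(seed)  # fixed seed (assumption) for repeatable runs
    pockets = ["red"] * 18 + ["black"] * 18 + ["green"] * 2
    reds = sum(rng.choice(pockets) == "red" for _ in range(spins))
    return reds / spins

# The proportion should settle near 18/38 as the number of spins grows.
for spins in (100, 10_000, 100_000):
    print(spins, round(red_proportion(spins), 4))
```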
Expected Value
• The expected value of a variable is the weighted average of the values of all its
possible events, each weighted by its probability. Because it is an average, we should expect to find the
“expected value” only when there are a large number of events, so that
the law of large numbers comes into play.
Consider two events, each with its own value and probability. The expected value is:
expected value = (value of event 1) * (probability of event 1)
+ (value of event 2) * (probability of event 2)
This formula can be extended to any number of events by including more terms in the
sum.
Winning the Lottery
Suppose $1 lottery tickets have the following probabilities: 1 in 5 to win
a free ticket (worth $1); 1 in 100 to win $5; 1 in 100,000 to win
$1,000; and 1 in 10 million to win $1 million.
What is the expected value of a lottery ticket?
Event             Value        Probability     Value x Probability          Result
Purchase ticket   -$1          1               -$1 x 1                      -$1.00
Win free ticket   $1           1/5             $1 x 1/5                     $0.20
Win $5            $5           1/100           $5 x 1/100                   $0.05
Win $1,000        $1,000       1/100,000       $1,000 x 1/100,000           $0.01
Win $1,000,000    $1,000,000   1/10,000,000    $1,000,000 x 1/10,000,000    $0.10

Expected Value = -$0.64
Thus, averaged over many tickets, you should expect to lose 64¢ for each
lottery ticket that you buy. If you buy, say, 1,000 tickets, you should expect
to lose about 1,000 × $0.64 = $640.
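The lottery calculation can be reproduced with exact fractions. This sketch follows the seminar's setup, in which the $1 ticket price is treated as a certain cost (probability 1) alongside the possible prizes.

```python
from fractions import Fraction

# (value in dollars, probability) for each event in the lottery example.
events = [
    (-1, Fraction(1)),                      # purchase ticket
    (1, Fraction(1, 5)),                    # win free ticket
    (5, Fraction(1, 100)),                  # win $5
    (1_000, Fraction(1, 100_000)),          # win $1,000
    (1_000_000, Fraction(1, 10_000_000)),   # win $1,000,000
]

# Expected value: sum of value x probability over all events.
expected_value = sum(value * probability for value, probability in events)

print(float(expected_value))           # expected value per ticket, in dollars
print(float(1_000 * expected_value))   # expected value for 1,000 tickets
```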
Gambler’s Fallacy
The gambler’s fallacy is the mistaken belief that a streak of
bad luck makes a person “due” for a streak of good luck.
• Assume you are playing a coin-flipping game in which you win
$1 each time the coin lands heads. You have just lost 7
flips in a row. You might think that, given the run of tails,
a head is “due” on the next toss.
• But the probability of a head or a tail on the next toss is
still 0.50; the coin has no memory of previous tosses.
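The point that the coin has no memory can be checked by brute force. The sketch below assumes a fair coin, lists every sequence of eight tosses, keeps those that begin with seven identical results, and counts how often the eighth toss breaks the streak.

```python
from fractions import Fraction
from itertools import product

# All 2^8 equally likely sequences of eight fair coin tosses.
sequences = list(product("HT", repeat=8))

# Keep sequences whose first seven tosses are all the same (a 7-toss streak).
after_streak = [s for s in sequences if len(set(s[:7])) == 1]

# Count how many of those have an eighth toss that differs from the streak.
streak_breaks = sum(s[7] != s[0] for s in after_streak)

p_break = Fraction(streak_breaks, len(after_streak))
print(p_break)
```

The streak is broken in exactly half of the cases, so the "due" result is no more likely than a continuation.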
Accident Rates
Travel risk is often expressed in terms of an accident rate
or death rate.
For example, suppose an annual accident rate is 750
accidents per 100,000 people.
• This means that, within a group of 100,000 people, on
average 750 will have an accident over the period of a year.
• The statement is in essence an expected value, which means
it also represents a probability: It tells us that the probability of
a person being involved in an accident (in one year) is 750 in
100,000, or 0.0075.
Death Rates
Over the past 20 years in the United States, the average (mean) number
of deaths in commercial airplane accidents has been roughly 100 per
year.
Currently, airplane passengers in the United States travel a total of about
8 billion miles per year. Use these numbers to calculate the death rate
per mile of air travel.
• Assuming 100 deaths and 8 billion miles in an average year, the risk of air travel
is:
100 deaths / (8 x 10^9 miles) = 1.25 x 10^-8 deaths per mile, or about 1.3 x 10^-8 deaths per mile
• The death rate is about 1.3 deaths per 100 million miles.
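The unit conversion in this example is easy to get wrong, so here is a minimal sketch of the arithmetic. The variable names are illustrative only.

```python
deaths_per_year = 100
miles_per_year = 8e9  # 8 billion passenger miles

# Death rate per mile of air travel.
deaths_per_mile = deaths_per_year / miles_per_year

# Rescale to a more readable unit: deaths per 100 million miles.
deaths_per_100_million_miles = deaths_per_mile * 1e8

print(f"{deaths_per_mile:.3g} deaths per mile")
print(f"{deaths_per_100_million_miles:.3g} deaths per 100 million miles")
```

The unrounded rate is 1.25 deaths per 100 million miles, which rounds to the 1.3 quoted above.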
Vital Statistics
Data concerning births and deaths of citizens are often
called vital statistics.
Assuming a U.S. population of 300 million and 554,643
cancer deaths, find the risk per person and per 100,000 people of
dying from cancer.
• The risk per person is: 554,643 / 300,000,000 ≈ 0.0018.
• The risk per 100,000 is: 0.0018 x 100,000 = 180. This is the
death rate per 100,000 people.
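A short sketch of the same calculation without intermediate rounding follows. The seminar rounds the per-person risk to 0.0018 before scaling, which gives 180; carrying full precision gives a slightly higher figure.

```python
cancer_deaths = 554_643
population = 300_000_000

# Risk per person.
risk_per_person = cancer_deaths / population

# Death rate per 100,000 people, computed without rounding first.
rate_per_100k = risk_per_person * 100_000

print(f"risk per person: {risk_per_person:.4f}")
print(f"rate per 100,000: {rate_per_100k:.1f}")
```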
Life Expectancy
Life expectancy is the number of years a person with
a given age today can expect to live on average.
• As we would expect, life expectancy is higher for younger
people because, on average, they have longer left to live.
• At birth, the life expectancy of Americans today is about 78
years (75 years for men and 81 years for women).
Using StatCrunch
What About StatCrunch?
Many of the calculations of simple
probabilities are best done with just a
calculator. However, StatCrunch can
be a great help with contingency
tables used to calculate compound
probabilities. More information on
contingency tables can be found on
pp. 418-419 (Chapter 10) of the text.
Example – Appendix A Data Set 18,
Homes Sold in Dutchess County
Stat>Tables>Contingency>with data
Contingency Table with data
• What this tells us (for
example):
• P(3BR)=19/40
• P(2Bath)=21/40
• P(1or2Bath)=33/40
• P(2Bath & 3BR)=8/40
• P(2Bath|3BR)=8/19
• P(3BR|2Bath)=8/21
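The probabilities above can be recomputed from the four counts they imply: 40 homes in total, 19 with three bedrooms, 21 with two baths, and 8 with both. The sketch uses only those counts, since the full table is not reproduced here, and the variable names are hypothetical.

```python
from fractions import Fraction

total_homes = 40
three_br = 19                # homes with 3 bedrooms
two_bath = 21                # homes with 2 baths
two_bath_and_three_br = 8    # homes with both

# Joint probability: both conditions hold, out of all homes.
p_joint = Fraction(two_bath_and_three_br, total_homes)

# Conditional probabilities: restrict the denominator to the given group.
p_two_bath_given_three_br = Fraction(two_bath_and_three_br, three_br)
p_three_br_given_two_bath = Fraction(two_bath_and_three_br, two_bath)

print(p_joint, p_two_bath_given_three_br, p_three_br_given_two_bath)
```

The conditional values also match the formula P(A|B) = P(A & B) / P(B); for example, (8/40) / (19/40) = 8/19.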
Contingency Table with data
Now you try it!
P(4BR)=
P(3Bath)=
P(3or4Bath)=
P(3Bath & 2BR)=
P(3Bath|2BR)=
P(2BR|3Bath)=
QUESTIONS?
Review of Unit 7 Work
By Tuesday at Midnight
you must complete:
• Initial post to one
discussion question
• Two responses to
other student posts
to discussion
questions
• Live Binder
• MSL HW
• MSL Quiz