Introduction to Statistics - University College Dublin


Introduction to Statistics
Dr. P Murphy
Why study
Statistics?
We like to think that we have control
over our lives.
But in reality there are many things
that are outside our control.
Every day we are confronted by our own
ignorance.
According to Albert Einstein:
“God does not play dice.”
But we all should know better than
Prof. Einstein.
The world is governed by Quantum
Mechanics where Probability reigns
supreme.
Consider a day in the
life of an average
UCD student.
You wake up in the morning and the
sunlight hits your eyes. Then suddenly
without warning the world becomes an
uncertain place.
How long will you have to wait for
the Number 10 Bus this morning?
When it arrives will it be full?
Will it be out of service?
Will it be raining while you wait?
Will you be late for your 9am Maths
lecture?
Probability is the Science of Uncertainty.
 It is used by Physicists to predict the
behaviour of elementary particles.
It is used by engineers to build
computers.
It is used by economists to predict the
behaviour of the economy.
It is used by stockbrokers to make
money on the stockmarket.
It is used by psychologists to
determine if you should get that job.
What about
Statistics?
 Statistics is the Science of Data.
The Statistics you have seen before
has probably been Descriptive
Statistics.
And Descriptive Statistics made you
feel like this ….
What is
Inferential
Statistics?
 It is a discipline that allows us to
estimate unknown quantities by
making some elementary
measurements.
Using these estimates we can then
make Predictions and Forecast the
Future
Chapter 1
Probability
Consider
a Real Problem
Can you make money playing the
Lottery?
Let us calculate chances of winning.
To do this we need to learn some basic
rules about probability.
These rules are mainly just ways of
formalising basic common sense.
Example: What are the chances that
you get a HEAD when you toss a coin?
Example: What are the chances you
get a combined total of 7 when you roll
two dice?
1.1 Experiments
An Experiment leads to a single
outcome which cannot be predicted
with certainty.
Examples:
Toss a coin: head or tail
Roll a die: 1, 2, 3, 4, 5, 6
Take medicine: worse, same, better
Set of all outcomes = Sample Space.
Toss a coin: Sample space = {h, t}
Roll a die: Sample space = {1, 2, 3, 4, 5, 6}
1.2 Probability
The Probability of an outcome is a
number between 0 and 1 that
measures the likelihood that the
outcome will occur when the
experiment is performed.
(0=impossible, 1=certain).
Probabilities of all sample points
must sum to 1.
Long run relative frequency
interpretation.
EXAMPLE: Coin tossing experiment
P(H)=0.5 P(T)=0.5
1.3 Events
An event is a specific collection of
sample points.
The probability of an event A is
calculated by summing the
probabilities of the sample points
that belong to A.
1.4 Steps for
calculating
Probabilities
Define the experiment.
List the sample points.
Assign probabilities to the sample
points.
Determine the collection of sample
points contained in the event of interest.
Sum the sample point probabilities to
get the event probability.
Example:
THE GAME OF CRAPS
In Craps one rolls two fair dice.
What is the probability of the
sum of the two dice showing 7?
(1,1) (1,2) (1,3) (1,4) (1,5) (1,6)
(2,1) (2,2) (2,3) (2,4) (2,5) (2,6)
(3,1) (3,2) (3,3) (3,4) (3,5) (3,6)
(4,1) (4,2) (4,3) (4,4) (4,5) (4,6)
(5,1) (5,2) (5,3) (5,4) (5,5) (5,6)
(6,1) (6,2) (6,3) (6,4) (6,5) (6,6)
Outcomes with a sum of 7:
(1,6) (2,5) (3,4) (4,3) (5,2) (6,1)
1.5 Equally likely
outcomes
So the Probability of 7 when
rolling two dice is 6/36 = 1/6.
This example illustrates the
following rule:
In a Sample Space S of equally
likely outcomes, the probability
of the event A is given by
P(A) = #A / #S
That is, the number of outcomes
in A divided by the total number
of outcomes in S.
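The counting rule above can be checked by brute force. The following is only a minimal sketch: the helper name `prob` and the choice to describe an event by a predicate are illustrative assumptions, not part of the lecture.

```python
from fractions import Fraction
from itertools import product

# Sample space: all 36 ordered pairs (first die, second die), equally likely.
sample_space = list(product(range(1, 7), repeat=2))

def prob(event, space):
    """P(A) = #A / #S for equally likely outcomes; `event` is a predicate."""
    favourable = [outcome for outcome in space if event(outcome)]
    return Fraction(len(favourable), len(space))

# Event of interest: the two dice sum to 7.
p_seven = prob(lambda roll: sum(roll) == 7, sample_space)
print(p_seven)  # 1/6
```

The same helper follows the five steps of 1.4: define the experiment, list the sample points, give each the same probability, pick out the event, and sum.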
1.6 Sets
A compound event is a composition of
two or more other events.
A^c: The Complement of A is the event
that A does not occur
A ∪ B: The Union of two events A and
B is the event that occurs if either A or
B or both occur, it consists of all sample
points that belong to A or B or both.
A ∩ B: The Intersection of two events
A and B is the event that occurs if both
A and B occur, it consists of all sample
points that belong to both A and B
1.7 Basic
Probability Rules
P(A^c) = 1 - P(A)
P(A ∪ B) = P(A) + P(B) - P(A ∩ B)
Mutually Exclusive Events are
events which cannot occur at
the same time.
P(A ∩ B) = 0 for Mutually
Exclusive Events.
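As a quick sanity check of these rules, here is a small sketch on a single fair die. The events A, B and C below are hypothetical examples chosen for illustration. Note that in Python `|` and `&` on sets mean union and intersection, not conditioning.

```python
from fractions import Fraction

S = set(range(1, 7))   # sample space for one fair die
A = {2, 4, 6}          # hypothetical event: the roll is even
B = {5, 6}             # hypothetical event: the roll is greater than 4
C = {1}                # hypothetical event: the roll is 1

def P(event):
    """Probability of an event when all outcomes in S are equally likely."""
    return Fraction(len(event), len(S))

# Complement rule.
assert P(S - A) == 1 - P(A)
# Addition rule: set union is `|`, set intersection is `&`.
assert P(A | B) == P(A) + P(B) - P(A & B)
# A and C are mutually exclusive, so their intersection has probability 0.
assert P(A & C) == 0

print(P(A | B))  # 2/3
```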
1.8 Conditional
Probability
P(A | B) ~ Probability of A
occurring given that B has
occurred.
P(A | B) = P(A ∩ B) / P(B)
Multiplicative Rule:
P(A ∩ B)
= P(A|B)P(B)
= P(B|A)P(A)
1.9 Independent
Events
A and B are independent events
if the occurrence of one event
does not affect the probability
of the other event.
If A and B are independent then
P(A|B)=P(A)
P(B|A)=P(B)
P(A ∩ B)=P(A)P(B)
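To see conditional probability and independence in action, the sketch below reuses the two-dice sample space. The events A, B and C are hypothetical choices made for illustration, and `cond` is just a helper name.

```python
from fractions import Fraction
from itertools import product

S = set(product(range(1, 7), repeat=2))  # two fair dice, 36 outcomes

def P(event):
    return Fraction(len(event), len(S))

def cond(a, b):
    """Conditional probability P(A | B) = P(A and B) / P(B)."""
    return P(a & b) / P(b)

A = {r for r in S if r[0] == 6}      # first die shows 6
B = {r for r in S if sum(r) == 7}    # total is 7
C = {r for r in S if sum(r) == 12}   # total is 12

# A and B are independent: knowing A does not change the chance of B.
print(cond(B, A), P(B))   # 1/6 1/6
# A and C are not independent: a first 6 makes a total of 12 far more likely.
print(cond(C, A), P(C))   # 1/6 1/36
```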
Chapter 1
Probability
EXAMPLES
Probability as
a matter of
life and death
Positive Test for Disease
1 in every 10000 people in Ireland
suffer from AIDS
There is a test for HIV/AIDS
which is 95% accurate.
You are not feeling well and you
go to hospital where your
Physician tests you.
He says you are positive for AIDS
and tells you that you have 18
months to live.
How should you react?
Positive Test for Disease
Let D be the event that you have
AIDS
Let T be the event that you test
positive for AIDS
P(D)=0.0001
P(T|D)=0.95
P(T|D^c)=0.05 (taking "95% accurate" to mean 5% false positives as well)
P(D|T)=?

Positive Test for Disease
\[
\begin{aligned}
P(D \mid T) &= \frac{P(D \cap T)}{P(T)} \\
&= \frac{P(T \mid D)\,P(D)}{P(\{T \cap D\} \cup \{T \cap D^c\})} \\
&= \frac{P(T \mid D)\,P(D)}{P(T \cap D) + P(T \cap D^c)} \\
&= \frac{P(T \mid D)\,P(D)}{P(T \mid D)\,P(D) + P(T \mid D^c)\,P(D^c)} \\
&= \frac{(0.95)(0.0001)}{(0.95)(0.0001) + (0.05)(0.9999)} \\
&\approx 0.001897
\end{aligned}
\]
Chapter 1
Examples
Example 1.1
S={A,B,C}
P(A) = ½
P(B) = 1/3
P(C) = 1/6
What is P({A,B})?
What is P({A,B,C})?
List all events Q such that
P(Q) = ½.
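One way to check answers to Example 1.1 is to enumerate every event (every subset of S) and add up the outcome probabilities. This is only a sketch; the helper names `prob` and `all_events` are illustrative.

```python
from fractions import Fraction
from itertools import combinations

# Outcome probabilities as given in Example 1.1.
p = {"A": Fraction(1, 2), "B": Fraction(1, 3), "C": Fraction(1, 6)}

def prob(event):
    """Probability of an event = sum of the probabilities of its outcomes."""
    return sum((p[o] for o in event), Fraction(0))

def all_events(outcomes):
    """Yield every subset of the sample space as a sorted tuple."""
    items = sorted(outcomes)
    for size in range(len(items) + 1):
        yield from combinations(items, size)

half_events = [e for e in all_events(p) if prob(e) == Fraction(1, 2)]

print(prob(("A", "B")))        # 5/6
print(prob(("A", "B", "C")))   # 1
print(half_events)             # [('A',), ('B', 'C')]
```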
Chapter 1
Examples
Example 1.2
Suppose that a lecturer arrives
late to class 10% of the time,
leaves early 20% of the time
and both arrives late AND
leaves early 5% of the time.
What is the probability that
on a given day the lecturer will
either arrive late or leave early?
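A worked sketch using the addition rule from 1.7, writing L for "arrives late" and E for "leaves early" (these two symbols are introduced here for illustration only):

```latex
\[
P(L \cup E) = P(L) + P(E) - P(L \cap E) = 0.10 + 0.20 - 0.05 = 0.25
\]
```

The 5% is subtracted because days on which both happen are counted once in the 10% and once in the 20%.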
Chapter 1
Examples
Example 1.3
Suppose you are dealt 5 cards
from a deck of 52 playing cards.
Find the probability of the
following events
1. All four aces and the king of
spades
2. All 5 cards are spades
3. All 5 cards are different
4. A Full House (3 same, 2
same)
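These four probabilities can be sketched with the equally likely outcomes rule, counting hands with binomial coefficients. One assumption is made loudly here: "all 5 cards are different" is read as "five different ranks", which is an interpretation and not something the slide states.

```python
from fractions import Fraction
from math import comb

hands = comb(52, 5)  # number of equally likely 5-card hands

# 1. All four aces and the king of spades: exactly one such hand.
p_aces_king = Fraction(1, hands)

# 2. All 5 cards are spades: choose 5 of the 13 spades.
p_all_spades = Fraction(comb(13, 5), hands)

# 3. ASSUMPTION: "different" means 5 different ranks.
#    Choose 5 ranks, then one of 4 suits for each card.
p_distinct_ranks = Fraction(comb(13, 5) * 4**5, hands)

# 4. Full house: rank of the triple and 3 of its suits,
#    then a different rank for the pair and 2 of its suits.
p_full_house = Fraction(13 * comb(4, 3) * 12 * comb(4, 2), hands)

for name, value in [("aces + king of spades", p_aces_king),
                    ("all spades", p_all_spades),
                    ("distinct ranks", p_distinct_ranks),
                    ("full house", p_full_house)]:
    print(f"{name}: {float(value):.6f}")
```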
Chapter 1
Examples
Example 1.4
The Birthday Problem
Suppose there are N people in a
room.
How large should N be so that
there is a more than 50% chance
that at least two people in the
room have the same birthday?
Number in Room    Prob at least 2 have same birthday
1                 0.00
2                 0.00
3                 0.01
4                 0.02
5                 0.03
6                 0.04
7                 0.06
8                 0.07
9                 0.09
10                0.12
11                0.14
12                0.17
13                0.19
14                0.22
15                0.25
16                0.28
17                0.32
18                0.35
19                0.38
20                0.41
21                0.44
22                0.48
23                0.51
24                0.54
25                0.57
26                0.60
27                0.63
28                0.65
29                0.68
30                0.71
31                0.73
32                0.75
33                0.77
34                0.80
35                0.81
36                0.83
37                0.85
38                0.86
39                0.88
40                0.89
41                0.90
42                0.91
43                0.92
44                0.93
45                0.94
46                0.95
47                0.95
48                0.96
49                0.97
50                0.97
51                0.97
52                0.98
53                0.98
54                0.98
55                0.99
56                0.99
57                0.99
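Figures like these can be reproduced with the complement rule: compute the probability that all N birthdays differ and subtract it from 1. The sketch below assumes 365 equally likely birthdays and ignores leap years; the function name is illustrative.

```python
def p_shared_birthday(n, days=365):
    """P(at least two of n people share a birthday).

    Assumes `days` equally likely birthdays and independent people.
    """
    p_all_different = 1.0
    for k in range(n):
        # The (k+1)-th person must avoid the k birthdays already taken.
        p_all_different *= (days - k) / days
    return 1.0 - p_all_different

# Smallest room size with a better than 50% chance of a shared birthday.
smallest_n = next(n for n in range(1, 366) if p_shared_birthday(n) > 0.5)
print(smallest_n)                        # 23
print(round(p_shared_birthday(23), 2))   # 0.51
```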
Chapter 1
Examples
Example 1.4
Children are born equally likely
as Boys or Girls
My brother has two children
(not twins)
One of his children is a boy
named Luke
What is the probability that his
other child is a girl?
Example 1.5
The Monty Hall Problem
Game Show
 3 doors
 1 Car & 2 Goats
 You pick a door - e.g. #1
 Host knows what’s behind all
the doors and he opens another
door, say #3, and shows you a
goat
 He then asks if you want to
stick with your original choice
#1, or change to door #2?

Ask Marilyn.
Parade Magazine Sept 9 1990
Marilyn vos Savant
Guinness Book of Records Highest IQ
“Yes you should switch. The
first door has a 1/3 chance of
winning while the second has a
2/3 chance of winning.”
Ph.D.s replied: now there are two
doors, 1 goat & 1 car, so chances of
winning are 1/2 for door #1 and 1/2
for door #2.
“You are the goat” - Western
State University.

Who’s right?

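At the start, the sample space is:
{CGG, GCG, GGC}

Pick a door e.g. #1
1 in 3 chance of winning

Host shows you a goat, but the sample
space is still {CGG, GCG, GGC}:
door #1 wins only in CGG (1 in 3);
in GCG and GGC (2 in 3) the car is
behind the other unopened door.
So Marilyn was right, you should
switch.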
Not convinced?
Imagine a game with 100 doors.
 1 F430 Ferrari, 99 Goats.
 You pick a door.
 Host opens 98 of the 99 other
doors.
 Do you stick with your original
choice? Prob = 1/100
 Or move to the unopened door.
Prob = 99/100
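The argument can be sketched as an exact enumeration over where the prize is hidden. This assumes, as in the slides, that the host knows where the prize is and opens every other door except one, never revealing the prize. The function name `monty_hall` is illustrative.

```python
from fractions import Fraction

def monty_hall(n_doors=3):
    """Return (P(win if you stick), P(win if you switch)).

    Assumes the prize is equally likely to be behind any door and the
    host knowingly opens all but one of the other doors, showing goats.
    """
    stick_wins = 0
    switch_wins = 0
    pick = 0  # by symmetry, suppose you always pick the first door
    for prize in range(n_doors):
        if prize == pick:
            stick_wins += 1   # the door left closed hides a goat
        else:
            switch_wins += 1  # the host must leave the prize door closed
    return Fraction(stick_wins, n_doors), Fraction(switch_wins, n_doors)

print(monty_hall(3))    # (Fraction(1, 3), Fraction(2, 3))
print(monty_hall(100))  # (Fraction(1, 100), Fraction(99, 100))
```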

Boys, Girls
and Monty Hall
Sample Space (listing oldest child
first)
{GG, BG, GB, BB}
Equally likely outcomes

One child is a boy:
GG is impossible
{BG, GB, BB} =>
P(OC = G) = 2/3, where OC is the other child

Luke is 6 months old, so he is the younger child.
{GB, BB} => P(OC = G) = 1/2
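Both answers can be checked by listing the four equally likely families and conditioning on what is known. A minimal sketch; the helper name `p_other_is_girl` is illustrative.

```python
from fractions import Fraction
from itertools import product

# Each family is (older child, younger child); all four are equally likely.
families = list(product("BG", repeat=2))

def p_other_is_girl(condition):
    """P(the family also contains a girl | condition holds)."""
    possible = [f for f in families if condition(f)]
    with_girl = [f for f in possible if "G" in f]
    return Fraction(len(with_girl), len(possible))

# We only know that at least one child is a boy.
print(p_other_is_girl(lambda f: "B" in f))     # 2/3
# We know the younger child is a boy.
print(p_other_is_girl(lambda f: f[1] == "B"))  # 1/2
```

The extra information about which child is the boy removes one of the three remaining families, which is why the answer changes.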

Odd Socks
It is winter and the ESB are on
strike. This morning when you
woke up it was dark. In your sock
drawer there was one pair of
black socks and one odd brown
one.
Are you more likely to be wearing
matching socks or odd socks today?
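A minimal sketch of the sock drawer, assuming you pulled out two of the three socks at random in the dark (an assumption the slide implies but does not spell out):

```python
from fractions import Fraction
from itertools import combinations

drawer = ["black", "black", "brown"]

# Every way of choosing 2 of the 3 socks, each equally likely.
pairs = list(combinations(range(len(drawer)), 2))
matching = [p for p in pairs if drawer[p[0]] == drawer[p[1]]]

p_match = Fraction(len(matching), len(pairs))
print(p_match)  # 1/3
```

Only one of the three possible pairs matches, so odd socks are twice as likely as a matching pair.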
EXAMS
Campus               Female Pass Rate   Male Pass Rate
Belfield             40%                33%
ET/Carysfort etc.    75%                71%
Seeing this evidence a male
student takes UCD to court
saying there is discrimination
against male students. UCD
gathers all its exam information
together and reports the
following.
EXAM
Pass Rates
Overall Female pass rate is 56%
Overall Male pass rate is 60%
HOW CAN THIS BE?
Clearly UCD are LYING !
Simpson's Paradox
Overall Female pass rate is 56%
Overall Male pass rate is 60%
Campus               Female Pass Rate   Male Pass Rate
Belfield             20/50 = 40%        10/30 = 33%
ET/Carysfort etc.    30/40 = 75%        50/70 = 71%
Overall              50/90 = 56%        60/100 = 60%
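The reversal can be confirmed from the pass counts on the slide. The sketch below only re-does the arithmetic; the dictionary layout and function names are illustrative.

```python
from fractions import Fraction

# (passes, candidates) per campus, taken from the counts on the slide.
results = {
    "Female": {"Belfield": (20, 50), "ET/Carysfort etc.": (30, 40)},
    "Male": {"Belfield": (10, 30), "ET/Carysfort etc.": (50, 70)},
}

def rate(passes, total):
    return Fraction(passes, total)

def overall(group):
    """Pooled pass rate: total passes over total candidates."""
    passes = sum(p for p, _ in results[group].values())
    total = sum(t for _, t in results[group].values())
    return rate(passes, total)

# Women do better on every campus...
for campus in results["Female"]:
    assert rate(*results["Female"][campus]) > rate(*results["Male"][campus])
# ...yet worse overall, because most women sat exams on the harder campus.
assert overall("Female") < overall("Male")

print(round(float(overall("Female")), 2), round(float(overall("Male")), 2))  # 0.56 0.6
```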
Hit and RUN
Once upon a time in Hicksville,
USA there was a night-time hit and
run accident involving a taxi. There
are two taxi companies in
Hicksville, Green and Blue. 85%
of taxis are Green and 15% are
Blue. A witness identified the taxi
as being Blue. In the subsequent
court case the judge ordered that
the witness’s observation under the
conditions that prevailed that night
be tested. The witness correctly
identified each colour of taxi 80%
of the time.
Hit and RUN
What is the probability that it was
indeed a blue taxi that was involved
in the accident?
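This is the same Bayes calculation as in the disease test. A minimal sketch, assuming the taxi involved is equally likely to be any taxi in town (so the prior for Blue is 0.15) and that the witness's 80% accuracy applies to both colours; the function and parameter names are illustrative.

```python
def posterior(prior, p_evidence_if_true, p_evidence_if_false):
    """Bayes' rule: P(H | E) from P(H), P(E | H) and P(E | not H)."""
    numerator = p_evidence_if_true * prior
    return numerator / (numerator + p_evidence_if_false * (1 - prior))

# H = the taxi was Blue, E = the witness says "Blue".
p_blue_given_says_blue = posterior(
    prior=0.15,                # 15% of taxis are Blue
    p_evidence_if_true=0.80,   # a Blue taxi is correctly called Blue
    p_evidence_if_false=0.20,  # a Green taxi is wrongly called Blue
)
print(round(p_blue_given_says_blue, 3))  # 0.414
```

Under these assumptions, even with a fairly reliable witness the taxi is still more likely to have been Green, because Green taxis are so much more common.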
DNA
You are on holiday in Belfast and an
explosion destroys the Odyssey
Arena.
You are seen running from the
explosion and are arrested.
You are subsequently charged with
being a member of a proscribed
paramilitary organisation and with
causing the explosion.
In court you protest your innocence.
However the PSNI have DNA
evidence they claim links you to the
crime.
DNA
Their forensic scientist delivers the
following vital evidence.
The forensic scientist indicates that
DNA found on the bomb matches
your DNA.
Your lawyer at first disputes this
evidence and hires an independent
scientist.
However the second forensic
scientist also says that the DNA
matches yours and that there is a 1 in
500 million probability of the
match.
DNA
What do you do?
It appears as if you are going to
spend the rest of your days in jail.
The National
Lottery
“I lied, cheated and stole to
become a millionaire. Now
anybody at all can win the
lottery and become a
millionaire”
GAME #1: LOTTO 6/42

What are the chances of winning
with one selection of 6
numbers?
Matches    Chances of Winning
6          1 in 5,245,786
5          1 in 24,286
4          1 in 555
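These odds follow from counting selections. The sketch below assumes 6 numbers are drawn from 42 and ignores the bonus number; `one_in` is an illustrative helper name.

```python
from math import comb

def one_in(matches, picked=6, pool=42):
    """Return x such that P(exactly `matches` of your numbers are drawn) is 1 in x.

    Counts the ways to hit `matches` of the 6 winning numbers and fill the
    rest of your selection from the 36 losing numbers. Bonus number ignored.
    """
    ways = comb(picked, matches) * comb(pool - picked, picked - matches)
    return comb(pool, picked) / ways

for m in (6, 5, 4):
    print(m, round(one_in(m)))
# 6 5245786
# 5 24286
# 4 555
```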
GAME #1: LOTTO 6/42
Expected Winnings
Only consider Jackpot
1 Euro gets 1 play
E(win) = Jackpot*(1/5,245,786) -
1 Euro*(5,245,785/5,245,786)
E(win) =
Jackpot*0.000000191 - 0.999999809
If only one jackpot winner then:
Positive E(win) if
Jackpot > 5,245,785
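The break-even point can be checked with exact fractions. A minimal sketch of the formula on the slide, assuming a single jackpot winner and ignoring the smaller prizes; the names `COMBINATIONS` and `expected_win` are illustrative.

```python
from fractions import Fraction

COMBINATIONS = 5_245_786  # number of possible 6/42 selections, from the slide

def expected_win(jackpot, stake=1):
    """E(win) = jackpot * P(win) - stake * P(lose), in Euro."""
    p_win = Fraction(1, COMBINATIONS)
    return jackpot * p_win - stake * (1 - p_win)

# With a 1 million Euro jackpot you expect to lose about 81 cent per play.
print(round(float(expected_win(1_000_000)), 2))  # -0.81

# Break-even is exactly at a jackpot of 5,245,785 Euro.
assert expected_win(5_245_785) == 0
assert expected_win(5_245_786) > 0
```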
LOTTO 6/42

The average time to win each of the prizes is
given by:

Prize                 Average time to win
Match 3 with Bonus    2 Years, 6 Weeks
Match 4               2 Years, 8 Months
Match 5               116 Years, 9 Months
Match 5 with Bonus    4323 Years, 5 Months
Share in Jackpot      25,220 Years
Tossing a fair coin
Tossing a coin!


You are joking!
That is boring … no question about it!

1957 Second edition of William Feller’s
Textbook includes a chapter on coin-tossing.

Introduction: “The results concerning
…coin-tossing show that widely held
beliefs … are fallacious. These results
are so amazing and so at variance with
common intuition that even
sophisticated colleagues doubted that
coins actually misbehave as theory
predicts.”
Tossing a coin!

Toss a coin 2N times.
Law of Averages (the widely held belief):
As N increases the chances that
there are equal numbers of heads
and tails among the 2N tosses
increase.
Lim N→∞ P( #H = #T ) = 1.
In the limit as N tends to infinity
the probability of matching
numbers of heads and tails
approaches 1.

Rosencrantz
and
Guildenstern
are Dead
Prob of equal numbers of H and T
# of tosses   2     4       6        8        10
Prob          1/2   3/8     5/16     35/128   63/256
              0.5   0.375   0.3125   0.273    0.246
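The exact values come from counting: of the 2^(2N) equally likely sequences of 2N tosses, C(2N, N) have exactly N heads. A short sketch (the function name is illustrative):

```python
from fractions import Fraction
from math import comb

def p_equal_heads_tails(n):
    """P(#H = #T) in 2n tosses of a fair coin: C(2n, n) / 2^(2n)."""
    return Fraction(comb(2 * n, n), 4**n)

for n in range(1, 6):
    p = p_equal_heads_tails(n)
    print(2 * n, p, round(float(p), 4))

# Far from approaching 1, the probability keeps shrinking as N grows.
print(float(p_equal_heads_tails(500)))
```

So the chance of an exact tie falls as the number of tosses rises, which is the opposite of what the popular "Law of Averages" suggests.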