Transcript day7

Probability (cont.)
Assigning Probabilities
• A probability is a value between 0 and
1 and is written either as a fraction or as
a proportion.
• For the complete set of distinct possible
outcomes of a random circumstance, the
total of the assigned probabilities must
equal 1.
Complementary Events
One event is the complement of another event if the two
events do not contain any of the same simple events and
together they cover the entire sample space.
Notation: A^C represents the complement of A.
Note: P(A) + P(A^C) = 1
Example: A Simple Lottery (cont.)
A = player buying single ticket wins
A^C = player does not win
P(A) = 1/1000 so P(A^C) = 999/1000
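The complement rule can be checked with a few lines of Python. This is a minimal sketch; the helper name `complement` is an assumption made for illustration and is not part of the slides.

```python
from fractions import Fraction

def complement(p):
    """Return the probability of the complement, using P(A) + P(not A) = 1."""
    if not 0 <= p <= 1:
        raise ValueError("a probability must lie between 0 and 1")
    return 1 - p

p_win = Fraction(1, 1000)    # P(A) from the lottery example
p_lose = complement(p_win)   # probability that the player does not win
print(p_lose)                # 999/1000
```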
Classical Approach
• A mathematical index of the relative
frequency or likelihood of the occurrence
of a specific event.
– Based on games of chance
– The specific conditions of the game are
known.
Estimating Probabilities from Observed
Categorical Data - Empirical Approach
Assuming data are representative, the
probability of a particular outcome is
estimated to be the relative frequency
(proportion) with which that outcome
was observed.
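As a rough sketch of the empirical approach, the code below estimates each outcome's probability as its relative frequency. The eye-color data and the function name `empirical_probabilities` are made up for illustration.

```python
from collections import Counter

def empirical_probabilities(observations):
    """Estimate P(outcome) as the proportion of observations showing that outcome."""
    counts = Counter(observations)
    total = len(observations)
    return {outcome: count / total for outcome, count in counts.items()}

# Hypothetical sample of 10 observed eye colors (made-up data).
sample = ["brown"] * 5 + ["blue"] * 3 + ["green"] * 2
probs = empirical_probabilities(sample)
print(probs)
```

The estimated probabilities always total 1, matching the rule for assigning probabilities.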
Mutually Exclusive Events
Two events are mutually exclusive if they do not
contain any of the same simple events (outcomes).
Example: A Simple Lottery
A = all three digits are the same.
B = the first and last digits are
different
The events A and B are mutually
exclusive.
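One way to check this is to list every outcome. The sketch below assumes the lottery draws three digits, each from 0 to 9, giving 1000 equally likely outcomes; that setup is inferred from P(A) = 1/1000 and is not stated in this part of the transcript.

```python
from itertools import product

# Assumption: three digits, each 0-9, all 1000 outcomes equally likely.
sample_space = set(product(range(10), repeat=3))

A = {s for s in sample_space if s[0] == s[1] == s[2]}  # all three digits the same
B = {s for s in sample_space if s[0] != s[2]}          # first and last digits differ

mutually_exclusive = A.isdisjoint(B)   # True when A and B share no outcomes
p_A = len(A) / len(sample_space)
p_B = len(B) / len(sample_space)
p_A_or_B = len(A | B) / len(sample_space)

print(mutually_exclusive, p_A, p_B, p_A_or_B)
```

Because the two events share no outcomes, the probability of "A or B" is simply P(A) + P(B).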
Independent and Dependent Events
• Two events are independent of each other
if knowing that one will occur (or has
occurred) does not change the probability
that the other occurs.
• Two events are dependent if knowing that
one will occur (or has occurred) changes
the probability that the other occurs.
Example Independent Events
• Customers put a business card in a restaurant
glass bowl.
• A drawing is held once a week for a free lunch.
• You and Vanessa put a card in two consecutive
weeks.
Event A = You win in week 1.
Event B = Vanessa wins in week 2
• Events A and B refer to different random
circumstances and are independent.
Example: Dependent Events
Event A = Alicia is selected to answer Question 1.
Event B = Alicia is selected to answer Question 2.
Events A and B refer to different random circumstances,
but are A and B independent events?
• P(A) = 1/50.
• If event A occurs, her name is no longer in the bag, so P(B | A) = 0.
• If event A does not occur, there are 49 names in the
bag (including Alicia’s name), so P(B | not A) = 1/49.
Knowing whether A occurred changes P(B). Thus, the
events A and B are not independent.
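The Alicia example can be worked through with exact fractions. This sketch combines the two cases using the law of total probability, a step the slide does not spell out; the variable names are illustrative.

```python
from fractions import Fraction

n_students = 50
p_A = Fraction(1, n_students)                    # Alicia is drawn for Question 1
p_B_given_A = Fraction(0)                        # her name is out of the bag
p_B_given_not_A = Fraction(1, n_students - 1)    # 49 names left, hers included

# Overall chance that Alicia answers Question 2 (law of total probability).
p_B = p_A * p_B_given_A + (1 - p_A) * p_B_given_not_A

# A and B are independent only if knowing A leaves the probability of B unchanged.
independent = (p_B_given_A == p_B)

print(p_B, independent)   # 1/50 False
```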
Probability Calculations
• Some Useful Formulas to Keep in Mind (Or in
Hand)
– U = Union (or)
– ∩ = Intersection (and)
• General Formulas
Adding (“or”)
P(A U B) = P(A) + P(B) – P(A ∩ B)
Non-mutually Exclusive or Overlapping Outcomes
P(A U B) = P(A) + P(B)
Mutually Exclusive Outcomes
Probability Calculations (cont.)
• General Formulas
Multiplying (“and”/sequential events)
P(A ∩ B) = P(A)*P(B|A)
Nonindependence; sampling without
replacement
P(A ∩ B) = P(A)*P(B)
Independence; sampling with replacement
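As a rough illustration of the addition and multiplication formulas, the sketch below applies them to a standard 52-card deck. The deck example and the helper names `p_or` and `p_and` are assumptions made for illustration and do not come from the slides.

```python
from fractions import Fraction

def p_or(p_a, p_b, p_a_and_b=0):
    """Addition rule; leave p_a_and_b at 0 for mutually exclusive events."""
    return p_a + p_b - p_a_and_b

def p_and(p_a, p_b_given_a):
    """Multiplication rule; pass P(B) itself when the events are independent."""
    return p_a * p_b_given_a

ace, king, heart = Fraction(4, 52), Fraction(4, 52), Fraction(13, 52)

# Overlapping outcomes: the ace of hearts is both an ace and a heart.
p_ace_or_heart = p_or(ace, heart, Fraction(1, 52))

# Mutually exclusive outcomes: one card cannot be both an ace and a king.
p_ace_or_king = p_or(ace, king)

# Two aces in a row without replacement (dependent draws).
p_two_aces_without = p_and(ace, Fraction(3, 51))

# Two aces in a row with replacement (independent draws).
p_two_aces_with = p_and(ace, ace)

print(p_ace_or_heart, p_ace_or_king, p_two_aces_without, p_two_aces_with)
```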
Joint and Marginal Probabilities
• These probabilities refer to the proportion of an event as a fraction of the
total.
P(30 to 64) = 62,689/103,870 = .60
P(30 to 64 ∩ married) = 43,808/103,870 = .42
Unions and intersections
• P{AB}  P{A} + P{B} because A and B
do overlap.
• P{AB} = P{A} + P{B} - P{AB}.
• AB is the intersection of A and B; it
includes everything that is in both A and B,
and is counted twice if we add P{A} and
P{B}.
P{AUB} = P{A} + P{B} - P{A∩B}.
P(18 to 29 U Married) = .21 +.57 - .07 = .71
Conditional Probability
• Consider two events A and B.
• What is the probability of A, given the
information that B occurred? P(A | B) = ?
• Example:
– What is the probability that a woman is
married given that she is 18 - 29 years old?
Probability Problems
P(Married | 18-29) = 7,842/22,512
Conditional probability and
independence
• If we know that one event has occurred it may change
our view of the probability of another event. Let
– A = {rain today}, B = {rain tomorrow}, C = {rain in 90 days time}
• It is likely that knowledge that A has occurred will change
your view of the probability that B will occur, but not of
the probability that C will occur.
• We write P(B|A) ≠ P(B), P(C|A) = P(C). P(B|A) denotes
the conditional probability of B, given A.
• We say that A and C are independent, but A and B are
not.
• Note that for independent events P(A∩C) = P(A)P(C).
Age and Marital Status
• P(M) = 59,920/103,870 = .57
• P(18 to 29) = 22,512/103,870 = .21
• P(M ∩ 18 to 29) = 7,842/103,870 = .07
• P(M U 18 to 29) = .57 + .21 - .07 = .71
• P(M | 18 to 29) = 7,842/22,512 = .34
• P(M | 30 to 64) = 43,808/62,689 = .69
– Knowledge of the age changes P(M). Age and
marital status are not independent.
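The marital-status figures can be reproduced from the raw counts. This is a sketch only: the variable names are illustrative, and the helper `trunc2` reflects an assumption that the slide values were truncated, not rounded, to two decimals.

```python
import math

# Counts as given on the slide.
TOTAL = 103_870
MARRIED = 59_920
AGE_18_29 = 22_512
MARRIED_AND_18_29 = 7_842

def trunc2(p):
    """Truncate (not round) a probability to two decimals."""
    return math.floor(p * 100) / 100

p_m = MARRIED / TOTAL                    # marginal probability
p_young = AGE_18_29 / TOTAL              # marginal probability
p_joint = MARRIED_AND_18_29 / TOTAL      # joint probability (intersection)
p_union = p_m + p_young - p_joint        # addition rule
p_m_given_young = p_joint / p_young      # conditional probability

# Independence would require the joint probability to equal P(M) * P(18 to 29).
independent = math.isclose(p_joint, p_m * p_young)

print(trunc2(p_m), trunc2(p_young), trunc2(p_joint))
print(trunc2(p_union), trunc2(p_m_given_young), independent)
```

Dividing the joint probability by P(18 to 29) gives the same result as dividing the counts directly, 7,842/22,512.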
Group Practice
Continuous variables
• A continuous random variable is one which can
(in theory) take any value in some range, for
example crop yield, maximum temperature,
height, weight, etc.
Probability distributions
• If we measure a random variable many times,
we can build up a distribution of the values it can
take.
• Imagine an underlying distribution of values
which we would get if it was possible to take
more and more measurements under the same
conditions.
• This gives the probability distribution for the
variable.
Continuous probability distributions
• Because continuous random variables can take
all values in a range, it is not possible to assign
probabilities to individual values.
• Instead we have a continuous curve, called a
probability density function, which allows us to
calculate the probability of a value falling within any
interval.
• This probability is calculated as the area under
the curve between the values of interest. The
total area under the curve must equal 1.
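To make "area under the curve" concrete, the sketch below approximates areas numerically with the trapezoid rule. The triangular density and both function names are hypothetical choices made for illustration; any valid density would do.

```python
def triangular_pdf(x):
    """Hypothetical density on [0, 2]: rises linearly to a peak at x = 1, then falls."""
    if 0 <= x <= 1:
        return x
    if 1 < x <= 2:
        return 2 - x
    return 0.0

def area_under(pdf, a, b, steps=10_000):
    """Trapezoid-rule approximation of the area under pdf between a and b."""
    width = (b - a) / steps
    total = 0.5 * (pdf(a) + pdf(b))
    for i in range(1, steps):
        total += pdf(a + i * width)
    return total * width

p_total = area_under(triangular_pdf, 0, 2)       # whole range: should be about 1
p_middle = area_under(triangular_pdf, 0.5, 1.5)  # P(0.5 <= X <= 1.5)
p_point = area_under(triangular_pdf, 1, 1)       # zero-width interval has zero area

print(p_total, p_middle, p_point)
```

The zero-width case shows why a single value of a continuous variable gets probability 0.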
Normal (Gaussian) distributions
• Normal (also known as Gaussian) distributions
are by far the most commonly used family of
continuous distributions.
• They are ‘bell-shaped’ and are indexed by two
parameters:
– The mean μ – the distribution is symmetric about this
value
– The standard deviation σ – this determines the
spread of the distribution. Roughly 2/3 of the
distribution lies within 1 standard deviation of the
mean, and 95% within 2 standard deviations.
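The "roughly 2/3" and "95%" figures can be checked from the normal cumulative distribution function, which Python can compute with `math.erf`. The helper names `normal_cdf` and `prob_within` are illustrative.

```python
import math

def normal_cdf(x, mean=0.0, sd=1.0):
    """P(X <= x) for a normal distribution with the given mean and sd."""
    return 0.5 * (1 + math.erf((x - mean) / (sd * math.sqrt(2))))

def prob_within(k):
    """Probability of falling within k standard deviations of the mean."""
    return normal_cdf(k) - normal_cdf(-k)

within_1 = prob_within(1)   # about 0.683, i.e. roughly 2/3
within_2 = prob_within(2)   # about 0.954, i.e. roughly 95%

print(round(within_1, 3), round(within_2, 3))
```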
The probability of continuous
variables
• IQ test
– Mean = 100 and sd = 15
• What is the probability of randomly
selecting an individual with a test score of
130 or greater?
– P(X ≤ 95)?
– P(X ≥ 112)?
– P(X ≤ 95 or X ≥ 112)?
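These questions can be answered numerically. The sketch below assumes IQ scores are normally distributed with the stated mean and sd; the function and variable names are illustrative.

```python
import math

def normal_cdf(x, mean, sd):
    """P(X <= x) for a normal distribution with the given mean and sd."""
    return 0.5 * (1 + math.erf((x - mean) / (sd * math.sqrt(2))))

MEAN, SD = 100, 15

p_ge_130 = 1 - normal_cdf(130, MEAN, SD)   # z = 2.0
p_le_95 = normal_cdf(95, MEAN, SD)         # z = -0.33
p_ge_112 = 1 - normal_cdf(112, MEAN, SD)   # z = 0.8

# X <= 95 and X >= 112 cannot both happen, so the addition rule
# for mutually exclusive events applies.
p_either = p_le_95 + p_ge_112

print(round(p_ge_130, 4), round(p_le_95, 4), round(p_ge_112, 4), round(p_either, 4))
```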
The probability of continuous
variables (cont.)
• What is the probability of randomly
selecting three people with a test score
greater than 112?
– Remember the multiplication rule for
independent events.
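A minimal sketch of this calculation, assuming scores are normally distributed with mean 100 and sd 15 and that the three people are selected independently:

```python
import math

def normal_cdf(x, mean, sd):
    """P(X <= x) for a normal distribution with the given mean and sd."""
    return 0.5 * (1 + math.erf((x - mean) / (sd * math.sqrt(2))))

# Probability that one randomly selected person scores above 112.
p_one = 1 - normal_cdf(112, 100, 15)

# Multiplication rule for independent events: P(A and B and C) = P(A) * P(B) * P(C).
p_three = p_one ** 3

print(round(p_one, 4), round(p_three, 4))
```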
Introduction to Statistical
Inference
Chapter 11
Populations vs. Samples
• Population
– The complete set of individuals
• Characteristics are called parameters
• Sample
– A subset of the population
• Characteristics are called statistics.
– In most cases we cannot study all the
members of a population
Inferential Statistics
• Statistical Inference
– A series of procedures in which the data
obtained from samples are used to make
statements about some broader set of
circumstances.
Two different types of procedures
• Estimating population parameters
– Point estimation
• Using a sample statistic to estimate a population parameter
– Interval estimation
• Estimation of the amount of variability in a sample statistic
when many samples are repeatedly taken from a population.
• Hypothesis testing
– The comparison of sample results with a known or
hypothesized population parameter
These procedures share a
fundamental concept
• Sampling distribution
– A theoretical distribution of the possible
values of sample statistics if an infinite
number of same-sized samples were taken
from a population.
Example of the sampling
distribution of a discrete variable
[Figure: Binomial sampling distribution of an unbiased coin tossed 10 times. x-axis: number of heads in 10 tosses (0 to 10); y-axis: p(x), 0 to 0.3.]
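The probabilities behind this distribution follow from the binomial formula. The sketch below computes them for 10 tosses of a fair coin; the function name `binomial_pmf` is illustrative.

```python
import math

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

n_tosses, p_heads = 10, 0.5
pmf = [binomial_pmf(k, n_tosses, p_heads) for k in range(n_tosses + 1)]

for heads, prob in enumerate(pmf):
    print(heads, round(prob, 4))
```

The distribution peaks at 5 heads and is symmetric around it, and the eleven probabilities total 1.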
Continuous Distributions
• Interval or ratio level data
– Weight, height, achievement, etc.
• JellyBlubbers!!!
Histogram of the Jellyblubber
population
Repeated sampling of the
Jellyblubber population (n = 3)
Repeated sampling of the
Jellyblubber population (n = 5)
Repeated sampling of the
Jellyblubber population (n = 10)
Repeated sampling of the
Jellyblubber population (n = 40)
For more on this concept
• Visit
– http://www.ruf.rice.edu/~lane/stat_sim/sampling_dist/index.html
Central Limit Theorem
• Proposition 1:
– The mean of the sampling
distribution will equal the
mean of the population:
μ_x̄ = μ
• Proposition 2:
– The sampling distribution of
means will be approximately
normal regardless of the
shape of the population.
• Proposition 3:
– The standard deviation
(standard error) equals the
standard deviation of the
population divided by the
square root of the sample
size (see 11.5 in text):
σ_x̄ = σ / √N
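The propositions can be explored by simulation. The Jellyblubber data are not included in this transcript, so the sketch below draws from a made-up skewed population instead; the population, the seed, and the function name `sample_means` are all assumptions.

```python
import random
import statistics

random.seed(0)

# Hypothetical right-skewed population (not the Jellyblubber data).
population = [random.expovariate(1 / 9) for _ in range(10_000)]
mu = statistics.fmean(population)
sigma = statistics.pstdev(population)

def sample_means(n, draws=5_000):
    """Means of `draws` random samples of size n, drawn with replacement."""
    return [statistics.fmean(random.choices(population, k=n)) for _ in range(draws)]

for n in (3, 5, 10, 40):
    means = sample_means(n)
    observed_mean = statistics.fmean(means)   # compare with mu (Proposition 1)
    observed_se = statistics.stdev(means)     # compare with sigma / sqrt(n) (Proposition 3)
    print(n, round(observed_mean, 2), round(observed_se, 2), round(sigma / n**0.5, 2))
```

Plotting a histogram of `means` for each n would show the shape moving toward the bell curve described in Proposition 2.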
Application of the sampling
distribution
• Sampling error
– The difference between the sample mean and the population
mean.
• Assumed to be due to random error.
• From the jellyblubber experience we know that a
sampling distribution of means will be approximately
normally distributed with
μ_x̄ = μ
σ_x̄ = σ / √N
Standard Error of the Mean and
Confidence Intervals
• We can estimate how
much variability there
is among potential
sample means by
calculating the
standard error of the
mean.
s.e. = σ_x̄ = σ / √N
Confidence Intervals
• With our Jellyblubbers
– One random sample (n = 3)
• Mean = 9
• s.e. = σ_x̄ = 6.132 / √3 = 3.54
– Therefore:
• 68% CI = 9 ± 1(3.54)
• 95% CI = 9 ± 1.96(3.54)
• 99% CI = 9 ± 2.58(3.54)
Confidence Intervals
• With our Jellyblubbers
– One random sample (n = 30)
• Mean = 8.90
• s.e. = σ_x̄ = 6.132 / √30 = 1.11
– Therefore:
• 68% CI = 8.90 ± 1(1.11)
• 95% CI = 8.90 ± 1.96(1.11)
• 99% CI = 8.90 ± 2.58(1.11)
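The interval endpoints for both samples can be computed directly. This is a sketch using the population standard deviation (6.132) and the multipliers from the slides; the function name `confidence_interval` is illustrative, and the computed standard errors are not truncated, so they may differ from the slide values in the second decimal.

```python
import math

def confidence_interval(mean, sigma, n, z):
    """Return (lower, upper) for mean ± z * sigma / sqrt(n)."""
    se = sigma / math.sqrt(n)
    return mean - z * se, mean + z * se

SIGMA = 6.132   # population standard deviation from the slides
LEVELS = (("68%", 1.0), ("95%", 1.96), ("99%", 2.58))

for n, sample_mean in ((3, 9.0), (30, 8.90)):
    for label, z in LEVELS:
        low, high = confidence_interval(sample_mean, SIGMA, n, z)
        print(f"n = {n:2d}  {label} CI: {low:.2f} to {high:.2f}")
```

The larger sample gives a much narrower interval at every confidence level, because the standard error shrinks with the square root of the sample size.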
Hypothesis Testing (see handout)
1. State the research question.
2. State the statistical hypothesis.
3. Set decision rule.
4. Calculate the test statistic.
5. Decide if result is significant.
6. Interpret result as it relates to your research question.
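The six steps can be sketched in code. The handout is not part of this transcript, so the example below is an assumption throughout: it uses a two-tailed one-sample z-test, the IQ parameters from the earlier slide (mean 100, sd 15), and a made-up sample of 36 people with a mean of 106.

```python
import math

def normal_cdf(z):
    """P(Z <= z) for the standard normal distribution."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# Step 1. Research question (hypothetical): does this group's mean IQ differ from 100?
# Step 2. Statistical hypotheses: H0: mu = 100, H1: mu != 100.
mu_0, sigma = 100, 15
n, sample_mean = 36, 106          # made-up sample results

# Step 3. Decision rule: two-tailed test at alpha = .05, reject H0 if |z| > 1.96.
z_critical = 1.96

# Step 4. Test statistic: distance of the sample mean from mu_0 in standard errors.
standard_error = sigma / math.sqrt(n)
z = (sample_mean - mu_0) / standard_error
p_value = 2 * (1 - normal_cdf(abs(z)))

# Step 5. Decide whether the result is significant.
significant = abs(z) > z_critical

# Step 6. Interpret the result in terms of the research question.
verdict = "differs from" if significant else "is not shown to differ from"
print(f"z = {z:.2f}, p = {p_value:.4f}: the group mean {verdict} 100.")
```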