Powerpoint - University of Windsor

Basic Quantitative Methods in
the Social Sciences
(AKA Intro Stats)
02-250-01
Lecture 4
A Quick Review
• The entire area under the normal curve
can be considered to be a proportion of
1.00
• A proportion of .50 lies to the left of the
mean, and a proportion of .50 lies to
the right of the mean
Area Under the Normal
Distribution and Z-Scores
Normal Distribution
with z-score points
of reference:
Properties of Area Under the
Normal Distribution
• Since the normal curve is bell-shaped,
the proportions of scores between successive
whole z-scores are not equal
• For example, .3413 of the scores lie
between the z-scores of 0 (the mean)
and 1 (or -1), while only .1359 of the
scores lie between the z-scores of 1 and
2 (or -1 and -2)
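Properties of Area Under the
Normal Distribution
[Figure: normal curve with the z axis marked from -3 to +3. Area in each interval on either side of the mean: 0 to 1 = .3413, 1 to 2 = .1359, 2 to 3 = .0215, beyond 3 = .0013]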
Properties of Area Under the
Normal Distribution
Z-scores*     Proportion under the curve
-1 to +1      .6826 (.3413 + .3413)
-2 to +2      .9544
-3 to +3      .9974
-4 to +4      1.0000
*Z-scores are expressed in standard deviation units,
i.e., a z-score of -1 represents one standard deviation
below (to the left of) the mean
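The tabled proportions can be cross-checked numerically. The sketch below is not part of the lecture: it assumes Python 3.8 or later and uses the standard library's statistics.NormalDist, and area_between is a helper name made up for this illustration. Because the table adds up proportions that were already rounded to four places, its entries can differ from the directly computed values in the last decimal place.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
STD_NORMAL = NormalDist()

def area_between(z_low, z_high):
    """Proportion of the area under the normal curve between two z-scores."""
    return STD_NORMAL.cdf(z_high) - STD_NORMAL.cdf(z_low)

# Area within 1, 2, 3, and 4 standard deviations of the mean.
for k in (1, 2, 3, 4):
    print(f"-{k} to +{k}: {area_between(-k, k):.4f}")
```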
Normal Distribution Example
• A study of 2500 University of Windsor students
showed that the average amount of sleep lost in
the week prior to writing a statistics exam (in
hours) was normally distributed with μ = 7.79
and σ = 1.75 (don’t worry, this isn’t real data!)
• This distribution is shown with the abscissa (x-axis) marked in raw score and z-score units:
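Normal Distribution Example
[Figure: normal curve with the abscissa marked in raw scores and z-scores. X = 2.54, 4.29, 6.04, 7.79, 9.54, 11.29, 13.04 correspond to z = -3, -2, -1, 0, +1, +2, +3. Area in each interval on either side of the mean: .3413, .1359, .0215, and .0013 beyond z = ±3]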
Example cont.
• We can see from this diagram that 34.13% of
U of W students lost between 6.04 and 7.79
hours of sleep in the week prior to a stats
test (between z=-1 and z=0)
• 13.59% of students lost between 9.54 and
11.29 hours of sleep in that week (between
z=+1 and z=+2)
• 49.87% of students lost between 2.54 & 7.79
hours of sleep (between z=-3 and z=0)
(.0215+.1359+.3413 = .4987 = 49.87%)
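The same proportions can be obtained from the raw scores directly. This is a minimal sketch, assuming Python 3.8 or later with statistics.NormalDist; proportion_between is a hypothetical helper, and the sleep data are the made-up values from the slide.

```python
from statistics import NormalDist

# Made-up sleep-loss distribution from the slide: mu = 7.79, sigma = 1.75.
sleep = NormalDist(mu=7.79, sigma=1.75)

def proportion_between(dist, x_low, x_high):
    """Proportion of scores falling between two raw scores."""
    return dist.cdf(x_high) - dist.cdf(x_low)

for x_low, x_high in [(6.04, 7.79), (9.54, 11.29), (2.54, 7.79)]:
    # z = (X - mu) / sigma for each end of the interval
    z_low = (x_low - sleep.mean) / sleep.stdev
    z_high = (x_high - sleep.mean) / sleep.stdev
    p = proportion_between(sleep, x_low, x_high)
    print(f"{x_low} to {x_high} hours (z = {z_low:+.2f} to {z_high:+.2f}): {p:.4f}")
```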
Properties of Area Under the
Normal Distribution
• The symbol Zα is used to denote the z-score having
area α (alpha) to its right under the normal curve
• The proportion of area under the curve between the
mean and a z-score can be found with the help of a
table (Table E.10, Howell, p. 452) and a little math…
• In this example, we want to know the area between
the mean and z = 0.20:
• Look under the column “mean to z” at z=0.20
• The proportion = 0.0793
• Therefore, .0793 (or almost 8%) is the proportion of
data scores between the mean and the score that has
a z score of 0.20
Example cont.
• This means that the area between the
mean and z = 0.20 has an area under
the curve of 0.0793:
[Figure: normal curve with an area of .0793 between z = 0 and z = 0.20, and an area of .4207 beyond z = 0.20]
Example cont.
• Since half of the normal distribution has an
area of .5000, we can determine the area
beyond z = .20 by subtracting the area from
the mean to z = .20 from .5000:
• Area beyond z=.20 = .5000 - .0793
• Area beyond z=.20 = .4207
• (Note: If you look at the “smaller portion” in
the table, you will see it’s .4207)
Example cont.
• Since the normal curve is symmetrical, the
area between the mean and z = -.20 is equal
to the area between the mean and z = +.20:
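[Figure: normal curve with an area of .0793 on each side of the mean (between z = -0.20 and 0, and between 0 and z = +0.20) and an area of .4207 in each tail]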
Normal Distribution Table
• Table E.10 has 3 columns:
Mean to z
Larger portion
Smaller portion
Table: Mean to z
Table: Larger Portion
Table: Smaller Portion
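For readers without the printed table at hand, its three columns can be approximated in code. The following is only a sketch under stated assumptions: table_row is a hypothetical helper (not something from Howell), and it relies on Python's statistics.NormalDist (3.8 or later). It uses the symmetry of the curve, so a negative z gives the same row as the positive z.

```python
from statistics import NormalDist

def table_row(z):
    """Return (mean to z, larger portion, smaller portion) for a z-score."""
    below = NormalDist().cdf(abs(z))  # area below |z|; symmetry handles negative z
    mean_to_z = below - 0.5           # half the curve lies below the mean
    larger = below                    # the body
    smaller = 1.0 - below             # the tail
    return mean_to_z, larger, smaller

mean_to_z, larger, smaller = table_row(0.20)
print(f"mean to z = {mean_to_z:.4f}, larger = {larger:.4f}, smaller = {smaller:.4f}")
```

The mean-to-z and smaller-portion values for z = 0.20 should agree with the .0793 and .4207 read from the table.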
A Couple of Notes
• 1) Always report proportions (area under the curve) to
four decimal places. This means that if you report an
area as a percentage, it will have two decimal places
(e.g., .7943 = 79.43%)
• 2) When using Table E.10, be careful not to confuse
z=.20 with z=.02 (this is a common mistake)
• 3) Remember that a negative z value has the same
proportion under the curve as the positive z value
because the normal distribution is symmetrical
• 4) When working on z-score problems, it is highly
recommended that you draw a normal distribution and
plot the mean, x, and their corresponding z-scores
Another Example!
• We often want to know what the area between
two scores is, as in this example:
• Assume that the marks in this class are normally
distributed with μ = 69.5 and σ = 7.4. What
proportion of students have marks between 50
and 80?
Example: Area Between 2 Scores
1) Calculate the z-scores for X values (50 & 80)
z = (50-69.5)/7.4 = -19.5/7.4 = -2.64
z = (80-69.5)/7.4 = 10.5/7.4 = 1.42
2) Find the proportions between the mean and
both z-scores (consult Table E.10)
For z = -2.64, the proportion between the
mean and z is .4959
For z = 1.42, the proportion between the
mean and z is .4222
Example: Area Between 2
Scores
• Third, add these proportions together to
find your answer:
.4959 + .4222 = .9181
• This means that 91.81% of students
have Stats marks between 50 and 80
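The three steps can be mirrored in a short script. This is a sketch, assuming Python 3.8 or later and statistics.NormalDist in place of the printed table. The z-scores are rounded to two decimals to imitate a table lookup; computing with unrounded z-scores gives a slightly different proportion.

```python
from statistics import NormalDist

mu, sigma = 69.5, 7.4
std_normal = NormalDist()

# Step 1: z-scores, rounded to two decimals as a printed table requires.
z_low = round((50 - mu) / sigma, 2)
z_high = round((80 - mu) / sigma, 2)

# Step 2: mean-to-z areas (by symmetry, the sign of z does not matter).
mean_to_low = std_normal.cdf(abs(z_low)) - 0.5
mean_to_high = std_normal.cdf(abs(z_high)) - 0.5

# Step 3: the two scores lie on opposite sides of the mean, so add the areas.
proportion = mean_to_low + mean_to_high
print(f"z = {z_low} and {z_high}, proportion = {proportion:.4f}")
```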
Smaller and Larger Portions
• Smaller portion = proportion in the tail
• Larger portion = proportion in the body
• Using the same data (μ = 69.5 and σ = 7.4) we
can calculate areas using the Smaller and
Larger Portions in the Normal Distribution table:
• Find the proportion of students who have stats
marks of less than 80.6
• z = (80.6-69.5)/7.4 = +1.5
Larger Portion
• Area below z = +1.5 = 0.9332
This means that 93.32% of students had a
mark of 80.6 or less in this class
Smaller Portion
• Find the proportion of students who have
marks of 76.93 or better:
• z = (76.93-69.5)/7.4 = 1.00
• Area in smaller portion = .1587
• This means that 15.87% of students in
this class had a mark of 76.93 or better
Converting Back to X
• Assume μ = 30 and σ = 5. What raw scores
correspond to z=-1.00 and z=+1.5?
z = (X - μ)/σ
Therefore
X = μ + (z × σ)
X = 30 + (-1.0 × 5) = 25
X = 30 + (1.5 × 5) = 37.5
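The conversion in both directions is a one-line formula each way. A minimal sketch, with z_to_raw and raw_to_z as illustrative function names that do not come from the lecture:

```python
def raw_to_z(x, mu, sigma):
    """z = (X - mu) / sigma"""
    return (x - mu) / sigma

def z_to_raw(z, mu, sigma):
    """X = mu + (z * sigma), the z-score formula solved for X."""
    return mu + z * sigma

for z in (-1.00, 1.5):
    x = z_to_raw(z, mu=30, sigma=5)
    # Converting back should recover the z-score we started from.
    print(f"z = {z:+.2f} -> X = {x} -> z = {raw_to_z(x, mu=30, sigma=5):+.2f}")
```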
Proportion
• What proportion of scores lie between
z=-1.00 and z=+1.50?
• Area from mean to z=-1.00 = .3413
• Area from mean to z=+1.50 = .4332
• Add them together to get the
proportion that lies between these two
z-scores: .3413+.4332 = .7745
Finding the Number of
Observations
• In this example, if we know the sample size,
(e.g., n=212) we can calculate how many
people lie between z=-1.00 and z=+1.50:
• Area between z=-1.00 and z=+1.50 = .7745
(see the last slide)
• Multiply the proportion by n:
(.7745)(212) = 164.19
Approximately 164 people
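As a rough check on this arithmetic, the proportion and the expected count can be computed together. This sketch assumes Python 3.8 or later with statistics.NormalDist. It uses the unrounded proportion, so the product may differ from the slide's 164.19 in the second decimal place while still rounding to the same whole number of people.

```python
from statistics import NormalDist

n = 212  # sample size given on the slide
std_normal = NormalDist()

# Proportion of the curve between z = -1.00 and z = +1.50.
proportion = std_normal.cdf(1.50) - std_normal.cdf(-1.00)

# Expected number of observations in that interval.
expected_count = proportion * n
print(f"proportion = {proportion:.4f}, count = {expected_count:.2f}, "
      f"approximately {round(expected_count)} people")
```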
And a Little More
• Finally, we can find a z-score from the table if we
know the proportion of scores (i.e., we can work
backwards):
• Suppose the birth weight of newborns is
normally distributed with μ = 7.73 and σ = 0.83
• What birth weight identifies the top (heaviest)
10% of newborns?
Example cont.
• Look at Table E.10 and find the z-score
that identifies the top proportion of
0.1000: look in the smaller portion
column (the tail)
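[Figure: normal curve with the upper tail of area .1000 shaded; the z-score that cuts off this tail is unknown (z = ?)]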
Example cont.
• Looking in the smaller portion column,
we find that
z=1.28 has an area of .1003
z=1.29 has an area of .0985
Which do we pick?
• Pick the one that is closest to an area of
.1000: this is z=1.28
Example cont.
• Now solve for X:
X = (z)(σ) + μ
X = (1.28)(0.83) + 7.73
= 1.06 + 7.73 = 8.79
So any weight equal to or greater than
8.79 pounds is in the top 10% of birth
weights
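Working backwards from a proportion to a score can also be sketched in code. The example below assumes Python 3.8 or later; NormalDist.inv_cdf plays the role of searching the smaller portion column, and the variable names are illustrative only. It returns an unrounded z-score, so the cutoff is computed from slightly more digits than the table's z = 1.28.

```python
from statistics import NormalDist

mu, sigma = 7.73, 0.83
top_fraction = 0.10  # the heaviest 10% of newborns

# The z-score with 10% of the area to its right has 90% to its left.
z_cut = NormalDist().inv_cdf(1 - top_fraction)

# Solve for X: X = (z)(sigma) + mu
x_cut = z_cut * sigma + mu
print(f"z = {z_cut:.2f}, X = {x_cut:.2f}")
```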
Probability
• Everything that can possibly happen
has some likelihood of happening:
probability is a measure of that
likelihood
• Probability: The quantitative expression
of likelihood of occurrence
Probability
• Probability is a ratio of frequencies
• The numerator (top) is the frequency of
the outcome of interest
• The denominator (bottom) is the
frequency of all possible outcomes
Coin Toss Example
• If a fair* coin is tossed in the air, it can
land on either heads or tails
• This means a coin has 2 possible
outcomes
• If we want to know the probability of
tossing a fair* coin and having it land
on heads, we calculate as follows:
*Note: fair means a normal coin, one that is
not weighted differently
Coin Toss
Probability = Frequency of interest / Frequency of all possible outcomes
For a coin toss, this is: 1/2
The probability of the coin landing on heads is:
p(heads) = ½, or p(heads) = .5
Another Example
• Suppose there are 90 students in a
class, 59 of them are women and 31
are men
• If one of the students is chosen at
random, the probability of choosing a
woman is:
p(woman) = 59/90
More Probability
• If the entire class was women (i.e.,
there were no male students), the
probability of choosing a woman would
be 90/90
• If the entire class was men, the
probability of choosing a woman would
be 0/90
More Probability
• As a numerical value, probabilities can
range from 0.00 to 1.00
• The numerator can range from a
minimum of 0 to a maximum equal to
the denominator
Express Yourself!
• Probability can be expressed as a
fraction, e.g., p(woman) = 59/90
• Or as a decimal fraction:
p(woman) = .6556
• Probabilities are not usually expressed as
percentages (e.g., 65.56%), although they often
are in popular media
Probability cont.
• Even if we do not know the actual
observed frequencies (e.g., the number
of women), probabilities can be
determined theoretically
• Without throwing a die, we can deduce
the probability of landing on a 5
Die Example cont.
• We know the die has 6 sides - 6
possible outcomes
• We are only interested in one side (the
5), so the probability of landing on a 5
is:
p(5) = 1/6 = 0.1667
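The ratio-of-frequencies definition used in the coin, classroom, and die examples can be written as a small function. This is a sketch using Python's standard fractions module; probability is a hypothetical helper name, and the range check simply restates that the numerator runs from 0 up to the denominator.

```python
from fractions import Fraction

def probability(freq_of_interest, freq_all_outcomes):
    """Probability as a ratio of frequencies, kept as an exact fraction."""
    if not 0 <= freq_of_interest <= freq_all_outcomes:
        raise ValueError("numerator must lie between 0 and the denominator")
    return Fraction(freq_of_interest, freq_all_outcomes)

examples = {
    "p(heads)": probability(1, 2),   # fair coin
    "p(woman)": probability(59, 90), # 59 women in a class of 90
    "p(5)": probability(1, 6),       # one face of a six-sided die
}
for label, p in examples.items():
    # Show the fraction and its four-decimal equivalent.
    print(f"{label} = {p} = {float(p):.4f}")
```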
Probability and the Normal Distribution
• The normal distribution can be thought of as a
probability distribution. Here’s how:
• We know (from Table E.10) the proportion of
scores that fall above or below a given z score
• If you were to randomly pick a score from a
sample of scores, what is the probability that
you would pick a score that has a
corresponding z score of .40 or greater?
Probability and the Normal
Distribution
• The proportion of scores above or
below a given z score is the same as
the probability of selecting a score
above or below the z score
e.g., the probability of selecting a score
from a normal distribution that has a z
score of .40 or greater is .3446 (the area in
the smaller portion of z = .40)
Example #1
• Suppose people’s scores on a personality test
are normally distributed with a mean of 50
and a population standard deviation of 10.
• If you were to pick a person completely at
random, what is the probability that you
would pick someone with a score on this
personality test that is higher than 60?
Example #1
• Step #1: Write down what you know
X = 60
μ = 50
σ = 10
• Step #2: What do you want to find?
p(X ≥ 60)
• Step #3: Draw the normal distribution, write in the
mean, standard deviation, and the X and shade the
area you are looking for
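Example #1, Step #3
[Figure: normal curve with the X axis marked at 20, 30, 40, 50, 60, 70, and 80 (mean 50, standard deviation 10); the area above X = 60 is shaded]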
Example #1
• Step #4: Calculate z score(s)
z = (X - μ)/σ
z = (60 - 50)/10 = 10/10 = 1.00
• Step #5: Use Table E.10 to find the probability
of selecting a score in your shaded area
Here we want p(X ≥ 60) or p(z ≥ 1.00)
Look up the smaller portion of z=1.00
p(z ≥ 1.00) = .1587
Example #1
• Step #6: Interpret:
The probability of picking someone at
random who has a personality test score of
60 or greater is .1587
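Steps 4 and 5 of this example can be reproduced without the table. A minimal sketch, assuming Python 3.8 or later and statistics.NormalDist; the upper-tail probability is one minus the cumulative area below the score.

```python
from statistics import NormalDist

scores = NormalDist(mu=50, sigma=10)  # personality test scores
x = 60

# Step 4: z = (X - mu) / sigma
z = (x - scores.mean) / scores.stdev

# Step 5: area in the tail above X (the "smaller portion" for this z).
p_above = 1 - scores.cdf(x)
print(f"z = {z:.2f}, p(X >= {x}) = {p_above:.4f}")
```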
Example #2
• Length of time spent waiting in line to buy
tickets at the movies is normally distributed
with a mean of 12 minutes and a population
standard deviation of 3 minutes.
• If you go to see a movie, what is the
probability that you will wait in line to buy
tickets for between 7.5 and 15 minutes?
Example #2
• Step #1: Write down what you know
X1 = 7.5
X2 = 15
μ = 12
σ = 3
• Step #2: What do you want to find?
p(7.5 ≤ X ≤ 15)
• Step #3: Draw the normal distribution, write in the
mean, standard deviation, and both X scores and
shade the area you are looking for
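Example #2, Step #3
[Figure: normal curve with the X axis marked at 3, 6, 7.5, 9, 12, 15, 18, and 21 (mean 12, standard deviation 3); the area between X = 7.5 and X = 15 is shaded]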
Example #2
• Step #4: Calculate z score(s)
z = (X - μ)/σ
z for X1 = (7.5 - 12)/3 = -4.5/3 = -1.50
z for X2 = (15 - 12)/3 = 3/3 = +1.00
• Step #5: Use Table E.10 to find the probability of selecting a
score in your shaded area
Here we want
p(7.5 ≤ X ≤ 15)
or
p(-1.50 ≤ z ≤ 1.00)
Mean to z for z = 1.00: .3413
Mean to z for z = -1.50: .4332
Example #2
• Add the two areas together! (Each represents
the mean to z, so adding them together gives
you the overall shaded area):
.3413 + .4332 = .7745
p(-1.50 ≤ z ≤ 1.00) = .7745
Example #2
• Step #6: Interpret:
The probability of waiting in line to buy
tickets at the movie for between 7.5 and
15 minutes is .7745. (Note: This means
that you will wait in line for between 7.5
and 15 minutes 77.45% of the time).
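The "77.45% of the time" reading can be illustrated by simulating many trips to the movies. This is only a sketch under stated assumptions: it needs Python 3.8 or later, the number of simulated trips and the seed are arbitrary choices, and the simulated share will land near the computed probability rather than exactly on it.

```python
from statistics import NormalDist

wait = NormalDist(mu=12, sigma=3)  # minutes spent waiting in line

# Probability from the distribution itself.
p_exact = wait.cdf(15) - wait.cdf(7.5)

# Simulated waits; the seed only makes the run repeatable.
n_trips = 20_000
waits = wait.samples(n_trips, seed=1)
p_simulated = sum(7.5 <= w <= 15 for w in waits) / n_trips

print(f"computed: {p_exact:.4f}, simulated share of trips: {p_simulated:.4f}")
```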