Stats Review
Ways to look at the data
Histogram
Box plot
Dot plot
Dot Plot (Ch04_Hurricanes): horizontal axis "Hurricanes", marked 0 to 8
Number of hurricanes that occurred each year from 1944
through 2000 as reported by Science magazine
3 Characteristics of data
Shape
Center
Spread
Shape of the data – Symmetric
The age of all US Presidents at the time they took office
Notice that this distribution has only one mode
Shape of the data – Bimodal
The winning times in the Kentucky Derby from 1875 to the
present. Why two modes?
The length of the track was reduced from 1.5 miles to
1.25 miles in 1896. The race officials thought that 1.5 miles
was too far.
Shape of the data – Skewed
LEFT
RIGHT
Data for two different variables for all female heart attack
patients in New York state in one year. One is skewed
left; the other is skewed right. Which is which?
Center and Spread of Data
Maximum = 100th percentile
Q3 = 75th percentile
Median = 50th percentile
Q1 = 25th percentile
Minimum = 0th percentile
These numbers are called the five-number summary.
The median measures the center of the data.
Q3 – Q1 = Interquartile range (IQR) measures the spread.
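The five-number summary and the IQR can be computed directly. The sketch below is only an illustration: the data are made up, the function name five_number_summary is hypothetical, and it uses the "inclusive" quartile convention from Python's statistics module. Other quartile conventions can give slightly different values for Q1 and Q3.

```python
import statistics

def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    ordered = sorted(data)
    # n=4 asks for the three cut points that split the data into quarters.
    # method="inclusive" is one common convention; it is an assumption here.
    q1, median, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return ordered[0], q1, median, q3, ordered[-1]

scores = [2, 4, 4, 5, 7, 8, 9, 10, 12]  # made-up data for illustration
minimum, q1, median, q3, maximum = five_number_summary(scores)
iqr = q3 - q1  # interquartile range: spread of the middle half of the data
print(minimum, q1, median, q3, maximum, iqr)
```

The median describes the center and the IQR describes the spread, so these two numbers together give a quick picture of the data.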
Symbols:
• s² = Sample Variance
• s = Sample Standard Deviation
• σ² = Population Variance
(Pop. St. Dev. Squared)
• σ = Population Standard Deviation (Sq. Root of
Variance)
• x̄ = Mean
• REMEMBER: The Variance is the SD squared!
And the SD is the Sq. root of the Variance!
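As a rough illustration of these symbols, the sketch below computes each one with Python's statistics module. The data list is made up. The sample versions divide by n - 1 and the population versions divide by n, which is why the two variances differ.

```python
import math
import statistics

data = [4, 8, 6, 5, 3, 7]  # made-up data for illustration

mean = statistics.mean(data)            # x-bar
sample_var = statistics.variance(data)  # s squared, divides by n - 1
sample_sd = statistics.stdev(data)      # s
pop_var = statistics.pvariance(data)    # sigma squared, divides by n
pop_sd = statistics.pstdev(data)        # sigma

# The variance is the SD squared, and the SD is the square root of the variance.
print(math.isclose(sample_sd ** 2, sample_var))
print(math.isclose(math.sqrt(pop_var), pop_sd))
```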
The normal distribution and standard
deviations
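Normal curve, with the area in each band one standard
deviation wide, from 3 sd below the mean to 3 sd above:
2.35%, 13.5%, 34%, 34%, 13.5%, 2.35%
In a normal distribution:
The total area under the curve is 1.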
Approximately 68% of scores will fall within one
standard deviation of the mean
Approximately 95% of scores will fall within two
standard deviations of the mean
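The 68% and 95% figures can be checked numerically. This is a minimal sketch using NormalDist from Python's statistics module; the helper name area_within is hypothetical.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def area_within(k):
    """Area under the normal curve within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

within_one = area_within(1)  # roughly 0.68
within_two = area_within(2)  # roughly 0.95
print(round(within_one, 3), round(within_two, 3))
```

Because the areas are measured in standard deviation units, the same percentages hold for any normal distribution, whatever its mean and standard deviation.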
The number of points that one standard deviation equals
varies from distribution to distribution. On one math test, a
standard deviation may be 7 points. If the mean were 45, then
we would know that about 68% of the students scored from 38 to 52.
Normal curve, Points on Math Test: axis marked at 24, 31, 38,
45, 52, 59, 66; areas 2.35%, 13.5%, 34%, 34%, 13.5%, 2.35%
On another test, a standard deviation may equal 5 points. If
the mean were 45, then about 68% of the students would score
from 40 to 50 points.
Normal curve, Points on a Different Test: axis marked at 30, 35,
40, 45, 50, 55, 60; areas 2.35%, 13.5%, 34%, 34%, 13.5%, 2.35%
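The arithmetic behind the two tests is the same: go k standard deviations below and above the mean. A small sketch, with the hypothetical helper name score_range:

```python
def score_range(mean, sd, k=1):
    """Scores lying k standard deviations below and above the mean."""
    return mean - k * sd, mean + k * sd

first_test = score_range(45, 7)     # sd of 7 points
second_test = score_range(45, 5)    # sd of 5 points
two_sd_range = score_range(45, 7, k=2)
print(first_test, second_test, two_sd_range)
```

With k=1 each range holds about 68% of the scores; with k=2 it holds about 95%.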
Using standard deviation units to
describe individual scores
Here is a distribution with a mean of 100 and standard deviation of 10:
Scale: 80 (-2 sd), 90 (-1 sd), 100 (mean), 110 (1 sd), 120 (2 sd)
What score is one sd below the mean?
90
What score is two sd above the mean?
120
How many standard deviations below the mean is a score of 90?
1
How many standard deviations above the mean is a score of 120?
2
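What percent of your data points are < 80?
About 2.5% (95% fall within 2 sd of the mean, which leaves 5%
for the two tails, and half of that is below 80)
What percent of your data points are > 90?
About 84% (the 34% between 90 and 100, plus the 50% above the mean)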
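These questions can be sketched in code for the same distribution (mean 100, standard deviation 10). The helper name z_score is hypothetical. The exact normal-curve areas come out slightly different from the rounded rule-of-thumb answers.

```python
from statistics import NormalDist

MEAN, SD = 100, 10

def z_score(score):
    """Number of standard deviations a score lies above (+) or below (-) the mean."""
    return (score - MEAN) / SD

dist = NormalDist(mu=MEAN, sigma=SD)
below_80 = dist.cdf(80)      # share of data points below 80
above_90 = 1 - dist.cdf(90)  # share of data points above 90

print(z_score(90), z_score(120))
print(round(below_80, 3), round(above_90, 3))
```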
Types of Sampling:
Self-selected Sample
• This method allows the members of the sample
to choose themselves by responding to a general
appeal (volunteering to be surveyed).
• Examples of Self-selected Samples: a call-in
radio poll, an internet poll on a website
• Problems with Self-selected samples: bias
– because people with strong opinions on
the topic (especially negative opinions) are
most likely to respond.
Convenience Sampling
• In a convenience sample, individuals are
chosen because they are easy to reach.
• Example: People conducting a survey go
to the mall and stop people who are
shopping. This is convenient for the
person doing the survey but does not
guarantee that the sample is
representative of the population of the
study.
• Convenience sampling also involves bias
on the part of the interviewer.
Random Samples
• A random sample of size “n” consists of “n”
individuals from the population, chosen in such a
way that every set of “n” individuals has an
equal chance to be the sample selected.
• Example: Putting everyone’s name in a
hat and drawing 3 names to participate in
the study.
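The names-in-a-hat idea can be sketched with Python's random module. The list of names is hypothetical, and the fixed seed is only there so the draw can be repeated.

```python
import random

# Hypothetical population: everyone whose name goes in the hat.
names = ["Ana", "Ben", "Cal", "Dee", "Eli", "Fay", "Gus", "Hal"]

rng = random.Random(42)       # seeded so the illustration is repeatable
chosen = rng.sample(names, 3) # draw 3 names without replacement
print(chosen)
```

random.sample gives every set of 3 names the same chance of being drawn, which matches the definition of a random sample.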
Systematic Sample
• When a rule is used to select members of
the population.
• Ex. Every third person on an alphabetized
list
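A minimal sketch of the every-third-person rule, using a hypothetical roster and a hypothetical helper named every_nth:

```python
def every_nth(ordered_list, n):
    """Select the n-th, 2n-th, 3n-th, ... members of an ordered list."""
    return ordered_list[n - 1::n]

# Hypothetical roster, alphabetized before the rule is applied.
roster = sorted(["Ines", "Abe", "Hal", "Cy", "Bea", "Gus", "Dot", "Eve", "Flo"])
chosen = every_nth(roster, 3)
print(chosen)
```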
Stratified Random Sample
To select a stratified random sample, first
divide the population into groups of similar
individuals, called STRATA. Then choose
a separate random sample in each stratum and
combine these to form the full sample.
A common example would be separating by
gender or race first, then selecting
samples from each group.
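The two steps (divide into strata, then sample within each stratum) can be sketched as follows. Everything here is an assumption made for illustration: the function name stratified_sample, the made-up population, and the choice of grade level as the stratum.

```python
import random
from collections import defaultdict

def stratified_sample(population, stratum_of, per_stratum, rng):
    """Draw per_stratum individuals at random from each stratum and combine them."""
    strata = defaultdict(list)
    for person in population:
        strata[stratum_of(person)].append(person)  # step 1: divide into strata
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, per_stratum))  # step 2: sample each stratum
    return sample

# Hypothetical population of (name, grade level) pairs.
population = [("Ana", "9th"), ("Ben", "9th"), ("Cal", "9th"),
              ("Dee", "10th"), ("Eli", "10th"), ("Fay", "10th")]
rng = random.Random(0)  # seeded so the illustration is repeatable
sample = stratified_sample(population, lambda p: p[1], 2, rng)
print(sample)
```

Sampling within each group guarantees that every stratum is represented in the full sample.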