Unit 1 - AP Statistics - Lang
Unit 1
Mr. Lang’s AP Statistics Power point
Homework Assignment
For the A: 1, 3, 5, 7, 8,11- 25 Odd, 27 – 32,
37 – 59 Odd, 60, 69 – 74, 79 – 105 Odd
(except 85, 99, 101) 107 – 110, R1-R10
For the C: 1, 3, 5, 8, 11- 25 Odd, 37 – 59
Odd, 79 – 103 Odd (except 85, 99, 101), R1-R10
For the D- : 1, 3, 5, 11, 15, 19, 23, 37, 41,
45, 49, 79, 83, 87, 91, 97, 103, R1- R10
Statistics
the science of
collecting, analyzing,
and drawing
conclusions from data
Descriptive statistics
the methods of
organizing &
summarizing data
Inferential statistics
involves making
generalizations from a
sample to a population
Population
The entire collection
of individuals or
objects about which
information is desired
Sample
A subset of the
population, selected for
study in some
prescribed manner
Variable
any characteristic
whose value may
change from one
individual to another
Data
observations on a single
variable or
simultaneously on two
or more variables
Types of variables
Categorical variables
or qualitative
identifies basic
differentiating characteristics
of the population
Numerical variables
or quantitative
observations or measurements
take on numerical values
makes sense to average these
values
two types - discrete & continuous
Discrete (numerical)
listable set of values
usually counts of items
Continuous (numerical)
data can take on any values
in the domain of the variable
usually measurements of
something
Classification by the
number of variables
Univariate - data that describes a
single characteristic of the population
Bivariate - data that describes two
characteristics of the population
Multivariate - data that describes
more than two characteristics (beyond
the scope of this course)
Identify the following variables:
1. the income of adults in your city - Numerical
2. the color of M&M candies selected at random from a bag - Categorical
3. the number of speeding tickets each student in AP Statistics has received - Numerical
4. the area code of an individual - Categorical
5. the birth weights of female babies born at a large hospital over the course of a year - Numerical
Self Check #1
Assignment #1
Graphs for categorical data
Bar Graph
Used for categorical data
Bars do not touch
Categorical variable is typically on the horizontal
axis
To describe – comment on which occurred the
most often or least often
May make a double bar graph or segmented bar
graph for bivariate categorical data sets
Using class survey data:
graph birth month
graph gender & handedness
Pie (Circle) graph
Used for categorical data
To make:
– Multiply each category's proportion by 360° to get its angle
– Using a protractor, mark off each part
To describe – comment on which occurred the
most often or least often
Graphs for numerical data
Dotplot
Used with numerical data (either discrete or
continuous)
Made by putting dots (or X’s) on a number
line
Can make comparative dotplots by using
the same axis for multiple groups
Stemplots (stem & leaf plots)
Used with univariate, numerical data
Must have a key so that we know how to read the numbers
Can split stems when you have a long list of leaves
Can have a comparative stemplot with two groups
Would a stemplot be a good graph for the number of pieces of gum chewed per day by AP Stat students? Why or why not?
Would a stemplot be a good graph for the number of pairs of shoes owned by AP Stat students? Why or why not?
Example:
The following data are price per ounce for various brands of dandruff shampoo at a local grocery store.
0.32 0.21 0.29 0.54 0.17 0.28 0.36 0.23
Can you make a stemplot with this data?
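One possible way to check a hand-drawn stemplot of the shampoo prices is sketched here. It assumes the tenths digit is the stem and the hundredths digit is the leaf; the function name `stemplot_rows` and the `scale` parameter are illustrative choices, not part of the slides.

```python
from collections import defaultdict

# Price per ounce from the shampoo example.
prices = [0.32, 0.21, 0.29, 0.54, 0.17, 0.28, 0.36, 0.23]

def stemplot_rows(values, scale=100):
    """Return (stem, leaves) pairs after scaling each value to a whole number."""
    ints = sorted(round(v * scale) for v in values)
    leaves_by_stem = defaultdict(list)
    for v in ints:
        leaves_by_stem[v // 10].append(v % 10)
    # Include empty stems so gaps in the data stay visible.
    stems = range(ints[0] // 10, ints[-1] // 10 + 1)
    return [(stem, leaves_by_stem.get(stem, [])) for stem in stems]

rows = stemplot_rows(prices)
for stem, leaves in rows:
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
print("Key: 1 | 7 = 0.17")
```

The printed key line matters as much as the rows: without it a reader cannot tell whether 1 | 7 means 17, 1.7, or 0.17.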
Example: Tobacco use in G-rated Movies
Total tobacco exposure time (in seconds) for Disney movies:
223 176 548 37 158 51 299 37 11 165 74 9 2 6 23 206 9
Total tobacco exposure time (in seconds) for other studios’ movies:
205 162 6 1 117 5 91 155 24 55 17
Make a comparative stemplot.
Graphing Activity
Self Check #2
Assignment #2
Histograms
Used with numerical data
Bars touch on histograms
Two types
– Discrete
• Bars are centered over discrete values
– Continuous
• Bars cover a class (interval) of values
For comparative histograms – use two separate graphs with the same scale on the horizontal axis
Would a histogram be a good graph for the fastest speed driven by AP Stat students? Why or why not?
Would a histogram be a good graph for the number of pieces of gum chewed per day by AP Stat students? Why or why not?
Cumulative Relative Frequency Plot
(Ogive)
. . . is used to answer questions about percentiles.
Percentiles are the percent of individuals that are
at or below a certain value.
Quartiles are located every 25% of the data. The
first quartile (Q1) is the 25th percentile, while the
third quartile (Q3) is the 75th percentile. What is
the special name for Q2?
Interquartile Range (IQR) is the range of the
middle half (50%) of the data.
IQR = Q3 – Q1
Ogive Activity
Self Check #3
Multiple Choice Test #1
Types (shapes)
of Distributions
Symmetrical
refers to data in which both sides
are (more or less) the same when
the graph is folded vertically down
the middle
bell-shaped is a special type
–has a center mound with two
sloping tails
Uniform
refers to data in which every
class has equal or
approximately equal
frequency
Skewed (left or right)
refers to data in which one
side (tail) is longer than the
other side
the direction of skewness is
on the side of the longer tail
Bimodal (multi-modal)
refers to data in which two
(or more) classes have the
largest frequency & are
separated by at least one
other class
Distribution Activity . . .
Self Check #4
How to describe a
numerical,
univariate graph
What strikes you as the most distinctive
difference among the distributions of exam
scores in classes A, B, & C ?
1. Center
discuss where the middle of
the data falls
three types of central
tendency
–mean, median, & mode
What strikes you as the most distinctive
difference among the distributions of scores in
classes D, E, & F?
2. Spread
discuss how spread out the data
is
refers to the variability of the
data
–Range, standard deviation, IQR
What strikes you as the most distinctive
difference among the distributions of exam
scores in classes G, H, & I ?
3. Shape
refers to the overall shape of
the distribution
symmetrical, uniform,
skewed, or bimodal
What strikes you as the most distinctive
difference among the distributions of exam
scores in class K ?
4. Unusual occurrences
outliers - value that lies
away from the rest of the
data
gaps
clusters
anything else unusual
5. In context
You must write your answer
in reference to the specifics
in the problem, using correct
statistical vocabulary and
using complete sentences!
Features of the Distribution Activity
Means & Medians
Parameter - fixed value about a population; typically unknown
Statistic - value calculated from a sample
Measures of Central Tendency
Median - the middle of the data; 50th
percentile
– Observations must be in numerical
order
– Is the middle single value if n is odd
– The average of the middle two values if
n is even
NOTE: n denotes the sample size
Measures of Central Tendency
Mean - the arithmetic average
– Use $\mu$ to represent a population mean (parameter)
– Use $\bar{x}$ to represent a sample mean (statistic)
Formula: $\bar{x} = \frac{\sum x}{n}$
$\Sigma$ is the capital Greek letter sigma – it means to sum the values that follow
Measures of Central Tendency
Mode – the observation that occurs the
most often
– Can be more than one mode
– If all values occur only once – there
is no mode
– Not used as often as mean & median
Suppose we are interested in the number of lollipops that are bought at a certain store. A sample of 5 customers buys the following number of lollipops. Find the median.
2 3 4 8 12
The numbers are in order & n is odd – so find the middle observation.
The median is 4 lollipops!
Suppose we have a sample of 6 customers that buy the following number of lollipops. The median is …
2 3 4 6 8 12
The numbers are in order & n is even – so find the middle two observations.
Now, average these two values: $\frac{4 + 6}{2} = 5$
The median is 5 lollipops!
Suppose we have a sample of 6 customers that buy the following number of lollipops. Find the mean.
2 3 4 6 8 12
To find the mean number of lollipops add the observations and divide by n.
$\bar{x} = \frac{2 + 3 + 4 + 6 + 8 + 12}{6} = 5.833$
Using the calculator . . .
What would happen to the median & mean if the 12 lollipops were 20?
2 3 4 6 8 20
The median is . . . 5
The mean is . . . $\bar{x} = \frac{2 + 3 + 4 + 6 + 8 + 20}{6} = 7.17$
What happened?
What would happen to the median & mean if the 20 lollipops were 50?
2 3 4 6 8 50
The median is . . . 5
The mean is . . . $\bar{x} = \frac{2 + 3 + 4 + 6 + 8 + 50}{6} = 12.17$
What happened?
Resistant - statistics that are not affected by outliers
Is the median resistant? YES
Is the mean resistant? NO
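The lollipop comparison can be reproduced with a short sketch using Python's standard `statistics` module. The dictionary name `samples` and its keys (the largest value in each sample) are illustrative choices.

```python
from statistics import mean, median

# The three lollipop samples from the slides; only the largest value changes.
samples = {
    12: [2, 3, 4, 6, 8, 12],
    20: [2, 3, 4, 6, 8, 20],
    50: [2, 3, 4, 6, 8, 50],
}

# For each sample, store (median, mean).
summary = {largest: (median(data), mean(data)) for largest, data in samples.items()}

for largest, (med, avg) in summary.items():
    print(f"largest = {largest:2d}: median = {med}, mean = {avg:.3f}")
```

The median stays at 5 in every row while the mean climbs with the largest value, which is what "resistant" means here.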
Look at the following data set. Find the mean.
22 23 24 25 25 26 29 30
$\bar{x} = 25.5$
Now find how each observation deviates from the mean. This is the deviation from the mean.
What is the sum of the deviations from the mean?
$\sum (x - \bar{x}) = 0$
Will this sum always equal zero? YES
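A short derivation of why the deviations from the mean always sum to zero, for any data set. The index $i$ and the explicit limits are notation added here for clarity; the slides write the sum without them.

```latex
\sum_{i=1}^{n} (x_i - \bar{x})
  = \sum_{i=1}^{n} x_i - n\bar{x}
  = n\bar{x} - n\bar{x}
  = 0,
\qquad \text{since } \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i .
```

This is why the deviations have to be squared (or otherwise made non-negative) before they can be averaged into a measure of spread.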
Look at the following data set. Find the mean & median.
21 23 23 24 26 30 26 30 27 30
27 31 25 27 32 25 27 32 26 28
Create a histogram with the data (use x-scale of 2). Then find the mean and median.
Mean = 27
Median = 27
Look at the placement of the mean and median in this symmetrical distribution.
Look at the following data set. Find the mean & median.
22 29 28 22 24 25 28 21 23
24 23 26 36 38 62 23 25
Create a histogram with the data (use x-scale of 8). Then find the mean and median.
Mean = 28.176
Median = 25
Look at the placement of the mean and median in this right skewed distribution.
Look at the following data set. Find the mean & median.
21 46 54 47 53 60 55 55 56
58 58 58 58 62 63 64 60
Create a histogram with the data. Then find the mean and median.
Mean = 54.588
Median = 58
Look at the placement of the mean and median in this skewed left distribution.
Recap:
In a symmetrical distribution, the mean
and median are equal.
In a skewed distribution, the mean is
pulled in the direction of the skewness.
In a symmetrical distribution, you should
report the mean!
In a skewed distribution, the median
should be reported as the measure of
center!
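The recap can be checked against the two skewed data sets from the earlier slides. This is a minimal sketch; the helper name `mean_minus_median` is hypothetical, and the sign of the difference is only a rough indicator of skew direction, not a definition of skewness.

```python
from statistics import mean, median

# The right-skewed and left-skewed data sets from the slides.
right_skewed = [22, 29, 28, 22, 24, 25, 28, 21, 23, 24, 23, 26, 36, 38, 62, 23, 25]
left_skewed = [21, 46, 54, 47, 53, 60, 55, 55, 56, 58, 58, 58, 58, 62, 63, 64, 60]

def mean_minus_median(data):
    """Positive when the mean sits above the median, negative when below."""
    return mean(data) - median(data)

for name, data in [("right skewed", right_skewed), ("left skewed", left_skewed)]:
    print(f"{name}: mean = {mean(data):.3f}, median = {median(data)}, "
          f"difference = {mean_minus_median(data):+.3f}")
```

In the right skewed set the single value 62 pulls the mean above the median; in the left skewed set the single value 21 pulls it below.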
Trimmed mean:
To calculate a trimmed mean:
Multiply the % to trim by n
Truncate that many observations from
BOTH ends of the distribution (when
listed in order)
Calculate the mean with the shortened
data set
Find a 10% trimmed mean with the following data.
12 14 19 20 22 24 25 26 26 35
10%(10) = 1
So remove one observation from each side!
$\bar{x}_T = \frac{14 + 19 + 20 + 22 + 24 + 25 + 26 + 26}{8} = 22$
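The trimmed-mean steps can be sketched as a small function. The name `trimmed_mean` is illustrative, and rounding the trim count down when the percentage times n is not a whole number is an assumption; the slides only show a case where it comes out even.

```python
def trimmed_mean(data, percent):
    """Mean after dropping `percent`% of the observations from each end."""
    ordered = sorted(data)
    n = len(ordered)
    # Observations to drop per side (assumption: round down to a whole number).
    k = (percent * n) // 100
    kept = ordered[k:n - k]
    return sum(kept) / len(kept)

values = [12, 14, 19, 20, 22, 24, 25, 26, 26, 35]
result = trimmed_mean(values, 10)
print(result)
```

Passing the percentage as a whole number (10 rather than 0.10) keeps the trim count in integer arithmetic and avoids floating-point surprises.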
Matching Graphs Activity
Mean and Median Assignment
Why use boxplots?
ease of construction
convenient handling of outliers
construction is not subjective
(like histograms)
Used with medium or large size
data sets (n > 10)
useful for comparative displays
Disadvantage of
boxplots
does not retain the
individual observations
should not be used with
small data sets (n < 10)
How to construct
find five-number summary
Min Q1 Med Q3 Max
draw box from Q1 to Q3
draw median as center line in
the box
extend whiskers to min & max
Modified boxplots
display outliers
fences mark off mild & extreme outliers
whiskers extend to largest (smallest) data value inside the fence
ALWAYS use modified boxplots in this class!!!
Inner fence
Q1 – 1.5IQR
Q3 + 1.5IQR
Interquartile Range (IQR) is the range (length) of the box: Q3 - Q1
Any observation outside this fence is an outlier! Put a dot for the outliers.
Modified Boxplot . . .
Draw the “whisker” from the quartiles
to the observation that is within the
fence!
Outer fence
Q1 – 3IQR
Q3 + 3IQR
Any observation outside this fence is an extreme outlier!
Any observation between the fences is considered a mild outlier.
For the AP Exam . . .
. . . you just need to find outliers,
you DO NOT need to identify them
as mild or extreme.
Therefore, you just need to use the
1.5IQR fences.
A report from the U.S. Department of Justice gave the following percent increase in federal prison populations in 20 northeastern & midwestern states in 1999.
5.9 4.8 8.0 1.3 6.9 4.4 5.0 4.5 7.2 5.9
3.5 3.2 4.5 7.2 5.6 6.4 4.1 5.5 6.3 5.3
Create a modified boxplot. Describe the distribution.
Use the calculator to create a modified boxplot.
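As a check on the calculator output, here is a minimal sketch of the five-number summary and the 1.5IQR outlier rule for the prison data. It assumes quartiles are found as the medians of the lower and upper halves of the ordered data (other software may use a different quartile rule and give slightly different values); the function names are illustrative.

```python
from statistics import median

increases = [5.9, 4.8, 8.0, 1.3, 6.9, 4.4, 5.0, 4.5, 7.2, 5.9,
             3.5, 3.2, 4.5, 7.2, 5.6, 6.4, 4.1, 5.5, 6.3, 5.3]

def five_number_summary(data):
    """Min, Q1, median, Q3, max, with quartiles as medians of the two halves."""
    ordered = sorted(data)
    half = len(ordered) // 2
    lower = ordered[:half]    # excludes the middle value when n is odd
    upper = ordered[-half:]
    return ordered[0], median(lower), median(ordered), median(upper), ordered[-1]

def outliers_1_5_iqr(data):
    """Observations outside the inner fences Q1 - 1.5*IQR and Q3 + 1.5*IQR."""
    _, q1, _, q3, _ = five_number_summary(data)
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in sorted(data) if x < low_fence or x > high_fence]

summary = five_number_summary(increases)
outliers = outliers_1_5_iqr(increases)
print("Min, Q1, Med, Q3, Max:", summary)
print("Outliers:", outliers)
```

In a modified boxplot the lower whisker would then stop at the smallest observation inside the fence rather than at the minimum.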
Evidence suggests that a high indoor radon
concentration might be linked to the development of
childhood cancers. The data that follows is the
radon concentration in two different samples of
houses. The first sample consisted of houses in
which a child was diagnosed with cancer. Houses in
the second sample had no recorded cases of
childhood cancer.
(see data on note page)
Create parallel boxplots. Compare the distributions.
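[Figure: parallel boxplots of radon concentration for the Cancer and No Cancer groups; horizontal axis labeled Radon, with ticks at 100 and 200]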
The median radon concentration for the no cancer
group is lower than the median for the cancer
group. The range of the cancer group is larger than
the range for the no cancer group. Both
distributions are skewed right. The cancer group
has outliers at 39, 45, 57, and 210. The no cancer
group has outliers at 55 and 85.
Matching Box Plots, Histograms,
and Summary Statistics Activity
Self Check #5
Comparative Boxplots
Assignment
Why is the study of variability
important?
Allows us to distinguish between
usual & unusual values
In some situations, want more/less
variability
– scores on standardized tests
– time bombs
– medicine
Measures of Variability
range (max-min)
interquartile range (Q3-Q1)
deviations $(x - \bar{x})$
variance $\sigma^2$ (lower case Greek letter sigma)
standard deviation $\sigma$
Suppose that we have these data values:
24 16 34 28 26 21 30 35 37 29
Find the mean.
Find the deviations $(x - \bar{x})$.
What is the sum of the deviations from the mean?
24 16 34 28 26 21 30 35 37 29
Square the deviations: $(x - \bar{x})^2$
Find the average of the squared deviations:
$\sigma^2 = \frac{\sum (x - \mu)^2}{n}$
The average of the deviations squared is called the variance.
Population parameter: $\sigma^2$
Sample statistic: $s^2$
Calculation of variance of a sample
$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$
The denominator $n - 1$ is the degrees of freedom (df).
Degrees of Freedom (df)
n deviations contain (n - 1)
independent pieces of
information about
variability
A standard deviation is a
measure of the average
deviation from the mean.
Use calculator
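The hand calculation for the ten data values above can be followed step by step in a short sketch. Variable names are illustrative; it shows both the divide-by-n version and the divide-by-(n - 1) sample version so the two can be compared with the calculator output.

```python
from math import sqrt

values = [24, 16, 34, 28, 26, 21, 30, 35, 37, 29]
n = len(values)

mean = sum(values) / n
deviations = [x - mean for x in values]      # these always sum to zero
squared = [d ** 2 for d in deviations]

population_variance = sum(squared) / n       # divide by n
sample_variance = sum(squared) / (n - 1)     # divide by df = n - 1
sample_sd = sqrt(sample_variance)

print(f"mean = {mean}, sum of deviations = {sum(deviations)}")
print(f"population variance = {population_variance:.1f}")
print(f"sample variance = {sample_variance:.3f}, sample SD = {sample_sd:.3f}")
```

Dividing by n - 1 gives a slightly larger value than dividing by n, and the gap shrinks as the sample size grows.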
Which measure(s) of
variability is/are
resistant?
Mean and Variance Activity
Mean and Variance Worksheet
Self Check #6
Show me the Money Assignment
Multiple Choice Test #2
Assignment #3
Linear transformation rule
When adding a constant to a random variable, the mean changes but the standard deviation does not.
When multiplying a random variable by a constant, both the mean and the standard deviation change.
An appliance repair shop charges a $30 service call
to go to a home for a repair. It also charges $25 per
hour for labor. From past history, the average length
of repairs is 1 hour 15 minutes (1.25 hours) with
standard deviation of 20 minutes (1/3 hour).
Including the charge for the service call, what is the
mean and standard deviation for the charges for
labor?
$\mu = 30 + 25(1.25) = \$61.25$
$\sigma = 25\left(\frac{1}{3}\right) = \$8.33$
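The repair shop calculation can be written as a small helper that applies the linear transformation rule. The function name `linear_transform` and its parameter names are hypothetical; the absolute value on the multiplier is a standard detail added here so the standard deviation stays non-negative if the multiplier is negative.

```python
def linear_transform(mean, sd, constant, multiplier):
    """Mean and SD of constant + multiplier * X, given the mean and SD of X."""
    new_mean = constant + multiplier * mean   # the constant shifts the mean
    new_sd = abs(multiplier) * sd             # the constant does not affect spread
    return new_mean, new_sd

# Repair time X in hours: mean 1.25, SD 1/3; charge = 30 + 25 * X.
charge_mean, charge_sd = linear_transform(1.25, 1 / 3, constant=30, multiplier=25)
print(f"mean charge = ${charge_mean:.2f}, SD = ${charge_sd:.2f}")
```

The $30 service call appears only in the mean; the standard deviation depends on the hourly rate alone.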
Rules for Combining two variables
To find the mean for the sum (or difference), add
(or subtract) the two means
To find the standard deviation of the sum (or
differences), ALWAYS add the variances, then
take the square root.
Formulas:
$\mu_{a+b} = \mu_a + \mu_b$
$\mu_{a-b} = \mu_a - \mu_b$
$\sigma_{a \pm b} = \sqrt{\sigma_a^2 + \sigma_b^2}$ (if the variables are independent)
Bicycles arrive at a bike shop in boxes. Before they can be
sold, they must be unpacked, assembled, and tuned
(lubricated, adjusted, etc.). Based on past experience, the
times for each setup phase are independent with the
following means & standard deviations (in minutes). What
are the mean and standard deviation for the total bicycle
setup times?
Phase        Mean    SD
Unpacking     3.5    0.7
Assembly     21.8    2.4
Tuning       12.3    2.7
$\mu_T = 3.5 + 21.8 + 12.3 = 37.6$ minutes
$\sigma_T = \sqrt{0.7^2 + 2.4^2 + 2.7^2} = 3.680$ minutes
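The bicycle setup calculation, sketched in code: means add directly, while standard deviations are combined by adding variances and taking the square root. The dictionary name `phases` is illustrative, and the calculation relies on the stated independence of the three phases.

```python
from math import sqrt

# (mean, SD) in minutes for each setup phase, from the bicycle example.
phases = {"Unpacking": (3.5, 0.7), "Assembly": (21.8, 2.4), "Tuning": (12.3, 2.7)}

total_mean = sum(mean for mean, _ in phases.values())
# Variances add only because the phases are stated to be independent.
total_sd = sqrt(sum(sd ** 2 for _, sd in phases.values()))

print(f"total mean = {total_mean:.1f} minutes, total SD = {total_sd:.3f} minutes")
```

Adding the standard deviations directly would give 5.8 minutes, which overstates the variability of the total.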
Self Check #7