Unit 1 - AP Statistics - Lang

Unit 1
Mr. Lang’s AP Statistics PowerPoint
Homework Assignment
• For the A: 1, 3, 5, 7, 8, 11–25 Odd, 27–32, 37–59 Odd, 60, 69–74, 79–105 Odd (except 85, 99, 101), 107–110, R1–R10
• For the C: 1, 3, 5, 8, 11–25 Odd, 37–59 Odd, 79–103 Odd (except 85, 99, 101), R1–R10
• For the D-: 1, 3, 5, 11, 15, 19, 23, 37, 41, 45, 49, 79, 83, 87, 91, 97, 103, R1–R10
Statistics
the science of
collecting, analyzing,
and drawing
conclusions from data
Descriptive statistics
the methods of
organizing &
summarizing data
Inferential statistics
involves making
generalizations from a
sample to a population
Population
The entire collection
of individuals or
objects about which
information is desired
Sample
A subset of the
population, selected for
study in some
prescribed manner
Variable
any characteristic
whose value may
change from one
individual to another
Data
observations on a single
variable or
simultaneously on two
or more variables
Types of variables
Categorical variables
or qualitative
identifies basic
differentiating characteristics
of the population
Numerical variables
or quantitative
observations or measurements
take on numerical values
makes sense to average these
values
two types - discrete & continuous
Discrete (numerical)
listable set of values
usually counts of items
Continuous (numerical)
data can take on any values
in the domain of the variable
usually measurements of
something
Classification by the
number of variables
Univariate - data that describes a
single characteristic of the population
Bivariate - data that describes two
characteristics of the population
Multivariate - data that describes
more than two characteristics (beyond
the scope of this course)
Identify the following variables:
1. the income of adults in your city: Numerical
2. the color of M&M candies selected at random from a bag: Categorical
3. the number of speeding tickets each student in AP Statistics has received: Numerical
4. the area code of an individual: Categorical
5. the birth weights of female babies born at a large hospital over the course of a year: Numerical
Self Check #1
Assignment #1
Graphs for categorical data
Bar Graph
• Used for categorical data
• Bars do not touch
• Categorical variable is typically on the horizontal axis
• To describe – comment on which occurred the most often or least often
• May make a double bar graph or segmented bar graph for bivariate categorical data sets
Using class survey data:
graph birth month
graph gender & handedness
Pie (Circle) graph
• Used for categorical data
• To make:
– Multiply each category's proportion by 360° to get its angle
– Using a protractor, mark off each part
• To describe – comment on which occurred the most often or least often
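The proportion-to-angle step can be sketched in a few lines of Python. This is only an illustration: the category counts below are hypothetical and are not taken from the class survey, and `pie_angles` is a name chosen for this sketch.

```python
# Hypothetical counts, for illustration only (not the class survey data).
counts = {"red": 12, "blue": 6, "green": 6}


def pie_angles(counts):
    """Return the central angle (in degrees) for each category."""
    total = sum(counts.values())
    # Each slice's angle is its proportion of the total times 360 degrees.
    return {category: count / total * 360 for category, count in counts.items()}


angles = pie_angles(counts)
for category, angle in angles.items():
    print(f"{category}: {angle:.1f} degrees")
```

The angles always add up to 360°, which is a quick check before picking up the protractor.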
Graphs for numerical data
Dotplot
• Used with numerical data (either discrete or continuous)
• Made by putting dots (or X’s) on a number line
• Can make comparative dotplots by using the same axis for multiple groups
Stemplots (stem & leaf plots)
• Used with univariate, numerical data
• Must have a key so that we know how to read the numbers
• Can split stems when you have a long list of leaves
• Can have a comparative stemplot with two groups
Would a stemplot be a good graph for the number of pieces of gum chewed per day by AP Stat students? Why or why not?
Would a stemplot be a good graph for the number of pairs of shoes owned by AP Stat students? Why or why not?
Example:
The following data are price per ounce for various brands
of dandruff shampoo at a local grocery store.
0.32 0.21 0.29 0.54 0.17 0.28 0.36 0.23
Can you make a stemplot with this data?
Example: Tobacco use in G-rated Movies
Total tobacco exposure time (in seconds) for Disney
movies:
223 176 548 37 158 51 299 37 11 165 74 9 2 6 23 206 9
Total tobacco exposure time (in seconds) for other
studios’ movies:
205 162 6 1 117 5 91 155 24 55 17
Make a comparative stemplot.
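One possible way to build the comparative stemplot is sketched below in Python. The choice of stems (hundreds digit) and leaves (tens digit, with the ones digit dropped) is an assumption made for this sketch, as are the function names; other stem choices are equally valid as long as the key says how to read them.

```python
disney = [223, 176, 548, 37, 158, 51, 299, 37, 11, 165, 74, 9, 2, 6, 23, 206, 9]
others = [205, 162, 6, 1, 117, 5, 91, 155, 24, 55, 17]


def leaves_by_stem(values):
    """Map each stem (hundreds digit) to its sorted leaves (tens digit)."""
    stems = {}
    for v in values:
        stems.setdefault(v // 100, []).append((v // 10) % 10)
    return {stem: sorted(leaves) for stem, leaves in stems.items()}


def comparative_stemplot(left, right):
    """Return a back-to-back stemplot as text, one line per stem plus a key."""
    left_leaves, right_leaves = leaves_by_stem(left), leaves_by_stem(right)
    top = max(max(left_leaves), max(right_leaves))
    lines = []
    for stem in range(top + 1):  # include empty stems so gaps stay visible
        # Left-hand leaves are reversed so they grow outward from the stem.
        l = "".join(str(d) for d in reversed(left_leaves.get(stem, [])))
        r = "".join(str(d) for d in right_leaves.get(stem, []))
        lines.append(f"{l:>12} | {stem} | {r}")
    lines.append("Key: 1 | 7 means 170-179 seconds (ones digit dropped)")
    return "\n".join(lines)


print(comparative_stemplot(disney, others))
```

Empty stems (3 and 4 here) are printed on purpose: they make the gap before the 548-second movie visible.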
Graphing Activity
Self Check #2
Assignment #2
Histograms
• Used with numerical data
• Bars touch on histograms
• Two types
– Discrete
• Bars are centered over discrete values
– Continuous
• Bars cover a class (interval) of values
• For comparative histograms – use two separate graphs with the same scale on the horizontal axis
Would a histogram be a good graph for the fastest speed driven by AP Stat students? Why or why not?
Would a histogram be a good graph for the number of pieces of gum chewed per day by AP Stat students? Why or why not?
Cumulative Relative Frequency Plot
(Ogive)
• . . . is used to answer questions about percentiles.
• Percentiles are the percent of individuals that are at or below a certain value.
• Quartiles are located every 25% of the data. The first quartile (Q1) is the 25th percentile, while the third quartile (Q3) is the 75th percentile. What is the special name for Q2?
• Interquartile Range (IQR) is the range of the middle half (50%) of the data.
IQR = Q3 – Q1
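The "percent at or below" idea behind an ogive can be sketched in Python. As an illustration this reuses the shampoo prices from the stemplot example; the function names are made up for this sketch.

```python
prices = [0.32, 0.21, 0.29, 0.54, 0.17, 0.28, 0.36, 0.23]


def percent_at_or_below(data, value):
    """Percent of observations that are at or below the given value."""
    return 100 * sum(1 for x in data if x <= value) / len(data)


def ogive_points(data):
    """(value, cumulative percent) pairs that could be plotted as an ogive."""
    return [(x, percent_at_or_below(data, x)) for x in sorted(set(data))]


for value, percent in ogive_points(prices):
    print(f"{value:.2f}  {percent:5.1f}%")
```

Reading the output answers percentile questions directly: for example, 5 of the 8 prices are at or below 0.29, so that price sits at the 62.5th percentile of this small data set.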
Ogive Activity
Self Check #3
Multiple Choice Test #1
Types (shapes)
of Distributions
Symmetrical
refers to data in which both sides
are (more or less) the same when
the graph is folded vertically down
the middle
bell-shaped is a special type
–has a center mound with two
sloping tails
Uniform
refers to data in which every
class has equal or
approximately equal
frequency
Skewed (left or right)
refers to data in which one
side (tail) is longer than the
other side
the direction of skewness is
on the side of the longer tail
Bimodal (multi-modal)
refers to data in which two
(or more) classes have the
largest frequency & are
separated by at least one
other class
Distribution Activity . . .
Self Check #4
How to describe a
numerical,
univariate graph
What strikes you as the most distinctive
difference among the distributions of exam
scores in classes A, B, & C ?
1. Center
discuss where the middle of
the data falls
three types of central
tendency
–mean, median, & mode
What strikes you as the most distinctive
difference among the distributions of scores in
classes D, E, & F?
2. Spread
discuss how spread out the data
is
refers to the variability of the
data
–Range, standard deviation, IQR
What strikes you as the most distinctive
difference among the distributions of exam
scores in classes G, H, & I ?
3. Shape
refers to the overall shape of
the distribution
symmetrical, uniform,
skewed, or bimodal
What strikes you as the most distinctive
difference among the distributions of exam
scores in class K ?
4. Unusual occurrences
outliers - value that lies
away from the rest of the
data
gaps
clusters
anything else unusual
5. In context
You must write your answer
in reference to the specifics
in the problem, using correct
statistical vocabulary and
using complete sentences!
Features of the Distribution Activity
Means & Medians
Parameter Fixed value about a
population
Typical unknown
Statistic Value calculated
from a sample
Measures of Central Tendency
Median - the middle of the data; 50th
percentile
– Observations must be in numerical
order
– Is the middle single value if n is odd
– The average of the middle two values if
n is even
NOTE: n denotes the sample size
Measures of Central Tendency
Mean - the arithmetic average
– Use $\mu$ to represent a population mean (a parameter)
– Use $\bar{x}$ to represent a sample mean (a statistic)
Formula: $\bar{x} = \dfrac{\sum x}{n}$
$\Sigma$ is the capital Greek letter sigma – it means to sum the values that follow
Measures of Central Tendency
Mode – the observation that occurs the
most often
– Can be more than one mode
– If all values occur only once – there
is no mode
– Not used as often as mean & median
Suppose we are interested in the number of lollipops
that are bought at a certain store. A sample of 5
customers buys the following number of lollipops.
Find the median.
2 3 4 8 12
The numbers are in order & n is odd – so find the middle
observation.
The median is 4 lollipops!
Suppose we have a sample of 6 customers that buy the
following number of lollipops. The median is …
2 3 4 6 8 12
The numbers are in order & n is even – so find the middle
two observations.
Now, average these two values: (4 + 6)/2 = 5
The median is 5 lollipops!
Suppose we have a sample of 6 customers that buy the
following number of lollipops. Find the mean.
2 3 4 6 8 12
To find the mean number of lollipops
add the observations and divide by n.
$\bar{x} = \dfrac{2 + 3 + 4 + 6 + 8 + 12}{6} \approx 5.833$
Using the calculator . . .
What would happen to the median & mean if the 12
lollipops were 20?
2 3 4 6 8 20
The median is . . . 5
The mean is . . . $\dfrac{2 + 3 + 4 + 6 + 8 + 20}{6} \approx 7.17$
What happened?
What would happen to the median & mean if the 20
lollipops were 50?
2 3 4 6 8 50
The median is . . . 5
The mean is . . . $\dfrac{2 + 3 + 4 + 6 + 8 + 50}{6} \approx 12.17$
What happened?
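The lollipop experiment can be replayed with Python's standard `statistics` module. This is a small sketch; `summarize` is a helper name invented here.

```python
from statistics import mean, median

original = [2, 3, 4, 6, 8, 12]


def summarize(largest):
    """Replace the largest purchase and return (median, mean)."""
    data = original[:-1] + [largest]
    return median(data), mean(data)


for largest in (12, 20, 50):
    med, avg = summarize(largest)
    print(f"largest = {largest:2d}: median = {med}, mean = {avg:.2f}")
```

The median stays at 5 every time, while the mean climbs as the largest value grows.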
Resistant Statistics that are not affected by outliers
Is the median resistant?
►Is
YES
the mean resistant? NO
Look at the following data set. Find the
mean.
22 23 24 25 25 26 29 30
$\bar{x} = 25.5$
Now find how each observation deviates from the mean.
This is the deviation from the mean.
What is the sum of the deviations from the mean?
$\sum (x - \bar{x}) = 0$
Will this sum always equal zero? YES
Look at the following data set. Find the mean &
median.
21 23 23 24 26 30 26 30 27 30
27 31 25 27 32 25 27 32 26 28
Mean = 27
Median = 27
Create a histogram with the data. (use x-scale of 2) Then find the mean and median.
Look at the placement of the mean and median in this symmetrical distribution.
Look at the following data set. Find the mean &
median.
22 29 28 22 24 25 28 21 23 24 23 26 36 38 62 23 25
Mean = 28.176
Median = 25
Create a histogram with the data. (use x-scale of 8) Then find the mean and median.
Look at the placement of the mean and median in this right skewed distribution.
Look at the following data set. Find the mean &
median.
21 46 54 47 53 60 55 55 56 58 58 58 58 62 63 64 60
Mean = 54.588
Median = 58
Create a histogram with the data. Then find the mean and median.
Look at the placement of the mean and median in this skewed left distribution.
Recap:
In a symmetrical distribution, the mean
and median are equal.
In a skewed distribution, the mean is
pulled in the direction of the skewness.
In a symmetrical distribution, you should
report the mean!
In a skewed distribution, the median
should be reported as the measure of
center!
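The recap can be checked against the three data sets from the preceding slides. The Python sketch below compares mean and median and reports which way the mean is pulled. `mean_pulled` is a name invented here, and the comparison is only the rule of thumb from the recap, not a formal test for skewness.

```python
from statistics import mean, median

symmetric = [21, 23, 23, 24, 26, 30, 26, 30, 27, 30,
             27, 31, 25, 27, 32, 25, 27, 32, 26, 28]
right_skewed = [22, 29, 28, 22, 24, 25, 28, 21, 23, 24, 23, 26, 36, 38, 62, 23, 25]
left_skewed = [21, 46, 54, 47, 53, 60, 55, 55, 56, 58, 58, 58, 58, 62, 63, 64, 60]


def mean_pulled(data):
    """Say which side of the median the mean falls on."""
    m, med = mean(data), median(data)
    if m > med:
        return "right"
    if m < med:
        return "left"
    return "neither"


for name, data in [("symmetric", symmetric),
                   ("right skewed", right_skewed),
                   ("left skewed", left_skewed)]:
    print(f"{name}: mean = {mean(data):.3f}, median = {median(data)}, "
          f"mean pulled {mean_pulled(data)}")
```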
Trimmed mean:
To calculate a trimmed mean:
Multiply the % to trim by n
Truncate that many observations from
BOTH ends of the distribution (when
listed in order)
Calculate the mean with the shortened
data set
Find a 10% trimmed mean with the following data.
12 14 19 20 22 24 25 26 26 35
10%(10) = 1
So remove one observation from
each side!
$\dfrac{14 + 19 + 20 + 22 + 24 + 25 + 26 + 26}{8} = 22$
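The trimming steps above translate almost line for line into Python. This is a minimal sketch: `trimmed_mean` is a name chosen here, and rounding the number of trimmed observations down to a whole number is an assumption for cases where the percent times n is not a whole number.

```python
def trimmed_mean(data, percent):
    """Mean after dropping `percent` % of the observations from EACH end."""
    ordered = sorted(data)
    k = int(len(ordered) * percent / 100)  # observations to drop per end
    if 2 * k >= len(ordered):
        raise ValueError("trimming would remove every observation")
    trimmed = ordered[k:len(ordered) - k]
    return sum(trimmed) / len(trimmed)


data = [12, 14, 19, 20, 22, 24, 25, 26, 26, 35]
print(trimmed_mean(data, 10))
```

With 10 observations and 10% trimming, one value is removed from each end (12 and 35), leaving the eight values averaged on the slide.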
Matching Graphs Activity
Mean and Median Assignment
Why use boxplots?
ease of construction
convenient handling of outliers
construction is not subjective
(like histograms)
Used with medium or large size
data sets (n > 10)
useful for comparative displays
Disadvantage of
boxplots
does not retain the
individual observations
should not be used with
small data sets (n < 10)
How to construct
find five-number summary
Min Q1 Med Q3 Max
draw box from Q1 to Q3
draw median as center line in
the box
extend whiskers to min & max
Modified boxplots
display outliers
fences mark off mild & extreme outliers
whiskers extend to largest (smallest) data value inside the fence
ALWAYS use modified boxplots in this class!!!
Inner fence
Q1 – 1.5IQR
Q3 + 1.5IQR
Interquartile Range (IQR) is the range (length) of the box: Q3 - Q1
Any observation outside this fence is an outlier! Put a dot for the outliers.
Modified Boxplot . . .
Draw the “whisker” from the quartiles to the observation that is within the fence!
Outer fence
Q1 – 3IQR
Q3 + 3IQR
Any observation between the fences is considered a mild outlier.
Any observation outside this fence is an extreme outlier!
For the AP Exam . . .
. . . you just need to find outliers,
you DO NOT need to identify them
as mild or extreme.
Therefore, you just need to use the
1.5IQRs
A report from the U.S. Department of Justice
gave the following percent increase in federal
prison populations in 20 northeastern & midwestern states in 1999.
5.9 4.8 8.0 1.3 6.9 4.4 5.0 4.5 7.2 5.9
3.5 3.2 4.5 7.2 5.6 6.4 4.1 5.5 6.3 5.3
Create a modified boxplot. Describe the
distribution.
Use the calculator to create a modified boxplot.
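As a cross-check on the calculator, the five-number summary and the 1.5IQR fences for the prison data can be sketched in Python. Quartile conventions differ between tools. This sketch assumes Q1 and Q3 are the medians of the lower and upper halves of the ordered data (leaving out the middle value when n is odd), so other software may give slightly different quartiles. The function names are invented for this sketch.

```python
from statistics import median

increases = [5.9, 4.8, 8.0, 1.3, 6.9, 4.4, 5.0, 4.5, 7.2, 5.9,
             3.5, 3.2, 4.5, 7.2, 5.6, 6.4, 4.1, 5.5, 6.3, 5.3]


def five_number_summary(data):
    """Return (min, Q1, median, Q3, max); needs at least 2 observations."""
    ordered = sorted(data)
    half = len(ordered) // 2
    lower = ordered[:half]    # lower half, middle value excluded if n is odd
    upper = ordered[-half:]   # upper half, middle value excluded if n is odd
    return ordered[0], median(lower), median(ordered), median(upper), ordered[-1]


def outliers(data):
    """Observations outside the inner fences Q1 - 1.5IQR and Q3 + 1.5IQR."""
    _, q1, _, q3, _ = five_number_summary(data)
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in data if x < low_fence or x > high_fence]


print(five_number_summary(increases))
print(outliers(increases))
```

Under this quartile convention the fences sit near 1.6 and 9.2, so the 1.3% increase is flagged as an outlier and the lower whisker would stop at 3.2.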
Evidence suggests that a high indoor radon
concentration might be linked to the development of
childhood cancers. The data that follows is the
radon concentration in two different samples of
houses. The first sample consisted of houses in
which a child was diagnosed with cancer. Houses in
the second sample had no recorded cases of
childhood cancer.
(see data on note page)
Create parallel boxplots. Compare the distributions.
[Figure: parallel boxplots of radon concentration for the Cancer and No Cancer groups]
The median radon concentration for the no cancer
group is lower than the median for the cancer
group. The range of the cancer group is larger than
the range for the no cancer group. Both
distributions are skewed right. The cancer group
has outliers at 39, 45, 57, and 210. The no cancer
group has outliers at 55 and 85.
Matching Box Plots, Histograms,
and Summary Statistics Activity
Self Check #5
Comparative Boxplots
Assignment
Why is the study of variability
important?
Allows us to distinguish between
usual & unusual values
In some situations, want more/less
variability
– scores on standardized tests
– time bombs
– medicine
Measures of Variability
range (max - min)
interquartile range (Q3 - Q1)
deviations: $x - \bar{x}$
variance: $\sigma^2$ ($\sigma$ is the lower case Greek letter sigma)
standard deviation: $\sigma$
Suppose that we have these data values:
24 16 34 28 26 21 30 35 37 29
Find the mean.
Find the deviations: $x - \bar{x}$
What is the sum of the deviations from the mean?
Square the deviations: $(x - \bar{x})^2$
Find the average of the squared deviations:
$\sigma^2 = \dfrac{\sum (x - \mu)^2}{n}$
The average of the deviations
squared is called the variance.
Population parameter: $\sigma^2$
Sample statistic: $s^2$
Calculation of variance
of a sample
$s^2 = \dfrac{\sum (x - \bar{x})^2}{n - 1}$
The denominator, n - 1, is the degrees of freedom (df).
Degrees of Freedom (df)
n deviations contain (n - 1)
independent pieces of
information about
variability
A standard deviation is a
measure of the average
deviation from the mean.
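The deviation, variance, and standard deviation steps can be followed in Python using the ten data values from the variability slides. This sketch does the arithmetic by hand; the tests compare it with the standard `statistics` module (`variance` and `stdev` use the n - 1 denominator). The variable names are chosen here for illustration.

```python
from statistics import mean, variance, stdev

data = [24, 16, 34, 28, 26, 21, 30, 35, 37, 29]

x_bar = mean(data)                          # 28
deviations = [x - x_bar for x in data]      # x - x_bar for each observation
squared = [d ** 2 for d in deviations]      # squared deviations

population_variance = sum(squared) / len(data)    # divide by n
sample_variance = sum(squared) / (len(data) - 1)  # divide by n - 1 (df)
sample_sd = sample_variance ** 0.5

print("sum of deviations:", sum(deviations))
print("population variance:", population_variance)
print("sample variance:", round(sample_variance, 3))
print("sample standard deviation:", round(sample_sd, 3))
```

The deviations add up to zero, which is why they are squared before averaging.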
Use calculator
Which measure(s) of
variability is/are
resistant?
Mean and Variance Activity
Mean and Variance Worksheet
Self Check #6
Show me the Money Assignment
Multiple Choice Test #2
Assignment #3
Linear transformation rule
• When adding a constant to a random variable, the mean changes but the standard deviation does not.
• When multiplying a random variable by a constant, both the mean and the standard deviation change.
An appliance repair shop charges a $30 service call
to go to a home for a repair. It also charges $25 per
hour for labor. From past history, the average length
of repairs is 1 hour 15 minutes (1.25 hours) with
standard deviation of 20 minutes (1/3 hour).
Including the charge for the service call, what are the
mean and standard deviation for the charges for
labor?
$\mu = 30 + 25(1.25) = \$61.25$
$\sigma = 25\left(\tfrac{1}{3}\right) \approx \$8.33$
Rules for Combining two variables
• To find the mean for the sum (or difference), add (or subtract) the two means
• To find the standard deviation of the sum (or difference), ALWAYS add the variances, then take the square root.
• Formulas:
$\mu_{a+b} = \mu_a + \mu_b$
$\mu_{a-b} = \mu_a - \mu_b$
$\sigma_{a \pm b} = \sqrt{\sigma_a^2 + \sigma_b^2}$, if the variables are independent
Bicycles arrive at a bike shop in boxes. Before they can be
sold, they must be unpacked, assembled, and tuned
(lubricated, adjusted, etc.). Based on past experience, the
times for each setup phase are independent with the
following means & standard deviations (in minutes). What
are the mean and standard deviation for the total bicycle
setup times?
Phase       Mean    SD
Unpacking    3.5   0.7
Assembly    21.8   2.4
Tuning      12.3   2.7
$\mu_T = 3.5 + 21.8 + 12.3 = 37.6$ minutes
$\sigma_T = \sqrt{0.7^2 + 2.4^2 + 2.7^2} \approx 3.680$ minutes
Self Check #7
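The bicycle setup calculation above can be checked with a short Python sketch. It relies on the independence of the three phases stated in the problem: means add, variances add, and the standard deviation is the square root of the summed variances. The dictionary layout is just one convenient way to hold the table.

```python
from math import sqrt

# (mean, standard deviation) in minutes for each independent setup phase
phases = {
    "Unpacking": (3.5, 0.7),
    "Assembly": (21.8, 2.4),
    "Tuning": (12.3, 2.7),
}

total_mean = sum(m for m, _ in phases.values())
# Add the variances (never the standard deviations), then take the square root.
total_sd = sqrt(sum(sd ** 2 for _, sd in phases.values()))

print(f"mean = {total_mean:.1f} minutes, standard deviation = {total_sd:.3f} minutes")
```

Adding the standard deviations directly would give 5.8 minutes, which overstates the variability of the total.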