CHAPTER 2 Descriptive Statistics
SECTION 2.1 FREQUENCY DISTRIBUTIONS
+
Section 2.1: Frequency Distributions and Their Graphs
GOAL: Explore many ways to organize and describe a data set, including its center, variability (or spread), and shape.
+
FREQUENCY DISTRIBUTION
A table that shows classes, or intervals, of data entries with a count of the number of entries in each class. The frequency f of a class is the number of data entries in the class.
Frequency: how often
Distribution: how spread out or concentrated
Example: Pg. 40
+ Example of a Frequency Distribution
Class     Frequency, f
1–5       5
6–10      8
11–15     6
16–20     8
21–25     5
26–30     4
Lower Class Limit: the least number that can belong to a class
Upper Class Limit: the greatest number that can belong to a class
Class Width: the distance between the lower (or upper) limits of consecutive classes
Range: the difference between the maximum and minimum data entries
+
Guidelines for Creating a Frequency Distribution
1. Determine the range of the data.
2. Determine the number of classes to use.
3. Determine the class width.
4. Find the class limits.
5. Find the class midpoints.
6. Find the class boundaries.
7. Tally the data in each class.
8. Get the FREQUENCY for each class.
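The guideline steps can be sketched in Python for whole-number data. This is a minimal sketch, not the textbook's procedure verbatim: the function name `frequency_distribution`, the width rule (the next whole number above range ÷ number of classes, so the maximum always lands in a class), and the sample data are all hypothetical.

```python
def frequency_distribution(data, num_classes):
    """Sketch of the guideline steps for whole-number data.

    Assumption: class width = next whole number above range / num_classes,
    so the maximum entry always falls in the last class.
    Returns the width and one (lower, upper, midpoint, boundaries, f) row per class.
    """
    low, high = min(data), max(data)
    data_range = high - low                          # 1. range
    width = data_range // num_classes + 1            # 3. width: next whole number above range / k
    rows = []
    for i in range(num_classes):
        lower = low + i * width                      # 4. class limits
        upper = lower + width - 1
        midpoint = (lower + upper) / 2               # 5. class midpoint
        boundaries = (lower - 0.5, upper + 0.5)      # 6. class boundaries
        f = sum(lower <= x <= upper for x in data)   # 7-8. tally -> frequency
        rows.append((lower, upper, midpoint, boundaries, f))
    return width, rows

# Hypothetical data set, 5 classes (step 2 is the user's choice)
width, rows = frequency_distribution([1, 4, 6, 9, 10, 13, 15, 15, 18, 22, 25, 29], 5)
for lower, upper, mid, (lb, ub), f in rows:
    print(f"{lower}-{upper}  mid={mid}  boundaries={lb}-{ub}  f={f}")
```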
+
Definitions: Additional Features of Frequency Distributions
Class Midpoint: the sum of the lower and upper limits of a class divided by two (also known as the class mark)
Relative Frequency: the portion, or percentage, of the data that falls in that class. Divide the frequency (f) by the sample size (n).
Cumulative Frequency: the sum of the frequency for that class and all previous classes. The cumulative frequency of the last class is equal to the sample size n.
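A short Python sketch of these three definitions, applied to the class limits and frequencies of the #27 Newspaper Reading Times table in this section; the variable names are illustrative only.

```python
from itertools import accumulate

# Class limits and frequencies from the #27 Newspaper Reading Times table
classes = [(0, 7), (8, 15), (16, 23), (24, 31), (32, 39)]
freqs = [8, 8, 3, 3, 3]

n = sum(freqs)                                      # sample size n = sum of f
midpoints = [(lo + hi) / 2 for lo, hi in classes]   # (lower + upper) / 2
relative = [f / n for f in freqs]                   # f / n
cumulative = list(accumulate(freqs))                # running total of f

for (lo, hi), f, mid, rel, cum in zip(classes, freqs, midpoints, relative, cumulative):
    print(f"{lo}-{hi}: f={f}, midpoint={mid}, relative f={rel:.2f}, cumulative f={cum}")
```

The last cumulative frequency equals n, which is a quick check that no entry was missed.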
+
Class Example 1 Page 41
+
Class Activity/HW
Pg. 51: #27, #28
We'll be using these frequency distributions again, so make sure to hold onto them.
HAVE THEM DONE FOR TOMORROW; WE NEED THEM!
DO ON SEPARATE PAPER
+
Graphs of Frequency Distributions
Frequency Histogram: a bar graph that represents the frequency distribution of a data set
Properties of a Frequency Histogram
1. The horizontal scale is quantitative and measures the data values.
2. The vertical scale measures the frequencies of the classes.
3. Consecutive bars MUST touch (each bar spans from one class boundary to the next).
+
Other Types of Graphs
FREQUENCY POLYGON
A line graph that emphasizes the continuous change in frequencies
RELATIVE FREQUENCY HISTOGRAM
Has the same shape and horizontal scale as the frequency histogram
The vertical scale measures RELATIVE frequencies
CUMULATIVE FREQUENCY GRAPH (OGIVE)
A line graph that displays the cumulative frequency of each class at its upper class boundary
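The points each line graph plots can be computed directly. A sketch using the #27 table, assuming whole-number class limits (so boundaries are the limits ± 0.5) and two common textbook conventions: the frequency polygon is closed off with zero-frequency points one class width beyond each end, and the ogive starts at 0 on the lower boundary of the first class.

```python
from itertools import accumulate

# #27 Newspaper Reading Times table (whole-number class limits)
classes = [(0, 7), (8, 15), (16, 23), (24, 31), (32, 39)]
freqs = [8, 8, 3, 3, 3]
width = classes[1][0] - classes[0][0]               # class width = 8
midpoints = [(lo + hi) / 2 for lo, hi in classes]

# Frequency polygon: (midpoint, f) for each class, closed off with a
# zero-frequency point one class width before the first and after the last
polygon = ([(midpoints[0] - width, 0)]
           + list(zip(midpoints, freqs))
           + [(midpoints[-1] + width, 0)])

# Ogive: cumulative frequency at each UPPER class boundary (limit + 0.5),
# starting at 0 on the lower boundary of the first class
upper_boundaries = [hi + 0.5 for _, hi in classes]
ogive = [(classes[0][0] - 0.5, 0)] + list(zip(upper_boundaries, accumulate(freqs)))

print(polygon)
print(ogive)
```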
+
#27: Newspaper Reading Times (min)
Class    Frequency, f   Midpoint   Relative f   Cumulative f
0–7      8              3.5        0.32         8
8–15     8              11.5       0.32         16
16–23    3              19.5       0.12         19
24–31    3              27.5       0.12         22
32–39    3              35.5       0.12         25
n = 25
+
Class Activity/HW
Using the frequency distribution you created for #28 on page 51, complete the following ON GRAPH PAPER:
1. Frequency Histogram
2. Frequency Polygon
3. Relative Frequency Histogram
4. Ogive
**MAKE SURE TO LABEL GRAPHS AND WRITE NEATLY!
(TURN IN WITH FREQUENCY DISTRIBUTION FOR WRITTEN FEEDBACK)
DUE TOMORROW!!!!
+
#28 Book Spending Per Semester ($)
Class      Frequency, f   Midpoint   Relative f   Cumulative f
30–113     5              71.5       0.1724       5
114–197    7              155.5      0.2414       12
198–281    8              239.5      0.2759       20
282–365    2              323.5      0.0690       22
366–449    3              407.5      0.1034       25
450–533    4              491.5      0.1379       29
n = 29
+Pirate Baseball Activity
Due:
Given: Pittsburgh Pirates Home Run Data 1961–2009
Using this data, create the following, USING EIGHT CLASSES:
1. Frequency Distribution (including ALL parts and rel./cum. freq)
2. Frequency Histogram
3. Frequency Polygon
4. Relative Frequency Histogram
5. Ogive
Must include:
Title, axis labels, equal class widths
Evidence of ALL calculations (class widths, boundaries, midpoints)
Straight lines
Neatness
Straight edge
Graph paper
Only given TODAY and TOMORROW to work in class. THIS WILL BE GRADED.
Then, using your phone or an iPad, look up home run data for 2010, 2011, 2012, 2013, 2014, and 2015.
Create a NEW frequency distribution.
Create two new charts.
Explain how this new data has changed the distribution (one paragraph).
+
Section 2.2: More Graphs and Displays
+ Stem-and-Leaf Plot
A display for quantitative data
Gives the feel of a histogram while retaining the original data values
An easy way to sort data
Stem: the entry's leftmost digits
Leaf: the entry's rightmost digit
Examples 1 and 2 on Pages 55–56
Can be ordered or unordered
MUST ALWAYS INCLUDE A KEY!
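An ordered stem-and-leaf display can be sketched in a few lines. This assumes non-negative whole numbers with the stem as every digit except the last (x // 10) and the leaf as the last digit (x % 10); the function name `stem_and_leaf` and the data are hypothetical.

```python
from collections import defaultdict

def stem_and_leaf(data):
    """Ordered stem-and-leaf display for non-negative whole numbers.

    Assumption: stem = every digit except the last (x // 10),
    leaf = the last digit (x % 10).
    """
    groups = defaultdict(list)
    for x in sorted(data):                 # sorting first gives ORDERED leaves
        groups[x // 10].append(x % 10)
    lines = []
    for stem in range(min(groups), max(groups) + 1):  # show empty stems too
        leaves = " ".join(str(leaf) for leaf in groups.get(stem, []))
        lines.append(f"{stem} | {leaves}".rstrip())
    first = min(data)
    lines.append(f"Key: {first // 10} | {first % 10} = {first}")  # ALWAYS include a key
    return lines

# Hypothetical data, for illustration only
for line in stem_and_leaf([27, 12, 40, 21, 34, 15, 48, 27]):
    print(line)
```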
+ Dot Plot
Each data entry is plotted, using a point, above a horizontal axis
You can see how the data is distributed, see specific data entries, and identify unusual data values
Example 3, Pg. 57
+ Graphing Qualitative Data Sets: Pie Charts
A circle that is divided into sectors that represent categories
The area of each sector is proportional to the category's frequency
KEY: To find the central angle, MULTIPLY THE RELATIVE FREQUENCY BY 360°
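The central-angle rule as a small Python sketch; the function name `central_angles` and the category counts are hypothetical, for illustration only.

```python
def central_angles(freqs):
    """Central angle for each sector: relative frequency x 360 degrees."""
    n = sum(freqs.values())
    return {category: f / n * 360 for category, f in freqs.items()}

# Hypothetical category counts, for illustration only
angles = central_angles({"Car": 18, "Bus": 9, "Bike": 6, "Walk": 3})
for category, angle in angles.items():
    print(f"{category}: {angle:.1f} degrees")
```

The angles always add up to 360°, a useful check before drawing the sectors with a protractor.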
+ Pareto Chart
A vertical bar graph where the height of each bar represents frequency or relative frequency
BARS ARE POSITIONED IN ORDER FROM HIGHEST TO LOWEST
REMEMBER: Qualitative Data
Example 5, Page 59
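The only step that distinguishes a Pareto chart from an ordinary bar graph is the ordering. A sketch with hypothetical category counts, using rows of `#` characters as stand-ins for the drawn bars.

```python
# Hypothetical category counts, for illustration only
counts = {"Billing": 12, "Shipping": 31, "Defect": 20, "Other": 5}

# Pareto order: bars from highest to lowest frequency
pareto_order = sorted(counts.items(), key=lambda item: item[1], reverse=True)
n = sum(counts.values())
for category, f in pareto_order:
    # text bar stands in for the drawn bar; relative frequency alongside
    print(f"{category:<9} {'#' * f}  {f / n:.2f}")
```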
+ Graphing Paired Data Sets: Scatter Plot
Paired Data Sets: each entry in one data set corresponds to one entry in a second data set
Scatter Plot: the ordered pairs are graphed as points in a coordinate plane
Used to SHOW THE RELATIONSHIP BETWEEN TWO QUANTITATIVE VARIABLES
Example 6, Page 60
+ Time Series Chart
Used to graph a time series
Time series: a data set composed of quantitative entries taken at regular intervals over a period of time
Example 7, Page 61
Scatter Plot: no line
Time Series Chart: connected data points
+
GRADED ASSIGNMENT:
Individually, complete the following graphs from pages 64–65: #18, #20, #22, #24, #25, #29, #30
Must be handed in by the beginning of class on ________ (only ______ to work in class)
Will be graded for correctness and neatness
Use graph paper, ruler, protractor, and compass!
+
Section 2.3: Measures of Central Tendency
+
Measures of Central Tendency
MEAN, MEDIAN, MODE
A value that represents a TYPICAL, or CENTRAL, entry of a data set
+
Mean
Population Mean: $\mu = \frac{\sum x}{N}$
Sample Mean: $\bar{x} = \frac{\sum x}{n}$
N = number of entries in a population
n = number of entries in a sample
+
Example 1 Pg. 67
The prices (in dollars) for a sample of round-trip flights from Chicago, Illinois, to Cancun, Mexico, are listed. What is the mean price of the flights?
872 432 397 427 388 782 397
ROUNDING RULE: WHEN CALCULATING THE MEAN, GO ONE DECIMAL PLACE FURTHER THAN THE ORIGINAL DATA
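The sample-mean formula and the rounding rule applied to the flight prices above. Because the prices are whole dollars, the result is reported to one decimal place.

```python
# Round-trip flight prices (dollars), Example 1, Pg. 67
prices = [872, 432, 397, 427, 388, 782, 397]

sample_mean = sum(prices) / len(prices)   # x-bar = (sum of x) / n
print(round(sample_mean, 1))               # 527.9 (whole-dollar data -> one decimal place)
```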
+
Median
The value that lies in the middle of the data when the data set is ORDERED
If the data set has an even number of entries, the median is the mean of the two middle data entries.
The median divides a data set into TWO equal parts.
EX: 4 5 6 8 10 14
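The median rule as a small Python function (the name `median` is illustrative; Python's `statistics.median` does the same job). The even-length example above has middle entries 6 and 8.

```python
def median(data):
    """Middle value of the ORDERED data; mean of the two middle values if n is even."""
    ordered = sorted(data)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                 # odd n: a single middle entry
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([4, 5, 6, 8, 10, 14]))  # 7.0, since (6 + 8) / 2 = 7
```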
+ Mode
The most frequently occurring data entry
If ALL entries occur only ONCE, then there is NO MODE.
If two data entries occur the same (greatest) number of times, then BOTH are modes and we have a BIMODAL DISTRIBUTION.
If there are more than two modes, we have a MULTIMODAL DISTRIBUTION.
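A sketch of the mode rules, including the "no mode" case. Note that Python's own `statistics.multimode` returns every entry when all occur once, so the hypothetical helper `modes` below checks that case explicitly to match the slide's convention.

```python
from collections import Counter

def modes(data):
    """All most-frequent entries; an empty list means NO MODE."""
    counts = Counter(data)
    top = max(counts.values())
    if top == 1:                      # every entry occurs only once
        return []
    return [value for value, c in counts.items() if c == top]

def describe_modes(data):
    found = modes(data)
    if not found:
        return "no mode"
    if len(found) == 1:
        return f"mode = {found[0]}"
    kind = "bimodal" if len(found) == 2 else "multimodal"
    return f"{kind}: {found}"

print(describe_modes([872, 432, 397, 427, 388, 782, 397]))  # flight prices: mode = 397
print(describe_modes([4, 5, 6, 8, 10, 14]))                 # no mode
```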
+
Note on Mode
The mode is the only measure of central tendency that MUST be an actual data point.
+
Outlier
A data point that is far away from all of the other data points
+
Assignment: Part 1 Section 2.3
Pg. 75–78: #18–#34 even
Find the mean, median, and mode.
Label any outliers.
Use correct notation for the mean (population mean vs. sample mean).
+
Today’s Question: How can we describe the “middle” of unequal data?
You have $200 for 17 days, $300 for 5 days, and $150 for 9 days out of a month. What was your average amount of money for the month?
+
Weighted Mean
A mean where each data point is not “worth” the same amount. Entries have varying “weights”.
$\bar{x} = \frac{\sum (x \cdot w)}{\sum w}$
**Where w is the weight of each entry
+
Example: Weighted Mean Vs. Regular Mean
Tests are worth 50% of the overall grade, quizzes 30%, and homework 20%.
You get 100 in HW, 90 on a quiz, and 80 on a test.
Calculate the regular and weighted mean.
Why is one lower than the other?
+
Example: Weighted Mean Vs. Regular Mean
You have $200 for 17 days, $300 for 5 days, and $150 for 9 days out of a month.
Calculate the regular and weighted mean.
Why is one lower than the other?
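Both weighted-mean examples worked in Python. The helper name `weighted_mean` is illustrative, and the grade weights are written as whole-number percents (20, 30, 50) to avoid floating-point noise; only their ratios matter. In both examples the highest value gets the smallest weight, which pulls the weighted mean below the regular mean.

```python
def weighted_mean(values, weights):
    """x-bar = sum(x * w) / sum(w)"""
    return sum(x * w for x, w in zip(values, weights)) / sum(weights)

# Grade example: HW 100 (20%), quiz 90 (30%), test 80 (50%)
scores = [100, 90, 80]
grade_weights = [20, 30, 50]          # percents as whole numbers; only the ratios matter
regular_grade = sum(scores) / len(scores)
weighted_grade = weighted_mean(scores, grade_weights)

# Money example: $200 for 17 days, $300 for 5 days, $150 for 9 days
amounts = [200, 300, 150]
days = [17, 5, 9]                     # weight = number of days at each amount
regular_money = sum(amounts) / len(amounts)
weighted_money = weighted_mean(amounts, days)

print(regular_grade, weighted_grade)                        # 90.0 87.0
print(round(regular_money, 2), round(weighted_money, 2))    # 216.67 201.61
```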
+
Mean of a Frequency Distribution
$\bar{x} = \frac{\sum (x \cdot f)}{n}$
where $n = \sum f$, x is the class midpoint, and f is the frequency of each class
+
Guidelines: Finding the Mean of a Frequency Distribution (Pg. 72)
1. Find the midpoint of each class.
2. Find the sum of the products of the midpoints and the frequencies: $\sum (x \cdot f)$
3. Find the sum of the frequencies: $n = \sum f$
4. Find the mean of the frequency distribution: $\bar{x} = \frac{\sum (x \cdot f)}{n}$
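The four guideline steps applied to the #27 Newspaper Reading Times table from Section 2.1, as a short Python sketch; variable names are illustrative only.

```python
# #27 Newspaper Reading Times table (class limits and frequencies)
classes = [(0, 7), (8, 15), (16, 23), (24, 31), (32, 39)]
freqs = [8, 8, 3, 3, 3]

midpoints = [(lo + hi) / 2 for lo, hi in classes]        # step 1: midpoints x
sum_xf = sum(x * f for x, f in zip(midpoints, freqs))    # step 2: sum of x * f
n = sum(freqs)                                           # step 3: n = sum of f
x_bar = sum_xf / n                                       # step 4: mean

print(sum_xf, n, x_bar)  # 367.5 25 14.7
```

Because every entry in a class is treated as if it sat at the class midpoint, this is an approximation of the mean of the raw data.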
+
The Shape of Distributions (Pg. 73)
Symmetric: can be folded in the middle so that the two halves are approximately mirror images
Uniform: rectangular; all classes have equal (or approximately equal) frequencies
Multimodal: more than one peak
Skewed: a “long tail” on one side
The direction of the skew is the side the tail is on.
Left skewed means the tail is on the left side.
Right skewed means the tail is on the right side.
+
EXAMPLES: Page 73
The mean describes data best when the data is symmetric.
The median describes data best when the data is skewed or contains outliers.
The mode describes data best when the data is at the nominal level of measurement.
+
Assignment: Part 2 Section 2.3
Pg. 77–78: #41–#44, #46–#48, #52–#54
THIS IS A LENGTHY ASSIGNMENT; GET STARTED ON IT!!!
+
Section 2.4: Measures of Variation
+
Find the mean, median, and mode.
SET A: 37, 38, 39, 41, 41, 41, 42, 44, 45, 47
SET B: 23, 29, 32, 40, 41, 41, 48, 50, 52, 59
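A quick check with Python's standard `statistics` module: both sets share the same mean, median, and mode, yet their ranges differ sharply. That is exactly why measures of variation are needed.

```python
from statistics import mean, median, mode

set_a = [37, 38, 39, 41, 41, 41, 42, 44, 45, 47]
set_b = [23, 29, 32, 40, 41, 41, 48, 50, 52, 59]

for name, data in (("A", set_a), ("B", set_b)):
    # center is identical for both sets; only the spread (range) differs
    print(name, mean(data), median(data), mode(data), max(data) - min(data))
```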
+
+Measures of Variation: Range, Deviation, Variance, Standard Deviation
Range = (Maximum Data Entry) − (Minimum Data Entry)
The range only uses two pieces of data.
Variance and standard deviation use ALL entries of a data set.
+
Deviation of an entry x in a POPULATION data set is the difference between the entry and the mean μ of the data set.
Deviation of x = $x - \mu$ (POPULATION)
Deviation of x = $x - \bar{x}$ (SAMPLE)
DISTANCE FROM MEAN!
+
Calculate Deviations of Company A
37, 38, 39, 41, 41, 41, 42, 44, 45, 47
Find the sum of the deviations.
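The deviations for Company A, treating the data as a population (for a sample, use x̄ in place of μ; the arithmetic is identical). The positive and negative deviations always cancel, so their sum is 0. This is why the variance squares them.

```python
company_a = [37, 38, 39, 41, 41, 41, 42, 44, 45, 47]
mu = sum(company_a) / len(company_a)       # mean = 41.5
deviations = [x - mu for x in company_a]   # deviation of x = x - mu
print(deviations)
print(sum(deviations))                     # positive and negative deviations cancel
```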
+
POPULATION VARIANCE
For POPULATION DATA:
$\sigma^2 = \frac{\sum (x - \mu)^2}{N}$
σ is the lowercase Greek letter sigma.
+
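Population Standard Deviation
The square root of the population variance: $\sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum (x - \mu)^2}{N}}$
Roughly, the typical distance of a data entry from the mean
A larger standard deviation means more spread-out data.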
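The population formulas applied to Company A by hand, then cross-checked against `statistics.pvariance` and `statistics.pstdev` from Python's standard library (the population versions, which divide by N).

```python
import math
from statistics import pstdev, pvariance

company_a = [37, 38, 39, 41, 41, 41, 42, 44, 45, 47]
N = len(company_a)
mu = sum(company_a) / N                        # population mean
ss = sum((x - mu) ** 2 for x in company_a)     # sum of squared deviations
pop_variance = ss / N                          # sigma^2 = sum((x - mu)^2) / N
sigma = math.sqrt(pop_variance)                # sigma = square root of sigma^2

print(ss, pop_variance, round(sigma, 2))
# Cross-check against the standard library's population versions
print(math.isclose(pop_variance, pvariance(company_a)), math.isclose(sigma, pstdev(company_a)))
```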
+
Sample Variance and Sample Standard Deviation
When using sample data, use $\bar{x}$, not μ.
Divide by n − 1 instead of N.
$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$ and $s = \sqrt{s^2}$
+
Calculate the sample variance and standard deviation for Company B.
SET A: 37, 38, 39, 41, 41, 41, 42, 44, 45, 47
SET B: 23, 29, 32, 40, 41, 41, 48, 50, 52, 59
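The sample formulas for Company B, dividing by n − 1, then cross-checked against `statistics.variance` and `statistics.stdev`, which are the standard library's sample (n − 1) versions.

```python
import math
from statistics import stdev, variance

company_b = [23, 29, 32, 40, 41, 41, 48, 50, 52, 59]
n = len(company_b)
x_bar = sum(company_b) / n                     # sample mean
ss = sum((x - x_bar) ** 2 for x in company_b)  # sum of squared deviations
s2 = ss / (n - 1)                              # sample variance: divide by n - 1
s = math.sqrt(s2)                              # sample standard deviation

print(round(s2, 1), round(s, 1))
# Cross-check against the standard library's sample versions
print(math.isclose(s2, variance(company_b)), math.isclose(s, stdev(company_b)))
```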
+
Assignment: Part 1 Section 2.4
Pg. 92–94: #1, 3, 13, 14, 19, 20
+
How can we use standard deviation to make decisions about data?
Standard deviation and variance tell us how spread out the data is.
+
Empirical Rule (68-95-99.7 Rule)
In a BELL-SHAPED distribution:
1. About 68% of the data lies within 1 standard deviation of the mean.
2. About 95% of the data lies within 2 standard deviations of the mean.
3. About 99.7% of the data lies within 3 standard deviations of the mean.
+
Example:
If 65 men's heights have a bell-shaped distribution with a mean of 68 in and a standard deviation of 2.5 in, what percent of the men are between 68 and 73 inches tall?
How many men is that?
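The heights example worked in Python. 73 in is exactly 2 standard deviations above the mean, and because a bell-shaped distribution is symmetric, half of the Empirical Rule's 95% lies between the mean and +2 standard deviations. The lookup table `EMPIRICAL` is an illustrative name and only covers whole-number k from 1 to 3.

```python
EMPIRICAL = {1: 68, 2: 95, 3: 99.7}  # % within k SD of the mean (bell-shaped data)

mu, sigma, n = 68, 2.5, 65           # mean (in), standard deviation (in), number of men

k = (73 - mu) / sigma                 # 73 in is k standard deviations above the mean
# Symmetric distribution: half of the "within k SD" percent lies between mu and mu + k*SD
percent = EMPIRICAL[round(k)] / 2
men = n * percent / 100

print(k, percent, round(men))         # 2.0 47.5 31
```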
+
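Chebychev’s Theorem
In ANY distribution, the portion of the data lying within k standard deviations of the mean (k > 1) is AT LEAST $1 - \frac{1}{k^2}$.
For k = 2: at least $1 - \frac{1}{4} = 75\%$
For k = 3: at least $1 - \frac{1}{9} \approx 88.9\%$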
+
Example:
A sample of 40 runners in a 1-mile race gave a mean of 7 minutes with a standard deviation of 1.25 minutes. What can we say about how many people ran the mile in between 4.5 and 9.5 minutes?
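The runners example as a Python sketch. Both 4.5 and 9.5 minutes are 2 standard deviations from the mean, so Chebychev's Theorem guarantees at least 75% of the 40 runners. The count is rounded up because the bound is an "at least". The helper name `chebychev_bound` is illustrative.

```python
import math

def chebychev_bound(k):
    """Minimum fraction of ANY data set within k standard deviations (k > 1)."""
    if k <= 1:
        raise ValueError("Chebychev's Theorem needs k > 1")
    return 1 - 1 / k ** 2

n, mean, sd = 40, 7, 1.25
k_low = (mean - 4.5) / sd            # 4.5 min is 2 SD below the mean
k_high = (9.5 - mean) / sd           # 9.5 min is 2 SD above the mean
k = k_high                           # the interval is symmetric, so one k covers it
fraction = chebychev_bound(k)
runners = math.ceil(n * fraction)    # "at least", so round UP to whole runners

print(k, fraction, runners)          # 2.0 0.75 30
```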
+
Assignment: Part 2 Section 2.4
Pg. 95–97: #29–#36, ONLY PART A
Pg. 88 has a nice picture of the Empirical Rule and bell-shaped distributions.
+
Section 2.5: Measures of Position
+
Fractiles
Numbers that partition, or divide, an ORDERED data set into equal parts
Example: the median is a fractile that divides a data set into two equal parts.
+
Quartiles
Three quartiles, Q1, Q2, and Q3, divide an ordered data set into four equal parts.
Q1: First Quartile; about one quarter of the data falls on or below Q1
Q2: Second Quartile; about half of the data falls on or below Q2 (Q2 is the MEDIAN of the data set)
Q3: Third Quartile; about ¾ of the data falls on or below Q3
+
Interquartile Range
The difference between the third and first quartiles
IQR = Q3 − Q1
+
Box-and-Whisker Plot
Five-Number Summary:
Minimum
Q1
Median
Q3
Maximum
5, 7, 9, 10, 11, 13, 14, 15, 16, 17, 18, 18, 20, 21, 37
What conclusion can we draw from the graph?
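A sketch of the five-number summary and IQR for the data above. It assumes one common textbook convention: Q1 and Q3 are the medians of the lower and upper halves, with the overall median left out of both halves when n is odd. Calculators and software may use other quartile methods and give slightly different Q1 and Q3. The function names are illustrative.

```python
def median(values):
    """Middle of the ordered values (mean of the two middle ones when n is even)."""
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def five_number_summary(data):
    """(min, Q1, Q2, Q3, max), taking Q1 and Q3 as the medians of the lower
    and upper halves; the overall median is left out of both halves when n
    is odd. Needs at least 2 entries."""
    ordered = sorted(data)
    n = len(ordered)
    lower = ordered[: n // 2]
    upper = ordered[(n + 1) // 2 :]
    return ordered[0], median(lower), median(ordered), median(upper), ordered[-1]

data = [5, 7, 9, 10, 11, 13, 14, 15, 16, 17, 18, 18, 20, 21, 37]
minimum, q1, q2, q3, maximum = five_number_summary(data)
print(minimum, q1, q2, q3, maximum)
print("IQR =", q3 - q1)
```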
+
Assignment: Part 1 Section 2.5
Pg. 110–111: #17–#20, #23, #26, #27, #28
+
The Standard Score or Z-Score
Measures a data value's position in the data set
The STANDARD SCORE, or Z-SCORE, represents the number of standard deviations a given value x falls from the mean μ. To find the z-score for a given value, use the following formula:
$z = \frac{\text{Value} - \text{Mean}}{\text{Standard Deviation}} = \frac{x - \mu}{\sigma}$
+
Z-Score
Can be POSITIVE, NEGATIVE, or ZERO
If z is NEGATIVE, then the corresponding x value is BELOW the mean.
If z is POSITIVE, then the corresponding x value is ABOVE the mean.
If z is ZERO, then the corresponding x value EQUALS the mean.
+
Z-Score Example
The mean speed of vehicles is 56 MPH, with a standard deviation of 4 MPH.
Car 1: 62 MPH
Car 2: 47 MPH
Car 3: 56 MPH
Calculate the z-score for Cars 1, 2, and 3.
Interpret this information.
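The z-score formula applied to the three cars; the helper name `z_score` is illustrative.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations x lies from the mean: (x - mu) / sigma."""
    return (x - mu) / sigma

mu, sigma = 56, 4
for car, speed in (("Car 1", 62), ("Car 2", 47), ("Car 3", 56)):
    print(car, z_score(speed, mu, sigma))
```

Car 1 is 1.5 standard deviations above the mean, Car 2 is 2.25 below it, and Car 3 is exactly at the mean.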
+
Z-Scores PLUS the Empirical Rule
Empirical Rule: about 95% of the data lies within 2 standard deviations of the mean.
Z-Scores: about 95% of the data has z-scores between −2 and 2 (usual scores).
A z-score less than −2 or greater than 2 we would consider unusual.
A z-score less than −3 or greater than 3 we would consider VERY unusual.
REMEMBER: the distribution must be BELL-SHAPED to use the Empirical Rule.
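The usual/unusual cutoffs as a small Python function, applied to the car speeds from the z-score example. The function name `classify` is illustrative, and the labels only make sense for a bell-shaped distribution.

```python
def classify(z):
    """Usual / unusual / very unusual, using the cutoffs above.
    Only meaningful for a bell-shaped distribution."""
    if abs(z) > 3:
        return "very unusual"
    if abs(z) > 2:
        return "unusual"
    return "usual"

# z-scores from the Car 1, 2, 3 speed example (mean 56 MPH, SD 4 MPH)
for speed in (62, 47, 56):
    z = (speed - 56) / 4
    print(speed, z, classify(z))
```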
+
Assignment: Part 2 Section 2.5
Pg. 111–112: #29–#34
+
Section 2.3 Part 1 (Mean, Median, Mode)
18. 6.2, 6, 5
20. 200.4, 186, none
22. 61.2, 55, 80 and 125
24. NP, NP, worse
26. NP, NP, domestic
28. 16.6, 15, none
30. 314.1, 374, none
32. 2.49, 2.35, 4.0
34. 213.4, 214, 217
Section 2.3 Part 2
41. 89
42. 36320
43. 612.73
44. 982.19
46. 84
47. 65
48. 69.7
52. Skewed Right
53. Symmetric
54. Uniform
Section 2.4 Part 1
1. R = 8, M = 7.9, V = 6.1, SD = 2.5
3. R = 12, M = 11.9, V = 17.1, SD = 4.1
19. LA: R = 17.6, V = 37.5, SD = 6.11
LB: R = 8.7, V = 8.71, SD = 2.95
20. Dallas: R = 18.1, V = 37.33, SD = 6.11
Houston: R = 13, V = 12.26, SD = 3.5
Section 2.4 Part 2
29. 68%
30. Between 1500 and 3300
31. a. 51, b. 17
32. a. 38, b. 19
33. 1000, 2000
34. 3325, 1490
35. 24
36. Sentences involving 54.97 and 59.17
Section 2.5 Part 1
17. None
18. SR
19. SL
20. S
23. Q1 = 2, Q2 = 4, Q3 = 5
26. Q1 = 15.125, Q2 = 15.8, Q3 = 17.65
27. a. 5, b. 50%, c. 25%
28. a. 17.65, b. 50%, c. 50%
Section 2.5 Part 2
31. Stats: 1.43, Bio: 0.77. Did better on Stats
32. Stats: -0.43, Bio: -0.77. Did better on Stats
33. Stats: 2.14, Bio: 1.54. Did better on Stats
34. Both 0. Both performed equally.