CHAPTER 2 Descriptive Statistics

+
SECTION 2.1 FREQUENCY DISTRIBUTIONS
+
Section 2.1: Frequency Distributions and Their Graphs
GOAL: Explore many ways to organize and describe a data set:
• Center, variability (or spread), and shape
+
FREQUENCY DISTRIBUTION
A table that shows classes, or intervals, of data entries with a count of the number of entries in each class. The frequency f of a class is the number of data entries in the class.
• Frequency – how often
• Distribution – how spread out/concentrated
• Example: Pg. 40
+ Example of a Frequency Distribution
Class     Frequency, f
1–5       5
6–10      8
11–15     6
16–20     8
21–25     5
26–30     4

Lower Class Limit – least number that can belong to a class
Upper Class Limit – greatest number that can belong to a class
Class Width – the distance between lower (or upper) limits of consecutive classes
Range – difference between the maximum and minimum data entries
+
Guidelines for Creating a Frequency Distribution
1. Determine the range of the data.
2. Determine the number of classes to use.
3. Determine the class width.
4. Find the class limits.
5. Find the class midpoints.
6. Find the class boundaries.
7. Tally the data in each class.
8. Get the FREQUENCY for each class.
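The guidelines above can be sketched in Python for whole-number data. This is a minimal sketch, not the textbook's exact procedure: the class-width rule (divide the range by the number of classes, round up to the next whole number, and add 1 when the division is exact) and the sample data are assumptions made for illustration. Step 2, the number of classes, is simply chosen and passed in.

```python
def frequency_distribution(data, num_classes):
    """Return the class width and (lower limit, upper limit, frequency) rows.

    Assumes whole-number data. The class-width rule (range // classes + 1)
    is an assumed rounding convention: it rounds up, and adds 1 when the
    range divides evenly so the maximum still lands in the last class.
    """
    low, high = min(data), max(data)
    data_range = high - low                     # Step 1: range
    width = data_range // num_classes + 1       # Step 3: class width
    rows = []
    for i in range(num_classes):                # Step 4: class limits
        lower = low + i * width
        upper = lower + width - 1
        # Steps 7-8: tally the entries in this class to get its frequency
        freq = sum(1 for x in data if lower <= x <= upper)
        rows.append((lower, upper, freq))
    return width, rows


sample = [2, 5, 7, 8, 12, 13, 15, 18, 21, 22, 24, 29]  # hypothetical data
width, rows = frequency_distribution(sample, num_classes=5)  # Step 2: choose 5 classes
print("class width:", width)
for lower, upper, f in rows:
    print(f"{lower}-{upper}: {f}")
```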
+
Definitions – Additional Features of Frequency Distributions

Class Midpoint – sum of the lower and upper limits of a class divided by two (also known as the class mark)

Relative Frequency – portion or percentage of the data that falls in that class: the frequency (f) divided by the sample size (n)

Cumulative Frequency – sum of the frequency for that class and all previous classes. The cumulative frequency of the last class is equal to the sample size n.
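These additional features can be computed from the example frequency distribution earlier in this section. A minimal Python sketch; class boundaries are not defined on these slides, so the code assumes the usual convention for whole-number data of moving each class limit out by 0.5.

```python
from itertools import accumulate

# (lower limit, upper limit, frequency) from the example frequency distribution
table = [(1, 5, 5), (6, 10, 8), (11, 15, 6), (16, 20, 8), (21, 25, 5), (26, 30, 4)]

freqs = [f for _, _, f in table]
n = sum(freqs)                        # sample size
cumulative = list(accumulate(freqs))  # running total of the frequencies

rows = []
for (lower, upper, f), cum_f in zip(table, cumulative):
    midpoint = (lower + upper) / 2            # class midpoint (class mark)
    boundaries = (lower - 0.5, upper + 0.5)   # assumed +/- 0.5 convention
    rel_f = f / n                             # relative frequency f / n
    rows.append((midpoint, boundaries, round(rel_f, 4), cum_f))
    print(f"{lower}-{upper}: midpoint {midpoint}, boundaries {boundaries}, "
          f"relative f {rel_f:.4f}, cumulative f {cum_f}")
```

The last cumulative frequency equals n, which is a quick check that no entries were missed.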
+
Class Example 1 Page 41
+
Class Activity/HW
• Pg. 51 #27, #28
• We’ll be using these frequency distributions again, so make sure to hold onto them.
• HAVE THEM DONE FOR TOMORROW, WE NEED THEM!
• DO ON SEPARATE PAPER
+
Graphs of Frequency Distributions
• Frequency Histogram – a bar graph that represents the frequency distribution of a data set
• Properties of a Frequency Histogram:
1. The horizontal scale is quantitative and measures the data values.
2. The vertical scale measures the frequencies of the classes.
3. Consecutive bars MUST touch.
+
Other Types of Graphs
• FREQUENCY POLYGON: a line graph that emphasizes the continuous change in frequencies
• RELATIVE FREQUENCY HISTOGRAM
  – Has the same shape and horizontal scale as the frequency histogram
  – Vertical scale measures RELATIVE frequencies
• CUMULATIVE FREQUENCY GRAPH (OGIVE): a line graph that displays the cumulative frequency of each class at its upper class boundary
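The points that a frequency polygon and an ogive connect can be listed from the example frequency distribution above. This Python sketch assumes the common textbook conventions of anchoring the polygon at zero one class width beyond each end, and starting the ogive at zero on the first class's lower boundary (limits moved out by 0.5 for whole-number data).

```python
from itertools import accumulate

# (lower limit, upper limit, frequency) from the example frequency distribution
table = [(1, 5, 5), (6, 10, 8), (11, 15, 6), (16, 20, 8), (21, 25, 5), (26, 30, 4)]
width = table[1][0] - table[0][0]  # distance between consecutive lower limits

midpoints = [(lo + hi) / 2 for lo, hi, _ in table]
freqs = [f for _, _, f in table]

# Frequency polygon: (midpoint, frequency), anchored at zero one class width
# before the first midpoint and one class width after the last midpoint.
polygon = ([(midpoints[0] - width, 0)]
           + list(zip(midpoints, freqs))
           + [(midpoints[-1] + width, 0)])

# Ogive: cumulative frequency plotted at each upper class boundary,
# starting from 0 at the lower boundary of the first class.
upper_boundaries = [hi + 0.5 for _, hi, _ in table]
ogive = [(table[0][0] - 0.5, 0)] + list(zip(upper_boundaries, accumulate(freqs)))

print("polygon points:", polygon)
print("ogive points:", ogive)
```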
+
#27: Newspaper Reading Times (min)
Class    Frequency   Midpoint   Relative f   Cumulative f
0–7      8           3.5        0.32         8
8–15     8           11.5       0.32         16
16–23    3           19.5       0.12         19
24–31    3           27.5       0.12         22
32–39    3           35.5       0.12         25
n = 25
+
Class Activity/HW
Using the frequency distribution you created for #28 from page 51, complete the following ON GRAPH PAPER:
1. Frequency Histogram
2. Frequency Polygon
3. Relative Frequency Histogram
4. Ogive
**MAKE SURE TO LABEL GRAPHS AND WRITE NEATLY!
(TURN IN WITH FREQUENCY DISTRIBUTION FOR WRITTEN FEEDBACK)
DUE TOMORROW!!!!
+
#28 Book Spending Per Semester ($)
Class      Frequency   Midpoint   Relative f   Cumulative f
30–113     5           71.5       0.1724       5
114–197    7           155.5      0.2414       12
198–281    8           239.5      0.2759       20
282–365    2           323.5      0.0690       22
366–449    3           407.5      0.1034       25
450–533    4           491.5      0.1379       29
n = 29
+Pirate Baseball Activity
Due:
Given: Pittsburgh Pirates Home Run Data 1961–2009
Using this data, create the following, USING EIGHT CLASSES:
1. Frequency Distribution (including ALL parts and rel./cum. freq)
2. Frequency Histogram
3. Frequency Polygon
4. Relative Frequency Histogram
5. Ogive
Must include:
• Title, axis labels, equal class widths
• Evidence of ALL calculations (class widths, boundaries, midpoints)
• Straight lines
• Neatness
• Straight edge
• Graph paper
Only given TODAY and TOMORROW to work in class. THIS WILL BE GRADED.
Then, using your phone or an iPad, look up home run data for 2010, 2011, 2012, 2013, 2014, and 2015, and:
• Create a NEW frequency distribution
• Create two new charts
• Explain how this new data has changed the distribution (one paragraph)
+
Section 2.2: More Graphs and Displays
+ Stem-and-Leaf Plot
• Display for quantitative data
• Gives the feel of a histogram while retaining the original data values
• Easy way to sort data
• Stem – the entry’s leftmost digits
• Leaf – the entry’s rightmost digit
• Examples 1 and 2 on pages 55–56
• Ordered/Unordered
• MUST ALWAYS INCLUDE A KEY!
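A minimal Python sketch of an ordered stem-and-leaf plot, using Set B from Section 2.4 as the data. It assumes two-digit whole numbers, so the stem is the tens digit and the leaf is the ones digit; empty stems are kept so that gaps in the data stay visible.

```python
from collections import defaultdict


def stem_and_leaf(data):
    """Ordered stem-and-leaf lines for non-negative whole numbers.

    Stem = every digit except the last; leaf = the last digit.
    """
    groups = defaultdict(list)
    for value in sorted(data):                 # sorting gives an ORDERED plot
        stem, leaf = divmod(value, 10)
        groups[stem].append(leaf)
    lines = []
    for stem in range(min(groups), max(groups) + 1):  # keep empty stems
        leaves = " ".join(str(leaf) for leaf in groups.get(stem, []))
        lines.append(f"{stem} | {leaves}".rstrip())
    return lines


set_b = [23, 29, 32, 40, 41, 41, 48, 50, 52, 59]
for line in stem_and_leaf(set_b):
    print(line)
print("Key: 4 | 8 = 48")  # the key the slide requires
```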
+ Dot Plot
• Each data entry is plotted, using a point, above a horizontal axis
• Can see how data is distributed, see specific data entries, and identify unusual data values
• Example 3, Pg. 57
+ Graphing Qualitative Data Sets: Pie Charts
• A circle that is divided into sectors that represent categories
• Area of each sector is proportional to the category’s frequency
• KEY: To find the central angle, MULTIPLY the RELATIVE FREQUENCY BY 360°
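The central-angle rule can be checked in a few lines of Python. The category names and counts below are hypothetical, chosen only to show that multiplying each relative frequency by 360° gives angles that add up to a full circle.

```python
# Hypothetical category counts, for illustration only (not from the text)
counts = {"Category A": 18, "Category B": 12, "Category C": 6}

n = sum(counts.values())
angles = {}
for name, f in counts.items():
    rel_f = f / n                  # relative frequency
    angles[name] = rel_f * 360     # central angle in degrees
    print(f"{name}: relative frequency {rel_f:.3f}, central angle {angles[name]:.1f} degrees")
```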
+ Pareto Chart
• A vertical bar graph where the height represents frequency or relative frequency
• BARS ARE POSITIONED IN ORDER OF HIGHEST TO LOWEST
• REMEMBER: Qualitative Data
• Example 5, Page 59
+ Graphing Paired Data Sets: Scatter Plot
• Paired data sets: each entry in one data set corresponds to one entry in a second data set
• Scatter plot: ordered pairs are graphed as points in a coordinate plane
• Use to SHOW THE RELATIONSHIP BETWEEN TWO QUANTITATIVE VARIABLES
• Example 6, Page 60
+ Time Series Chart
• Used to graph a time series
• Time series – data set composed of quantitative entries taken at regular intervals over a period of time
• Example 7, Page 61
• Scatter Plot: no line
• Time Series Chart: connected data points
+
GRADED ASSIGNMENT:
Individually, complete the following graphs from pages 64–65:
• #18, #20, #22, #24, #25, #29, #30
• Must be handed in by the beginning of class on ________ (only ______ to work in class)
• Will be graded for correctness and neatness
• Use graph paper, ruler, protractor, and compass!
+
Section 2.3: Measures of Central Tendency
+
Measures of Central Tendency
• MEAN, MEDIAN, MODE
• Value that represents a TYPICAL, or CENTRAL, entry of the data
+
Mean
Population Mean: $\mu = \frac{\sum x}{N}$
Sample Mean: $\bar{x} = \frac{\sum x}{n}$
N = number of entries in a population
n = number of entries in a sample
+
Example 1, Pg. 67
• The prices (in dollars) for a sample of roundtrip flights from Chicago, Illinois, to Cancun, Mexico, are listed. What is the mean price of the flights?
872 432 397 427 388 782 397
WHEN CALCULATING, GO ONE DECIMAL PLACE FURTHER THAN THE ORIGINAL DATA.
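The mean for Example 1 works out as follows in Python. The same arithmetic gives μ for a population; only the notation (N instead of n) changes. The one-decimal rounding follows the rule on the slide.

```python
prices = [872, 432, 397, 427, 388, 782, 397]  # sample of flight prices, dollars

n = len(prices)              # n = number of entries in the sample
total = sum(prices)          # sum of x
x_bar = total / n            # sample mean
print(f"sum = {total}, n = {n}, mean = ${x_bar:.1f}")
```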
+
Median
• Value that lies in the middle of the data when the data is ORDERED
• If the data set has an even number of entries, the median is the mean of the two middle data entries
• Median divides a data set into TWO equal parts
• EX: 4, 5, 6, 8, 10, 14
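A minimal Python sketch of the median rule, applied to the EX list above (even number of entries) and to the Example 1 flight prices (odd number of entries). Python's `statistics.median` follows the same rule.

```python
def median(data):
    ordered = sorted(data)            # the median requires ORDERED data
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                    # odd count: the single middle entry
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2  # even count: mean of the two middle entries


print(median([4, 5, 6, 8, 10, 14]))                  # even number of entries
print(median([872, 432, 397, 427, 388, 782, 397]))   # flight prices, odd number of entries
```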
+ Mode
• Most frequently occurring data point
• If ALL entries occur only ONCE, then there is NO MODE
• If two data entries occur with the same greatest frequency, then BOTH are modes and we have a BIMODAL DISTRIBUTION
• If there are more than two modes, we have a MULTIMODAL DISTRIBUTION
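Python's `statistics.multimode` returns every value when all entries occur once, so this sketch builds on `collections.Counter` to follow the slide's "no mode" rule instead. The bimodal list is hypothetical data.

```python
from collections import Counter


def modes(data):
    """Return all modes in sorted order, or [] when every entry occurs once."""
    counts = Counter(data)
    top = max(counts.values())
    if top == 1:
        return []  # every entry occurs only once: NO MODE
    return sorted(value for value, c in counts.items() if c == top)


print(modes([872, 432, 397, 427, 388, 782, 397]))  # one mode
print(modes([4, 5, 6, 8, 10, 14]))                  # no mode
print(modes([1, 1, 2, 3, 3]))                       # bimodal (hypothetical data)
```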
+
Note on Mode
The mode is the only measure of central tendency that MUST be an actual data point.
+
Outlier
Data point that is far away from all of the other data points
+
Assignment: Part 1 Section 2.3
Pg. 75–78 #18–#34 even
• Find the mean, median, and mode.
• Label any outliers.
• Use correct notation for the mean (population mean vs. sample mean).
+
Today’s Question: How can we describe the “middle” of unequal data?
You have $200 for 17 days, $300 for 5 days, and $150 for 9 days out of a month. What was your average amount of money for the month?
+
Weighted Mean
• A mean where each data point is not “worth” the same amount.
• Entries have varying “weights”.
$\bar{x} = \frac{\sum (x \cdot w)}{\sum w}$
**Where w is the weight of each entry
+
Example: Weighted Mean vs. Regular Mean
• Tests are worth 50% of the overall grade, quizzes 30%, and homework 20%.
• You get 100 on HW, 90 on a quiz, and 80 on a test.
• Calculate the regular and weighted mean.
• Why is one lower than the other?
+
Example: Weighted Mean vs. Regular Mean
• You have $200 for 17 days, $300 for 5 days, and $150 for 9 days out of a month.
• Calculate the regular and weighted mean.
• Why is one lower than the other?
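Both weighted-mean examples can be checked with one small Python helper that implements the weighted-mean formula (sum of x times w, divided by the sum of the weights); the regular mean is printed alongside for comparison. The weights come straight from the examples: percentages for the grades, numbers of days for the money.

```python
def weighted_mean(values, weights):
    """Weighted mean: sum(x * w) / sum(w)."""
    return sum(x * w for x, w in zip(values, weights)) / sum(weights)


# Grade example: HW 100 (20%), quiz 90 (30%), test 80 (50%)
scores, grade_weights = [100, 90, 80], [0.20, 0.30, 0.50]
# Money example: $200 for 17 days, $300 for 5 days, $150 for 9 days
amounts, days = [200, 300, 150], [17, 5, 9]

for label, xs, ws in [("grades", scores, grade_weights), ("money", amounts, days)]:
    regular = sum(xs) / len(xs)
    weighted = weighted_mean(xs, ws)
    print(f"{label}: regular mean = {regular:.2f}, weighted mean = {weighted:.2f}")
```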
+
Mean of a Frequency Distribution
$\bar{x} = \frac{\sum (x \cdot f)}{n}$
where $n = \sum f$, x is the class midpoint, and f is the frequency of each class.
+
Guidelines: Finding the Mean of a Frequency Distribution (Pg. 72)
1. Find the midpoint of each class.
2. Find the sum of the products of the midpoints and the frequencies: $\sum (x \cdot f)$
3. Find the sum of the frequencies: $n = \sum f$
4. Find the mean of the frequency distribution: $\bar{x} = \frac{\sum (x \cdot f)}{n}$
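The four guidelines applied to the #27 Newspaper Reading Times table from Section 2.1, as a Python sketch. Because each class midpoint stands in for every entry in its class, the result approximates the mean of the raw data rather than reproducing it exactly.

```python
# #27 Newspaper Reading Times: (lower limit, upper limit, frequency)
table = [(0, 7, 8), (8, 15, 8), (16, 23, 3), (24, 31, 3), (32, 39, 3)]

midpoints = [(lo + hi) / 2 for lo, hi, _ in table]              # Step 1
sum_xf = sum(x * f for x, (_, _, f) in zip(midpoints, table))   # Step 2: sum of x * f
n = sum(f for _, _, f in table)                                 # Step 3: n = sum of f
mean = sum_xf / n                                               # Step 4
print(f"sum(x*f) = {sum_xf}, n = {n}, mean = {mean:.1f} min")
```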
+
The Shape of Distributions (Pg. 73)
• Symmetric – can be folded in the middle so the two halves match
• Uniform – rectangular, equal frequencies
• Multimodal – more than one peak
• Skewed – a “long tail” on one side
  – The direction of the skew is the side the tail is on.
  – Left skewed means the tail is on the left side.
  – Right skewed means the tail is on the right side.
+
EXAMPLES: Page 73
• Mean describes data best when the data is symmetric.
• Median describes data best when the data is skewed or contains outliers.
• Mode describes data best when the data is at the nominal level of measurement.
+
Assignment: Part 2 Section 2.3
Pg. 77–78 #41–#44, #46–#48, #52–#54
THIS IS A LENGTHY ASSIGNMENT, GET STARTED ON IT!!!
+
Section 2.4: Measures of Variation
+
Find the mean, median, and mode.
SET A: 37, 38, 39, 41, 41, 41, 42, 44, 45, 47
SET B: 23, 29, 32, 40, 41, 41, 48, 50, 52, 59
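A quick Python check with the standard `statistics` module shows that the two sets share the same center. Their ranges (defined on the next slide) are printed as well, to show that the spread is very different.

```python
from statistics import mean, median, mode

set_a = [37, 38, 39, 41, 41, 41, 42, 44, 45, 47]
set_b = [23, 29, 32, 40, 41, 41, 48, 50, 52, 59]

for name, data in [("A", set_a), ("B", set_b)]:
    print(f"Set {name}: mean = {mean(data)}, median = {median(data)}, "
          f"mode = {mode(data)}, range = {max(data) - min(data)}")
```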
+
+Measures of Variation: Range, Deviation, Variance, Standard Deviation
• Range = (Maximum Data Entry) – (Minimum Data Entry)
• Range only uses two pieces of data
• Variance and standard deviation use ALL entries of a data set
+
Deviation
• The deviation of an entry x in a POPULATION data set is the difference between the entry and the mean μ of the data set.
• Deviation of x = $x - \mu$ (POPULATION)
• Deviation of x = $x - \bar{x}$ (SAMPLE)
• (SIGNED) DISTANCE FROM THE MEAN!
+
Calculate Deviations of Company A
37, 38, 39, 41, 41, 41, 42, 44, 45, 47
Find the sum of the deviations.
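Computing the deviations for Company A (Set A) in Python, treating the ten entries as a population. The deviations always sum to zero, because the positive and negative deviations cancel; that is why the variance squares them before adding.

```python
set_a = [37, 38, 39, 41, 41, 41, 42, 44, 45, 47]

mu = sum(set_a) / len(set_a)          # population mean
deviations = [x - mu for x in set_a]  # x - mu for every entry
print("mean:", mu)
print("deviations:", deviations)
print("sum of deviations:", sum(deviations))
```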
+
POPULATION VARIANCE
For POPULATION DATA:
$\sigma^2 = \frac{\sum (x - \mu)^2}{N}$
σ is the lowercase Greek letter sigma.
+
Population Standard Deviation
• Square root of the variance: $\sigma = \sqrt{\sigma^2}$ (written σ, not σ²)
• Roughly the average distance of the data entries from the mean
• A larger standard deviation means more spread-out data.
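A minimal Python sketch of the population variance and standard deviation for Set A, following the formula step by step; the standard library's `statistics.pvariance` and `statistics.pstdev` compute the same values.

```python
import math

set_a = [37, 38, 39, 41, 41, 41, 42, 44, 45, 47]

N = len(set_a)
mu = sum(set_a) / N
ss = sum((x - mu) ** 2 for x in set_a)   # sum of squared deviations
variance = ss / N                        # sigma^2 = sum((x - mu)^2) / N
sigma = math.sqrt(variance)              # sigma = square root of sigma^2
print(f"sum of squares = {ss}, variance = {variance:.2f}, standard deviation = {sigma:.2f}")
```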
+
Sample Variance and Sample Standard Deviation
• When using sample data, use $\bar{x}$ instead of $\mu$.
• Divide by n – 1 instead of N: $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$ and $s = \sqrt{s^2}$
+
Calculate the sample variance and standard deviation for Company B.
SET A: 37, 38, 39, 41, 41, 41, 42, 44, 45, 47
SET B: 23, 29, 32, 40, 41, 41, 48, 50, 52, 59
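The sample versions for both sets, as a Python sketch. The helper divides by n - 1 and is cross-checked against `statistics.variance` and `statistics.stdev`, which use the same n - 1 denominator.

```python
import math
import statistics

set_a = [37, 38, 39, 41, 41, 41, 42, 44, 45, 47]
set_b = [23, 29, 32, 40, 41, 41, 48, 50, 52, 59]


def sample_variance(data):
    n = len(data)
    x_bar = sum(data) / n
    return sum((x - x_bar) ** 2 for x in data) / (n - 1)  # divide by n - 1


for name, data in [("A", set_a), ("B", set_b)]:
    s2 = sample_variance(data)
    s = math.sqrt(s2)
    # cross-check against the standard library's sample statistics
    assert math.isclose(s2, statistics.variance(data))
    assert math.isclose(s, statistics.stdev(data))
    print(f"Set {name}: s^2 = {s2:.2f}, s = {s:.2f}")
```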
+
+
Assignment: Part 1 Section 2.4
Pg. 92–94 #1, 3, 13, 14, 19, 20
+
How can we use standard deviation to make decisions about data?
• Standard deviation and variance tell us how spread out the data is.
+
Empirical Rule (68-95-99.7 Rule)
In a BELL-SHAPED distribution:
1. ~68% of the data is within 1 standard deviation of the mean
2. ~95% of the data is within 2 standard deviations of the mean
3. ~99.7% of the data is within 3 standard deviations of the mean
+
+
Example:
• If 65 men’s heights have a bell-shaped distribution with a mean of 68 in. and a standard deviation of 2.5 in., what percent of the men are between 68 and 73 inches tall?
• How many men is that?
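The heights example worked in Python. The sketch assumes that the interval runs from the mean to a whole number of standard deviations above it, and uses the symmetry of a bell-shaped distribution: half of the ~95% band lies above the mean.

```python
# Approximate percent of a bell-shaped distribution within k SDs of the mean
EMPIRICAL_RULE = {1: 68.0, 2: 95.0, 3: 99.7}

mu, sigma, n_men = 68, 2.5, 65        # inches; 65 men
low, high = 68, 73

k = (high - mu) / sigma               # how many SDs 73 in. lies above the mean
assert low == mu and k.is_integer()   # this sketch only handles "mean to k SDs"
# By symmetry, half of the within-k band lies above the mean
percent = EMPIRICAL_RULE[int(k)] / 2
count = n_men * percent / 100
print(f"k = {k:.0f}: about {percent}% of the men, or about {round(count)} men")
```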
+
Chebychev’s Theorem
• In ANY distribution, the portion of the data lying within k standard deviations of the mean (k > 1) is AT LEAST $1 - \frac{1}{k^2}$
• For k = 2:
• For k = 3:
+
Example:
A sample of 40 runners in a 1-mile race gave a mean of 7 minutes with a standard deviation of 1.25 minutes. What can we say about how many people ran the mile in between 4.5 and 9.5 minutes?
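Chebychev's bound for the runners example, as a Python sketch. Both 4.5 and 9.5 minutes are 2 standard deviations from the mean of 7 minutes, so k = 2; the bound for k = 2 and k = 3 is printed at the end.

```python
def chebychev_min_fraction(k):
    """Minimum fraction of ANY data set within k SDs of the mean (k > 1)."""
    if k <= 1:
        raise ValueError("Chebychev's Theorem requires k > 1")
    return 1 - 1 / k**2


n, mu, sd = 40, 7, 1.25        # runners, mean (minutes), SD (minutes)
low, high = 4.5, 9.5

k = (high - mu) / sd           # 9.5 is 2 SDs above 7 ...
assert (mu - low) / sd == k    # ... and 4.5 is 2 SDs below, so the interval is symmetric
fraction = chebychev_min_fraction(k)
print(f"k = {k:.0f}: at least {fraction:.0%} of the runners, "
      f"i.e. at least {fraction * n:.0f} of {n}")
for k_demo in (2, 3):
    print(f"k = {k_demo}: at least {chebychev_min_fraction(k_demo):.1%}")
```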
+
Assignment: Part 2 Section 2.4
Pg. 95–97 #29–#36 ONLY PART A
Pg. 88 has a nice picture of the Empirical Rule and bell-shaped distributions.
+
Section 2.5: Measures of Position
+
Fractiles
• Numbers that partition, or divide, an ORDERED data set into equal parts
• Example: Median – fractile that divides a data set into two equal parts
+
Quartiles
• Three quartiles: Q1, Q2, and Q3
• Divide an ordered data set into four equal parts
• Q1 – First Quartile – about one quarter of the data fall on or below Q1
• Q2 – Second Quartile – about half of the data fall on or below Q2
• Q2 is the MEDIAN of the data set
• Q3 – Third Quartile – about ¾ of the data fall on or below Q3
+
Interquartile Range
• Difference between the third and first quartiles
• IQR = Q3 – Q1
+
Box-and-Whisker Plot
Five-Number Summary:
• Minimum
• Q1
• Median
• Q3
• Maximum
• Data: 5, 7, 9, 10, 11, 13, 14, 15, 16, 17, 18, 18, 20, 21, 37
• What conclusion can we draw from the graph?
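A Python sketch of the five-number summary and IQR for the data above. It uses the "median of each half" method for Q1 and Q3 and leaves the middle entry out of both halves when the count is odd; other calculators and software may use a different quartile convention and report slightly different values.

```python
def median(values):
    """Median of an already-sorted list."""
    n = len(values)
    mid = n // 2
    return values[mid] if n % 2 else (values[mid - 1] + values[mid]) / 2


def five_number_summary(data):
    """(minimum, Q1, median, Q3, maximum) via the median-of-each-half method."""
    ordered = sorted(data)
    n = len(ordered)
    lower_half = ordered[: n // 2]          # entries below the median
    upper_half = ordered[(n + 1) // 2 :]    # entries above the median
    return (ordered[0], median(lower_half), median(ordered),
            median(upper_half), ordered[-1])


data = [5, 7, 9, 10, 11, 13, 14, 15, 16, 17, 18, 18, 20, 21, 37]
minimum, q1, q2, q3, maximum = five_number_summary(data)
print(f"min = {minimum}, Q1 = {q1}, median = {q2}, Q3 = {q3}, max = {maximum}, IQR = {q3 - q1}")
```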
+
+
Assignment: Part 1 Section 2.5
Pg. 110–111 #17–#20, #23, #26, #27, #28
+
The Standard Score or Z-Score
• Measures a data value’s position in the data set
• The STANDARD SCORE, or Z-SCORE, represents the number of standard deviations a given value x falls from the mean μ. To find the z-score for a given value, use the following formula:
$z = \frac{\text{Value} - \text{Mean}}{\text{Standard deviation}} = \frac{x - \mu}{\sigma}$
+
Z-Score
• Can be POSITIVE, NEGATIVE, or ZERO
• If z is NEGATIVE, then the corresponding x value is BELOW the mean.
• If z is POSITIVE, then the corresponding x value is ABOVE the mean.
• If z is ZERO, then the corresponding x value EQUALS the mean.
+
Z-Score Example
• Mean speed of vehicles is 56 MPH.
• Standard deviation of 4 MPH.
• Car 1: 62 MPH
• Car 2: 47 MPH
• Car 3: 56 MPH
• Calculate the z-score for Cars 1, 2, and 3.
• Interpret this information.
+
+
Z-Scores PLUS the Empirical Rule
• Empirical Rule: ~95% of the data lies within 2 standard deviations of the mean.
• Z-Score: ~95% of the data has a z-score between -2 and 2.
• Usual scores: a z-score between -2 and 2
  – A z-score less than -2 or greater than 2 we would consider unusual.
  – A z-score less than -3 or greater than 3 we would consider VERY unusual.
• REMEMBER – the distribution must be BELL-SHAPED for the Empirical Rule
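The car example and the usual/unusual cutoffs combined in a short Python sketch. The labels assume a bell-shaped distribution, as the Empirical Rule requires; a z-score of exactly ±2 counts as usual, matching "less than -2 or greater than 2" on the slide.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations x lies from the mean."""
    return (x - mu) / sigma


def describe(z):
    """Label a z-score with the Empirical Rule cutoffs (bell-shaped data assumed)."""
    if abs(z) > 3:
        return "very unusual"
    if abs(z) > 2:
        return "unusual"
    return "usual"


mu, sigma = 56, 4  # MPH
for car, speed in [(1, 62), (2, 47), (3, 56)]:
    z = z_score(speed, mu, sigma)
    side = "above" if z > 0 else "below" if z < 0 else "equal to"
    print(f"Car {car}: z = {z:+.2f} ({side} the mean, {describe(z)})")
```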
+
Assignment: Part 2 Section 2.5
Pg. 111–112 #29–#34
+
Section 2.3 Part 1
(Mean, Median, Mode)
18. 6.2, 6, 5
20. 200.4, 186, none
22. 61.2, 55, 80 and 125
24. NP, NP, worse
26. NP, NP, domestic
28. 16.6, 15, none
30. 314.1, 374, none
32. 2.49, 2.35, 4.0
34. 213.4, 214, 217
Section 2.3 Part 2
41. 89
42. 36320
43. 612.73
44. 982.19
46. 84
47. 65
48. 69.7
52. Skewed Right
53. Symmetric
54. Uniform
Section 2.4 Part 1
1. R = 8, M = 7.9, V = 6.1, SD = 2.5
3. R = 12, M = 11.9, V = 17.1, SD = 4.1
19. LA: R = 17.6, V = 37.5, SD = 6.11
LB: R = 8.7, V = 8.71, SD = 2.95
20. Dallas: R = 18.1, V = 37.33, SD = 6.11
Houston: R = 13, V = 12.26, SD = 3.5
Section 2.4 Part 2
29. 68%
30. Between 1500 and 3300
31. a. 51, b. 17
32. a. 38, b. 19
33. 1000, 2000
34. 3325, 1490
35. 24
36. Sentences involving 54.97 and 59.17
Section 2.5 Part 1
17. None
18. SR
19. SL
20. S
23. Q1 = 2, Q2 = 4, Q3 = 5
26. Q1 = 15.125, Q2 = 15.8, Q3 = 17.65
27. a. 5, b. 50%, c. 25%
28. a. 17.65, b. 50%, c. 50%
Section 2.5 Part 2
31. Stats: 1.43, Bio: 0.77. Did better on Stats
32. Stats: -0.43, Bio: -0.77. Did better on Stats
33. Stats: 2.14, Bio: 1.54. Did better on Stats
34. Both 0. Both performed equally.