Transcript Chapter 1

Chapter 1
Describing Data with Graphs
Variables and Data


A variable is a characteristic that changes over
time and/or for different individuals or objects
under consideration.
Examples:
Body temperature.
Hair color.
Time to failure of a computer component.
Experimental Unit and Measurement



An experimental unit is the individual or
object on which a variable is measured.
A single measurement results when a variable
is actually measured on an experimental unit.
A set of measurements is called data.
Example: Hair Color

Variable: Hair color
Experimental unit: Person
Typical measurements: Brown, black, blonde, etc.
Example

Variable: Time until a light bulb burns out
Experimental unit: Light bulb
Typical measurements: 1500 hours, 1535.5 hours, etc.
Population and Sample
A population is the set of all measurements
of interest to the investigator.
Examples:
Body temperatures of all healthy people
in the world.
Lifetimes of a batch of 1000 light bulbs.
It might be too expensive or even impossible
to enumerate the entire population.
Sample
A sample is a subset of measurements
selected from the population of interest.
[Diagram: sampling selects a sample from the population]
How many variables have you
measured?



Univariate data: One variable is measured on
a single experimental unit.
Bivariate data: Two variables are measured
on a single experimental unit.
Multivariate data: More than two variables
are measured on a single experimental unit.
Types of Variables
Qualitative
Quantitative: Discrete or Continuous
Qualitative Variables
Qualitative variables measure a quality or
characteristic on each experimental unit. (Data
collected is sometimes called Categorical Data)
Examples:
• Hair color (black, brown, blonde, …)
• Make of car (Dodge, Honda, Ford, …)
• Gender (male, female)
• State of birth (California, Arizona, …)
Quantitative Variables
Quantitative variables measure a numerical
quantity on each experimental unit.
A quantitative variable is discrete if it can
assume only a finite or countable number of values.
It is continuous if it can assume the infinitely
many values corresponding to the points on a
line interval.
Examples



• For each orange tree in a grove, the number
of oranges is measured. (Quantitative discrete)
• For a particular day, the number of cars
entering a college campus is measured. (Quantitative discrete)
• The time until a light bulb burns out is measured.
(Quantitative continuous)
Graphing Qualitative Variables


Use a data distribution to describe:
• What values (measurements) of the
variable have been measured
• How often each value (measurement) has
occurred
“How often” can be measured 3 ways:
• Frequency
• Relative frequency = Frequency/n, where n is the total number of measurements
• Percent = 100 × Relative frequency
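As a minimal sketch of these three measures, the Python snippet below tallies a list of category labels. The function name `frequency_table` is an illustrative choice, not something from the slides, and the sample list is built from the color counts in the M&M example that follows.

```python
from collections import Counter

def frequency_table(values):
    """Return {category: (frequency, relative frequency, percent)}."""
    n = len(values)                      # total number of measurements
    counts = Counter(values)             # frequency of each category
    return {cat: (f, f / n, 100 * f / n) for cat, f in counts.items()}

# Color counts taken from the M&M example (25 candies in total).
candies = (["Red"] * 3 + ["Blue"] * 6 + ["Green"] * 4 +
           ["Orange"] * 5 + ["Brown"] * 3 + ["Yellow"] * 4)

table = frequency_table(candies)
for color, (freq, rel, pct) in table.items():
    print(f"{color:<7}{freq:>3}{rel:>6.2f}{pct:>6.0f}%")
```

The relative frequencies always sum to 1 and the percents to 100, which is a quick check on any hand-built table.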
Example


A bag of M&Ms contains 25 candies.
Raw Data: [image of the 25 candies]
Statistical Table:
Color    Tally    Frequency   Relative Frequency   Percent
Red      mmm      3           3/25 = .12           12%
Blue     mmmmmm   6           6/25 = .24           24%
Green    mmmm     4           4/25 = .16           16%
Orange   mmmmm    5           5/25 = .20           20%
Brown    mmm      3           3/25 = .12           12%
Yellow   mmmm     4           4/25 = .16           16%
[Bar Chart: Frequency (0 to 6) on the vertical axis and Color on the horizontal axis, with bars for Brown, Yellow, Red, Blue, Orange, and Green]
[Pie Chart: Brown 12.0%, Green 16.0%, Yellow 16.0%, Orange 20.0%, Red 12.0%, Blue 24.0%]
A Pareto Bar Chart is a bar chart
where the bars are ordered from largest
to smallest.
[Pareto Bar Chart: Frequency (0 to 7) by Color, with bars in the order Blue, Orange, Green, Yellow, Brown, Red]
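The ordering behind a Pareto bar chart can be sketched in a few lines of Python. The frequencies are those of the M&M statistical table; the function name `pareto_order` and the alphabetical tie-breaking rule are assumptions made for illustration, since the slides do not say how ties are ordered.

```python
# Frequencies from the M&M statistical table.
frequencies = {"Red": 3, "Blue": 6, "Green": 4,
               "Orange": 5, "Brown": 3, "Yellow": 4}

def pareto_order(freqs):
    """Sort categories from largest to smallest frequency.

    Ties are broken alphabetically (an assumption, not stated in the source).
    """
    return sorted(freqs.items(), key=lambda item: (-item[1], item[0]))

# Print a rough text version of the chart, one bar per category.
for color, freq in pareto_order(frequencies):
    print(f"{color:<7} {'#' * freq} {freq}")
```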
Graphing Quantitative Variables

A single quantitative variable measured for
different population segments or for different
categories of classification can be graphed
using a pie or bar chart.
A Big Mac hamburger costs $4.90 in Switzerland, $2.90 in
the U.S., and $1.86 in South Africa.
[Bar chart: Cost of a Big Mac ($), 0 to 5, by Country: Switzerland, U.S., South Africa]
• A single quantitative variable measured over
time is called a time series. It can be
graphed using a line chart or bar chart.
Example: Consumer Price Index (source: Bureau of Labor Statistics)
Month   Sept     Oct      Nov      Dec      Jan      Feb      Mar
CPI     178.10   177.60   177.50   177.30   177.60   178.00   178.60
Dotplots


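For quantitative data, a dotplot plots the measurements
as points on a horizontal axis, stacking the
points that duplicate existing points.
Example: The set 4, 5, 5, 7, 6
[Dotplot: horizontal axis from 4 to 7, with one dot above each of 4, 6, and 7 and two stacked dots above 5]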
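A rough text version of a dotplot can be produced as follows. This is only a sketch: the function name `text_dotplot` is hypothetical, and the code assumes single-digit whole-number data so that the columns line up.

```python
from collections import Counter

def text_dotplot(data):
    """Return text rows: stacked dots above a horizontal axis."""
    counts = Counter(data)
    axis = list(range(min(data), max(data) + 1))   # whole-number data assumed
    rows = []
    # Build the plot from the top row down; a value gets a dot in a row
    # only if it occurs at least that many times.
    for level in range(max(counts.values()), 0, -1):
        rows.append(" ".join("*" if counts[v] >= level else " " for v in axis))
    rows.append(" ".join(str(v) for v in axis))    # axis labels
    return rows

print("\n".join(text_dotplot([4, 5, 5, 7, 6])))
```

For the example set, the repeated value 5 is the only one that reaches the second row of dots.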
Stem and Leaf Plots
For quantitative data, use the actual numerical
values of each data point.
– Divide each measurement into two parts: the stem
and the leaf.
– List the stems in a column, with a vertical line to
their right.
– For each measurement, record the leaf portion in
the same row as its matching stem.
– Order the leaves from lowest to highest in each
stem.
Example
The prices ($) of 18 brands of walking shoes:
90 70 70 70 75 70
65 68 60 74 70 95
75 70 68 65 40 65

4 | 0
5 |
6 | 580855
7 | 000504050
8 |
9 | 05

Reorder:
4 | 0
5 |
6 | 055588
7 | 000000455
8 |
9 | 05
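The stem-and-leaf steps can be sketched in Python as follows. The function name `stem_and_leaf` is illustrative, and the sketch assumes two-digit whole-number data, so that the tens digit is the stem and the ones digit is the leaf. The prices are those of the walking-shoe example.

```python
from collections import defaultdict

def stem_and_leaf(data):
    """Map each stem (tens digit) to its sorted leaves (ones digits)."""
    plot = defaultdict(list)
    for x in data:
        stem, leaf = divmod(x, 10)       # split measurement into stem and leaf
        plot[stem].append(leaf)
    # Keep empty stems between the smallest and largest stem.
    return {s: sorted(plot.get(s, []))
            for s in range(min(plot), max(plot) + 1)}

# The 18 walking-shoe prices ($) from the example.
prices = [90, 70, 70, 70, 75, 70, 65, 68, 60,
          74, 70, 95, 75, 70, 68, 65, 40, 65]

for stem, leaves in stem_and_leaf(prices).items():
    print(f"{stem} | {''.join(str(leaf) for leaf in leaves)}")
```

Keeping the empty stems 5 and 8 matters: they show the gaps in the data that the plot is meant to reveal.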
Interpreting Graphs: Location and
Spread

Where is the data centered on the horizontal
axis, and how does it spread out from the
center?
Interpreting Graphs: Shapes
Mound shaped and
symmetric (mirror images)
Skewed right: a few
unusually large
measurements
Skewed left: a few unusually
small measurements
Bimodal: two local peaks
Interpreting Graphs: Outliers
Are there any strange or unusual
measurements that stand out in the data
set?
[Two example plots, labeled "No Outliers" and "Outlier"]
Example

A quality control process measures the diameter of a
gear being made by a machine (cm). The technician
records 15 diameters, but inadvertently makes a typing
mistake on the second entry.
1.991 1.891 1.991 1.988 1.993 1.989 1.990 1.988
1.988 1.993 1.991 1.989 1.989 1.993 1.990 1.994
Interpreting Graphs:
• Check the horizontal and vertical scales.
• Examine the location of the data distribution.
• Examine the shape of the distribution.
• Look for any unusual measurements or outliers.
Relative Frequency Histograms

A relative frequency histogram for a
quantitative data set is a bar graph in which
the height of the bar shows “how often”
(measured as a proportion or relative
frequency) measurements fall in a particular
class or subinterval.
Create intervals
Stack and draw bars
Relative Frequency Histograms
Example
The ages of 50 tenured faculty at a
state university.

34 42 34 43 48 31 59 50 70 36
34 30 63 48 66 43 52 43 40 32
52 26 59 44 35 58 36 58 50 37
43 53 43 52 44 62 49 34 48 53
39 45 41 35 36 62 34 38 28 53
• We choose to use 6 intervals.
• Minimum class width = (70 – 26)/6 = 7.33
• Convenient class width = 8
• Use 6 classes of length 8, starting at 25.
Age          Tally               Frequency   Relative Frequency   Percent
25 to < 33   |||||               5           5/50 = .10           10%
33 to < 41   ||||| ||||| ||||    14          14/50 = .28          28%
41 to < 49   ||||| ||||| |||     13          13/50 = .26          26%
49 to < 57   ||||| ||||          9           9/50 = .18           18%
57 to < 65   ||||| ||            7           7/50 = .14           14%
65 to < 73   ||                  2           2/50 = .04           4%
[Relative frequency histogram: Ages on the horizontal axis (class boundaries 25, 33, 41, 49, 57, 65, 73) and Relative frequency on the vertical axis (0 to 14/50)]
Relative Frequency Histograms




• Divide the range of the data into 5-12
subintervals of equal length.
• Calculate the minimum width of the
subinterval as Range/Number of subintervals.
• Round the minimum width up to a convenient
value.
• Use the method of left inclusion, including the
left endpoint but not the right, in your tally.
• Create a statistical table including the
subintervals, their frequencies, and their relative
frequencies.
• Draw the relative frequency histogram,
plotting the subintervals on the horizontal axis
and the relative frequencies on the vertical axis.
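The tabulation steps can be sketched in Python using the 50 faculty ages from the example. The function name `relative_frequency_table` is illustrative, and rounding the minimum width up to the next whole number is one assumed reading of "a convenient value"; other choices are possible.

```python
import math

def relative_frequency_table(data, num_classes, start):
    """Tally data into equal-width classes using left inclusion.

    Assumes every value lies in [start, start + num_classes * width).
    """
    min_width = (max(data) - min(data)) / num_classes   # Range/Number
    width = math.ceil(min_width)     # assumed "convenient" width: next whole number
    counts = [0] * num_classes
    for x in data:
        counts[int((x - start) // width)] += 1          # class [left, right)
    n = len(data)
    return [(start + i * width, start + (i + 1) * width, c, c / n)
            for i, c in enumerate(counts)]

# Ages of the 50 tenured faculty from the example.
ages = [34, 42, 34, 43, 48, 31, 59, 50, 70, 36,
        34, 30, 63, 48, 66, 43, 52, 43, 40, 32,
        52, 26, 59, 44, 35, 58, 36, 58, 50, 37,
        43, 53, 43, 52, 44, 62, 49, 34, 48, 53,
        39, 45, 41, 35, 36, 62, 34, 38, 28, 53]

for left, right, freq, rel in relative_frequency_table(ages, 6, 25):
    print(f"{left} to < {right}: {freq:>2}  {rel:.2f}")
```

With 6 classes starting at 25, the minimum width of 7.33 rounds up to 8, giving the same classes as the statistical table in the example.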
[Relative frequency histogram of the 50 faculty ages: Ages on the horizontal axis (25 to 73) and Relative frequency on the vertical axis (0 to 14/50)]
The height of the bar represents
• The proportion of measurements falling in
that class or subinterval.
• The probability that a single measurement,
drawn randomly from the set, will belong to
that class or subinterval.
Describing the Distribution
[Relative frequency histogram of the 50 faculty ages: Ages (25 to 73) against Relative frequency (0 to 14/50)]
Shape? Skewed right
Outliers? No.
What proportion of the tenured faculty are
younger than 41?
(14 + 5)/50 = 19/50 = .38
What is the probability that a randomly selected
faculty member is 49 or older?
(9 + 7 + 2)/50 = 18/50 = .36
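These two calculations can be checked with a short Python sketch that adds class frequencies. The helper name `proportion` is hypothetical, and the approach only works when the cutoff (41 or 49 here) coincides with a class boundary, as it does in this example.

```python
# (lower class boundary, frequency) pairs from the faculty-age table.
classes = [(25, 5), (33, 14), (41, 13), (49, 9), (57, 7), (65, 2)]
n = sum(freq for _, freq in classes)     # 50 measurements in total

def proportion(classes, n, keep):
    """Proportion of measurements in classes whose lower boundary passes `keep`."""
    return sum(freq for lower, freq in classes if keep(lower)) / n

younger_than_41 = proportion(classes, n, lambda lower: lower < 41)
at_least_49 = proportion(classes, n, lambda lower: lower >= 49)

print(f"younger than 41: {younger_than_41:.2f}")
print(f"49 or older:     {at_least_49:.2f}")
```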
Chapter review
I. How Data Are Generated
   • Experimental units, variables, measurements
   • Samples and populations
   • Univariate, bivariate, and multivariate data
II. Types of Variables
   • Qualitative or Categorical
   • Quantitative
      a. Discrete
      b. Continuous
III. Graphs for Univariate Data Distributions
   1. Qualitative or categorical data
      a. Pie charts
      b. Bar charts
   2. Quantitative data
      a. Pie and bar charts
      b. Line charts
      c. Dot plots
      d. Stem and leaf plots
      e. Relative frequency histograms
   3. Describing data distributions
      • Shapes: symmetric, skewed left, skewed right, unimodal, bimodal
      • Proportion of measurements in certain intervals
      • Outliers
Example
A manufacturer of jeans has plants in CA,
AZ, and TX. For a random sample of 25 pairs of
jeans, the plants in which they were made are as follows:
CA CA AZ CA CA
AZ CA AZ AZ AZ
AZ TX CA TX AZ
TX TX AZ TX CA
CA TX TX TX CA
What is the experimental unit? Pair of jeans
What is the variable? State
Is it qualitative or quantitative? Qualitative
Construct a pie chart
Construct a statistical table:
State   Frequency   Relative Frequency   Sector Angle (degrees)
CA      9           .36                  129.6
AZ      8           .32                  115.2
TX      8           .32                  115.2
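The Sector Angle column is each relative frequency multiplied by 360 degrees. A minimal Python sketch of that calculation, using the frequencies from the jeans example (the function name `sector_angles` is illustrative):

```python
def sector_angles(frequencies):
    """Return {category: (relative frequency, sector angle in degrees)}."""
    n = sum(frequencies.values())
    # Each category gets a slice of the 360-degree circle
    # proportional to its relative frequency.
    return {cat: (f / n, 360 * f / n) for cat, f in frequencies.items()}

jeans = {"CA": 9, "AZ": 8, "TX": 8}      # frequencies from the jeans example
angles = sector_angles(jeans)
for state, (rel, angle) in angles.items():
    print(f"{state}: {rel:.2f}  {angle:.1f} degrees")
```

The angles should add up to 360 degrees, which gives a simple check before drawing the chart.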
Construct a bar chart to describe the data
What proportion of the jeans are made
in TX?
8/25 = .32, or 32%
Which state produces the most jeans in
the group?
California
Example
The ages (in months) at which 50 children were
enrolled in a preschool are listed below.
38 40 30 35 39
47 35 34 43 41
32 34 41 30 46
55 39 33 32 32
42 50 37 39 33
40 48 36 31 36
36 41 43 48 40
35 40 30 46 37
45 42 41 36 50
45 38 46 36 31
Construct a stem and leaf plot to
display the data
Use the tens digit as the stem, and the
ones digit as the leaf, dividing each stem
into two parts.
3 | 042403223101
3 | 859597966657686
4 | 031120130021
4 | 76886556
5 | 00
5 | 5
Reorder:
3 | 000112223344
3 | 555666667788999
4 | 000011112233
4 | 55666788
5 | 00
5 | 5
Rotate the plot 90 degrees counterclockwise to view its shape.
What is the shape of the measurements? Unimodal
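Dividing each stem into two parts means that leaves 0 to 4 go on the first row for a stem and leaves 5 to 9 on the second. A minimal Python sketch of this variant, using the preschool ages from the example (the function name `split_stem_and_leaf` is hypothetical, and two-digit whole-number data is assumed):

```python
def split_stem_and_leaf(data):
    """Stem-and-leaf rows with each stem split into leaves 0-4 and 5-9."""
    rows = {}
    for x in sorted(data):               # sorting first keeps leaves ordered
        stem, leaf = divmod(x, 10)
        half = 0 if leaf < 5 else 1      # 0 = lower half, 1 = upper half
        rows.setdefault((stem, half), []).append(leaf)
    return rows

# Enrollment ages (in months) of the 50 children from the example.
ages_months = [38, 40, 30, 35, 39, 47, 35, 34, 43, 41,
               32, 34, 41, 30, 46, 55, 39, 33, 32, 32,
               42, 50, 37, 39, 33, 40, 48, 36, 31, 36,
               36, 41, 43, 48, 40, 35, 40, 30, 46, 37,
               45, 42, 41, 36, 50, 45, 38, 46, 36, 31]

rows = split_stem_and_leaf(ages_months)
for (stem, _), leaves in rows.items():
    print(f"{stem} | {''.join(map(str, leaves))}")
```

Unlike a full plot, this sketch omits any half-stem that has no leaves; none are empty for this data set.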
Construct a relative frequency histogram.
Start the lower boundary of the first class
at 30 and use a class width of 5.
Class   Boundary     Frequency   Relative Freq.
1       30 to < 35   12          0.24
2       35 to < 40   15          0.30
3       40 to < 45   12          0.24
4       45 to < 50   8           0.16
5       50 to < 55   2           0.04
6       55 to < 60   1           0.02
What proportion of the children were 35
months or older, but less than 45 months
of age?
(15 + 12)/50 = 0.54
If one child is selected at random, what is
the probability that the child was less than 50
months old?
(12 + 15 + 12 + 8)/50 = 0.94
Example
The value of a quantitative variable is
measured once a year for a ten-year period.
Year          1      2      3      4      5      6      7      8      9      10
Measurement   61.5   62.3   60.7   59.8   58.0   58.2   57.5   57.5   56.1   56.0
Create a line chart to describe the
variable as it changes over time.
Describe the measurements using
the line chart.
Observing the change in y as x increases,
we see that the measurements are generally
decreasing over time, with small increases only
from year 1 to year 2 and from year 5 to year 6.
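The trend read from the line chart can be checked numerically with a short Python sketch that computes the year-to-year changes. Differences are rounded to one decimal place, matching the precision of the data; the variable names are illustrative.

```python
# Yearly measurements from the example (years 1 through 10).
measurements = [61.5, 62.3, 60.7, 59.8, 58.0, 58.2, 57.5, 57.5, 56.1, 56.0]

# Change from each year to the next; negative values are decreases.
changes = [round(b - a, 1) for a, b in zip(measurements, measurements[1:])]
decreases = sum(1 for c in changes if c < 0)
net_change = round(measurements[-1] - measurements[0], 1)

print(changes)
print(f"{decreases} of {len(changes)} year-to-year changes are decreases")
print(f"net change over the period: {net_change}")
```

Most of the year-to-year changes are negative and the net change over the ten years is negative, which agrees with the downward pattern described for the line chart.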