Transcript STAT 113

Chapter 2.3 & 2.4:
Stem and Leaf Plots and Cross Tabulation
Chris Morgan, MATH G160
[email protected]
April 11, 2012
Lecture 29
Stem and Leaf Plots
• Gives a quick picture of the shape of the distribution
• Shows the rank order and the distribution simultaneously
• Includes actual numerical values
• Works best for small numbers of observations where all
observations are greater than zero
Making a Stem and Leaf Plot
• Sort data from smallest to largest (and trim data if necessary)
• For each number, set the last digit as its “leaf” and the leading
digit(s) as its “stem” (e.g., for the number 24, the 2 is
the stem and the 4 is the leaf)
• Separate the stems and leaves into two columns, and format the
leaves so that they are left-aligned
Making a Stem and Leaf Plot
Number of times per day my mom will yell at me over Christmas break:
0, 11, 22, 6, 17, 0, 15, 21, 18, 9, 8, 4,
4, 8, 13, 13, 5, 2, 3, 6, 10, 24, 19, 6
Making a Stem and Leaf Plot
Number of times per day my mom will yell at me over Christmas break:
0 |
1 |
2 |
Making a Stem and Leaf Plot
Number of times per day my mom will yell at me over Christmas break:
0 | 0, 0, 2, 3, 4, 4, 5, 6, 6, 6, 8, 8, 9
1 | 0, 1, 3, 3, 5, 7, 8, 9
2 | 1, 2, 4
Histogram or Stem and Leaf Plot?
Histogram
• Quantitative variables
• Good for big data sets, especially if technology is available
• Uses a box to represent each data point
• Popular method of conveying information and will be utilized
often in this course
Stem and Leaf Plots
• Quantitative variables
• Good for small data sets, convenient for back-of-the-envelope
calculations; rarely found in scientific or lay publications
• Uses a digit to represent each data point
• Seen as elementary and will not be utilized often in this course
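To see the "box per data point" idea next to the "digit per data point" idea, here is a rough text histogram of the same Christmas-break data. It is a sketch under assumptions: the bins have width 10 (chosen to line up with the stems 0, 1, 2), a "#" stands in for each box, and the names yells_per_day, bin_counts, and histogram_rows are invented for this example.

```python
from collections import Counter

# The 24 observations from the stem and leaf example (hypothetical variable name).
yells_per_day = [0, 11, 22, 6, 17, 0, 15, 21, 18, 9, 8, 4,
                 4, 8, 13, 13, 5, 2, 3, 6, 10, 24, 19, 6]

# Assumption: bins of width 10 (0-9, 10-19, 20-29), matching the stems.
bin_counts = Counter(value // 10 for value in yells_per_day)

# One "#" plays the role of the box drawn for each data point.
histogram_rows = [
    f"{10 * b:2d}-{10 * b + 9:<2d} | " + "#" * bin_counts[b]
    for b in sorted(bin_counts)
]
for row in histogram_rows:
    print(row)
```

The bar lengths match the row lengths of the stem and leaf plot, but the histogram no longer shows the actual values.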
How do we analyze relationships between types of data?
The type(s) of variables we have determine the method we
use to compare the data.
Types of Variables               Method
Categorical vs. Categorical      Cross Tabulation
Categorical vs. Quantitative     ANOVA
Quantitative vs. Quantitative    Regression
Cross Tabulation
         Yellow   Red   Orange   Green   Blue   Brown
Peanut   5        3     2        5       7      3
Plain    6        4     8        11      9      2
Cross Tabulation
         Yellow   Red   Orange   Green   Blue   Brown   Total
Peanut   5        3     2        5       7      3       25
Plain    6        4     8        11      9      2       40
Total    11       7     10       16      16     5       65
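The Total row and column are just sums over the cells. A minimal sketch of that bookkeeping, using the M&M counts from the table (the variable names colors, counts, row_totals, column_totals, and grand_total are invented for this example):

```python
colors = ["Yellow", "Red", "Orange", "Green", "Blue", "Brown"]

# Cell counts from the cross tabulation, one list per row, in the color order above.
counts = {
    "Peanut": [5, 3, 2, 5, 7, 3],
    "Plain":  [6, 4, 8, 11, 9, 2],
}

# Row totals: add across each row.
row_totals = {kind: sum(row) for kind, row in counts.items()}

# Column totals: add down each column.
column_totals = {
    color: sum(row[i] for row in counts.values())
    for i, color in enumerate(colors)
}

# Grand total: every M&M counted once.
grand_total = sum(row_totals.values())

print(row_totals)
print(column_totals)
print(grand_total)
```

A quick sanity check is that the row totals and the column totals both add up to the same grand total.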




Two-way tables make it easy to compute
conditional probability!
P(Row A | Column B) = Cell(A,B) / Column B Total
Similarly,
P(Column B | Row A) = Cell(A,B) / Row A Total
Cross Tabulation
         Yellow   Red   Orange   Green   Blue   Brown   Total
Peanut   5        3     2        5       7      3       25
Plain    6        4     8        11      9      2       40
Total    11       7     10       16      16     5       65
What more can I do with cross tabulation?
• Joint Probability
• Marginal Probability
• Conditional Probability
Joint Probability
                Yellow   Red    Orange   Green   Blue   Brown   Total (count)
Peanut          0.08     0.05   0.03     0.08    0.11   0.05    25
Plain           0.09     0.06   0.12     0.17    0.14   0.03    40
Total (count)   11       7      10       16      16     5       65
Probability (A and B) = cell count of A and B / grand total
P(Red and Peanut) = 3 / 65 = 0.05
P(Blue and Plain) = 9 / 65 = 0.14
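As a check on the arithmetic, the joint probability rule can be sketched in Python. The names counts, grand_total, and joint_probability are made up for this example; the numbers are the cell counts from the cross tabulation.

```python
# Cell counts from the cross tabulation (hypothetical variable name).
counts = {
    "Peanut": {"Yellow": 5, "Red": 3, "Orange": 2, "Green": 5, "Blue": 7, "Brown": 3},
    "Plain":  {"Yellow": 6, "Red": 4, "Orange": 8, "Green": 11, "Blue": 9, "Brown": 2},
}

# Grand total: sum of every cell in the table.
grand_total = sum(sum(row.values()) for row in counts.values())

def joint_probability(kind, color):
    """P(kind and color) = cell count / grand total."""
    return counts[kind][color] / grand_total

print(round(joint_probability("Peanut", "Red"), 2))   # 3 / 65, rounds to 0.05
print(round(joint_probability("Plain", "Blue"), 2))   # 9 / 65, rounds to 0.14
```

Because every M&M falls in exactly one cell, the twelve joint probabilities add up to 1 (before rounding).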
Marginal Probability
                      Yellow   Red    Orange   Green   Blue   Brown   Total (probability)
Peanut                5        3      2        5       7      3       0.38
Plain                 6        4      8        11      9      2       0.62
Total (probability)   0.17     0.11   0.15     0.25    0.25   0.08    1.0
Probability [column A] = total of column A / grand total
P(Orange) = 10 / 65 = 0.15
Probability [row B] = total of row B / grand total
P(Plain) = 40 / 65 = 0.62
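The same two rules can be sketched in Python as a check. The function names marginal_column and marginal_row are invented for this example; the counts are the cells of the cross tabulation.

```python
# Cell counts from the cross tabulation (hypothetical variable name).
counts = {
    "Peanut": {"Yellow": 5, "Red": 3, "Orange": 2, "Green": 5, "Blue": 7, "Brown": 3},
    "Plain":  {"Yellow": 6, "Red": 4, "Orange": 8, "Green": 11, "Blue": 9, "Brown": 2},
}
grand_total = sum(sum(row.values()) for row in counts.values())

def marginal_column(color):
    """P(color) = column total / grand total."""
    return sum(row[color] for row in counts.values()) / grand_total

def marginal_row(kind):
    """P(kind) = row total / grand total."""
    return sum(counts[kind].values()) / grand_total

print(round(marginal_column("Orange"), 2))  # 10 / 65, rounds to 0.15
print(round(marginal_row("Plain"), 2))      # 40 / 65, rounds to 0.62
```

The row marginals add up to 1, and so do the column marginals, since each set of totals accounts for all 65 M&Ms.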
Conditional Probability
         Yellow   Red   Orange   Green   Blue   Brown   Total
Peanut   5        3     2        5       7      3       25
Plain    6        4     8        11      9      2       40
Total    11       7     10       16      16     5       65
Conditional probability is the probability that event A occurs given that
event B has already occurred. For instance, if I observe a yellow
M&M, what is the probability it is plain? Or if it is peanut,
what is the probability it’s red?
Probability [A | B] = cell count of A and B / total count of B
P(Plain | Yellow) = 6 / 11 = 0.55
P(Red | Peanut) = 3 / 25 = 0.12
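A short Python sketch of the same calculation, where the only thing that changes between the two questions is which total goes in the denominator. The function names p_kind_given_color and p_color_given_kind are invented for this example.

```python
# Cell counts from the cross tabulation (hypothetical variable name).
counts = {
    "Peanut": {"Yellow": 5, "Red": 3, "Orange": 2, "Green": 5, "Blue": 7, "Brown": 3},
    "Plain":  {"Yellow": 6, "Red": 4, "Orange": 8, "Green": 11, "Blue": 9, "Brown": 2},
}

def p_kind_given_color(kind, color):
    """P(kind | color) = cell count / column total for that color."""
    column_total = sum(row[color] for row in counts.values())
    return counts[kind][color] / column_total

def p_color_given_kind(color, kind):
    """P(color | kind) = cell count / row total for that kind."""
    row_total = sum(counts[kind].values())
    return counts[kind][color] / row_total

print(round(p_kind_given_color("Plain", "Yellow"), 2))  # 6 / 11, rounds to 0.55
print(round(p_color_given_kind("Red", "Peanut"), 2))    # 3 / 25 = 0.12
```

Swapping the condition changes the answer: P(Plain | Yellow) divides by the 11 yellow M&Ms, while P(Yellow | Plain) would divide by the 40 plain ones.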