Variables, Graphs, and Distribution Shapes Notes

Variables, Graphs and Distribution Shapes
Data Analysis
Statistics is the science of data.
Data Analysis is the process of organizing, displaying, summarizing,
and asking questions about data.
Individuals
• Objects described by a set of data
• People, Animals, or Things
Variable
• Any characteristic of an individual
Categorical Variable
• Places an individual into one of several groups or categories.
Quantitative Variable
• Takes numerical values for which it makes sense to find an average.
Not every variable that takes number values is quantitative!
Ex: zip code. Why would we want to find an average zip code?
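The distinction can be sketched in a few lines of Python. The three individuals below are hypothetical, made up only for illustration: height is treated as quantitative (averaging makes sense), while zip code is treated as categorical (counting per group makes sense).

```python
from statistics import mean

# Hypothetical individuals: (name, height in inches, zip code)
students = [
    ("Ana", 64, "30301"),
    ("Ben", 70, "10001"),
    ("Cal", 67, "30301"),
]

heights = [height for _, height, _ in students]
zip_codes = [zip_code for _, _, zip_code in students]

# Quantitative variable: an average height is meaningful.
average_height = mean(heights)

# Categorical variable: count the individuals in each category instead.
zip_counts = {z: zip_codes.count(z) for z in set(zip_codes)}
```

Storing the zip codes as strings is a deliberate choice here: it keeps anyone from averaging them by accident.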
Categorical Variables
Frequency Table and Relative Frequency Table
Frequency means count. Relative frequency means %.
Variable: Format. Values: the format names listed in the first column.

Format                Count of Stations    Percent of Stations
Adult Contemporary                 1556                   11.2
Adult Standards                    1196                    8.6
Contemporary Hit                    569                    4.1
Country                            2066                   14.9
News/Talk                          2179                   15.7
Oldies                             1060                    7.7
Religious                          2014                   14.6
Rock                                869                    6.3
Spanish Language                    750                    5.4
Other Formats                      1579                   11.4
Total                             13838                   99.9

The percents total 99.9 rather than 100 due to roundoff error.
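As a rough sketch of how the relative frequency column comes from the counts, the Python below divides each count by the total and rounds to one decimal place. The variable names are illustrative choices; the counts are the ones in the radio station table.

```python
# Count of stations per radio format (from the frequency table).
counts = {
    "Adult Contemporary": 1556,
    "Adult Standards": 1196,
    "Contemporary Hit": 569,
    "Country": 2066,
    "News/Talk": 2179,
    "Oldies": 1060,
    "Religious": 2014,
    "Rock": 869,
    "Spanish Language": 750,
    "Other Formats": 1579,
}

total = sum(counts.values())

# Relative frequency = count / total, expressed as a percent rounded to 0.1.
percents = {fmt: round(100 * n / total, 1) for fmt, n in counts.items()}

# Adding the rounded percents shows the roundoff error: 99.9, not 100.
percent_total = round(sum(percents.values()), 1)
```

Because each percent is rounded before the percents are added, the total comes out to 99.9 instead of 100.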
Displaying Categorical Data
Frequency tables can be difficult to read.
Sometimes it is easier to analyze a distribution by displaying it with a
bar graph or pie chart.
[Figure: bar graph of Count of Stations by Format (vertical axis from 0 to 2500) and pie chart of Percent of Stations by Format, both drawn from the radio station frequency and relative frequency tables.]
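A bar graph can be approximated in plain text, which is enough to see which formats are most common. The sketch below is a minimal illustration, not a plotting-library replacement. The function name `text_bar_graph` and the choice of one mark per 100 stations are assumptions made for this example.

```python
def text_bar_graph(counts, per_mark=100):
    """Return one line per category, drawing one '#' per `per_mark` individuals."""
    width = max(len(label) for label in counts)
    lines = []
    for label, n in counts.items():
        bar = "#" * (n // per_mark)  # partial marks are dropped
        lines.append(f"{label:<{width}} | {bar} {n}")
    return lines

# Counts from the radio station frequency table.
station_counts = {
    "Adult Contemporary": 1556,
    "Adult Standards": 1196,
    "Contemporary Hit": 569,
    "Country": 2066,
    "News/Talk": 2179,
    "Oldies": 1060,
    "Religious": 2014,
    "Rock": 869,
    "Spanish Language": 750,
    "Other Formats": 1579,
}

bar_lines = text_bar_graph(station_counts)
print("\n".join(bar_lines))
```

The longest bars (News/Talk, Country, Religious) stand out immediately, which is the point made about graphs being easier to read than tables.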
Displaying Quantitative Data
Useful graphs include: a line plot, a histogram, a stem and leaf plot,
and a box-and-whisker plot.
Distributions
“Raw” data values are simply presented in an unorganized list.
Organizing the data values by using the frequency with which
they occur results in a distribution of the data. A distribution
may be presented as a frequency table or as a data display. Data
displays reveal the shape of a distribution.
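Turning raw data into a distribution amounts to counting how often each value occurs. A minimal Python sketch follows. The raw list is hypothetical and is not the baby data from the table.

```python
from collections import Counter

# Hypothetical raw data: an unorganized list of values.
raw = [3, 1, 2, 3, 3, 2, 4, 3, 2, 5]

# Organize by frequency: each distinct value paired with how often it occurs.
distribution = dict(sorted(Counter(raw).items()))

# A simple data display: one 'x' per occurrence, like a line plot on its side.
display = [f"{value}: {'x' * count}" for value, count in distribution.items()]
```

Here `distribution` is `{1: 1, 2: 3, 3: 4, 4: 1, 5: 1}`, and the rows of `display` show a mound at 3.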
The table gives data about a random sample of 20 babies born at a hospital.
Seeing the Shape of a Distribution
• As you just saw, data distributions can have various shapes.
Some of these shapes are given names in statistics.
• A distribution whose shape is basically level (that is, it looks like a
rectangle) is called a uniform distribution.
• A distribution that is mounded in the middle with symmetric “tails”
at each end (that is, it looks bell-shaped) is called a normal
distribution.
• A distribution that is mounded but not symmetric because one
“tail” is much longer than the other is called a skewed distribution.
When the longer “tail” is on the left, the distribution is said to be
skewed left. When the longer “tail” is on the right, the distribution
is said to be skewed right.
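The direction of skew can also be estimated numerically. The sketch below uses the skewness coefficient (the average cubed z-score), which these notes do not cover; it is brought in here only as one possible check. The function names and the 0.5 cutoff are assumptions chosen for illustration, and the sample lists are hypothetical.

```python
from statistics import mean, pstdev

def skewness(data):
    """Average cubed z-score; positive means a longer right tail."""
    m = mean(data)
    s = pstdev(data)  # assumes the values are not all identical (s > 0)
    return sum(((x - m) / s) ** 3 for x in data) / len(data)

def describe_shape(data, cutoff=0.5):
    """Label a data set using an assumed cutoff on the skewness coefficient."""
    g = skewness(data)
    if g > cutoff:
        return "skewed right"
    if g < -cutoff:
        return "skewed left"
    return "roughly symmetric"

right_tailed = [1, 1, 2, 2, 3, 10]   # one large value stretches the right tail
left_tailed = [0, 7, 8, 8, 9, 9]     # one small value stretches the left tail
balanced = [1, 2, 3, 4, 5]
```

This check cannot tell a uniform distribution from a normal one, since both are symmetric. Looking at a data display is still needed for that.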
Real World Video
• Module 23
Stem and Leaf Plot
• A Stem and Leaf Plot is a special table where each data value is split into a "stem" (the first digit or digits) and a "leaf" (usually the last digit).
• The "stem" values are listed down, and the "leaf" values go right (or left) from the stem values.
• The "stem" is used to group the scores and each "leaf" shows the individual scores within each group.
• It is OK to repeat a leaf value.
• Leaves are typically arranged in increasing order.
• If we turn the plot on its side, we can see the distribution of data.
[Figure: example stem and leaf plot]
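The splitting rule can be sketched in Python as follows. This is a minimal version that assumes non-negative whole numbers, with the last digit as the leaf. The function names are illustrative, and the sample scores are used only to show the layout.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Group each value's last digit (leaf) under its leading digits (stem)."""
    plot = defaultdict(list)
    for value in sorted(values):        # sorting keeps leaves in increasing order
        stem, leaf = divmod(value, 10)  # e.g. 87 -> stem 8, leaf 7
        plot[stem].append(leaf)         # repeated leaves are kept
    return dict(plot)

def render(plot):
    """Return the plot as text rows: stems listed down, leaves going right."""
    return [
        f"{stem:>2} | {' '.join(str(leaf) for leaf in leaves)}"
        for stem, leaves in sorted(plot.items())
    ]

scores = [77, 79, 80, 86, 87, 87, 94, 99]
rows = render(stem_and_leaf(scores))
print("\n".join(rows))
```

For these scores the rows are ` 7 | 7 9`, ` 8 | 0 6 7 7`, and ` 9 | 4 9`. A stem with no data values is left out of this sketch; a hand-drawn plot would normally still list it with no leaves.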
Stem and Leaf Discussion
• ..\..\Probability and Statistics\6th DayGraphs\Stem and Leaf Discussion.pdf
Box-and-Whisker Plot
• Statistics assumes that your data points
(the numbers in your list) are clustered
around some central value. The "box" in
the box-and-whisker plot contains, and
thereby highlights, the middle half of
these data points.
• Steps:
1. Order your data in numerical order.
2. Find the median of your data (it divides the data into two halves).
   • If you have an even number of data values, average the 2 middle values.
3. Find the median of each of those two halves (the Lower Quartile and the Upper Quartile).
   • If the median is one of the data values, don't re-use it in either half.
4. Find the maximum and minimum values.
5. Draw the box-and-whisker plot.
Note: The lower quartile, the median, and the upper quartile
divide the entire data set into quarters, so they are
called "quartiles".
Q1 - lower quartile
Q2 - median
Q3 - upper quartile
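The steps above can be sketched as a short Python function. It follows the median-of-halves rule from these notes (other textbooks and software use slightly different quartile rules). The function name `five_number_summary` is an illustrative choice, and the function assumes at least two data values.

```python
from statistics import median

def five_number_summary(data):
    """Return (min, Q1, Q2, Q3, max) using the median-of-halves method."""
    values = sorted(data)                 # step 1: order the data
    n = len(values)
    half = n // 2
    lower = values[:half]                 # values below the median position
    # With an odd count the median is a data value, so skip it (don't re-use it).
    upper = values[half + 1:] if n % 2 else values[half:]
    q2 = median(values)                   # step 2: median of the whole set
    q1 = median(lower)                    # step 3: medians of the two halves
    q3 = median(upper)
    return (values[0], q1, q2, q3, values[-1])  # step 4: add min and max

summary = five_number_summary([77, 79, 80, 86, 87, 87, 94, 99])
print(summary)  # (77, 79.5, 86.5, 90.5, 99)
```

The five numbers returned are exactly what is needed for step 5: the box runs from Q1 to Q3 with a line at Q2, and the whiskers reach to the minimum and maximum.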
Ex: Draw a box-and-whisker plot for the following data set:
4.3, 5.1, 3.9, 4.5, 4.4, 4.9, 5.0, 4.7, 4.1, 4.6, 4.4, 4.3,
4.8, 4.4, 4.2, 4.5, 4.4
My first step is to order the set. This gives me:
3.9, 4.1, 4.2, 4.3, 4.3, 4.4, 4.4, 4.4, 4.4, 4.5, 4.5, 4.6, 4.7, 4.8, 4.9, 5.0, 5.1
The first number I need is the median of the entire set.
3.9, 4.1, 4.2, 4.3, 4.3, 4.4, 4.4, 4.4, [4.4], 4.5, 4.5, 4.6, 4.7, 4.8, 4.9, 5.0, 5.1
The median is Q2 = 4.4.
The next two numbers I need are the medians of the two halves. Since I used the "4.4" in the middle
of the list, I can't re-use it, so my two remaining data sets are:
3.9, 4.1, 4.2, 4.3, 4.3, 4.4, 4.4, 4.4
and
4.5, 4.5, 4.6, 4.7, 4.8, 4.9, 5.0, 5.1
The first half has eight values, so the median is the average of the middle two:
Q1 = (4.3 + 4.3) / 2 = 4.3
The median of the second half is:
Q3 = (4.7 + 4.8) / 2 = 4.75
Ex (Cont.):
Now I'll mark off the minimum and
maximum values, and Q1, Q2,
and Q3:
The "box" part of the plot goes
from Q1 to Q3:
And then the "whiskers" are drawn
to the endpoints:
Ex: Draw the box-and-whisker plot for the
following data set:
77, 79, 80, 86, 87, 87, 94, 99
My first step is to find the median. Since there are eight data points, the median
will be the average of the two middle values:
Q2 = (86 + 87) / 2 = 86.5
This splits the list into two halves: 77, 79, 80, 86 and 87, 87, 94, 99. Since
the halves of the data set each contain an even number of values, the sub-medians will be the average of the middle two values.
Q1 = (79 + 80) / 2 = 79.5
Q3 = (87 + 94) / 2 = 90.5
The minimum value is 77 and the maximum value is 99, so I have:
Min: 77, Q1: 79.5, Q2: 86.5, Q3: 90.5, Max: 99
Notes:
• Q1: 25% of data below
• Q2: 50% of data below
• Q3: 75% of data below
• The distance between Q1 and Q3 is called the Interquartile Range.
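As a quick check on these notes, the sketch below computes the Interquartile Range for the eight-value example (77, 79, 80, 86, 87, 87, 94, 99) and counts how many values fall inside the box. The variable names are illustrative, and the slicing assumes an even number of data values, as in that example.

```python
from statistics import median

data = sorted([77, 79, 80, 86, 87, 87, 94, 99])
half = len(data) // 2          # even count: the list splits cleanly in two

q1 = median(data[:half])       # lower quartile, 79.5
q3 = median(data[half:])       # upper quartile, 90.5
iqr = q3 - q1                  # Interquartile Range = Q3 - Q1

# The box spans Q1 to Q3, so it should hold the middle half of the data.
inside_box = [x for x in data if q1 <= x <= q3]
print(iqr, inside_box)  # 11.0 [80, 86, 87, 87]
```

Four of the eight values land between Q1 and Q3, which matches the idea that the box contains the middle half of the data.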
Box-and-Whisker Discussion
• ..\..\Probability and Statistics\6th Day- Graphs\Boxand-Whisker Discussion.pdf