Describing Data:
Frequency Tables, Frequency
Distributions, and Graphic Presentation
GOALS
1. Organize qualitative data into a frequency
table.
2. Present a frequency table as a bar chart or a
pie chart.
3. Organize quantitative data into a frequency
distribution.
4. Present a frequency distribution for quantitative
data using histograms, frequency polygons,
and cumulative frequency polygons.
Why we describe data

Descriptive statistics organize data to show the
general shape of the data and where values tend to
concentrate and to expose extreme or unusual data
values.

Remember: quantitative data ≠ qualitative data
Frequency Table
Relative Class Frequencies


Class frequencies can be converted to relative class
frequencies to show the fraction of the total number
of observations in each class.
A relative frequency captures the relationship between
a class total and the total number of observations.
Bar Charts
Pie Charts
How to construct charts with SPSS

Each group uses “employee data.sav”
Make bar charts/pie charts from
– Gender
– Employment category
– Educational level
– Minority

Graphs → Bar and/or Pie

Describe the output (Charts)
Frequency Distribution
A frequency distribution is a grouping of data into
mutually exclusive categories showing the number of
observations in each class.
EXAMPLE – Constructing
Frequency Distributions:
Quantitative Data
Ms. Kathryn Ball of
AutoUSA wants to
develop tables, charts,
and graphs to show the
typical selling price on
various dealer lots. The
table on the right reports
only the price of the 80
vehicles sold last month
at Whitner Autoplex.
Constructing a Frequency Table Example

Step 1: Decide on the number of classes.
A useful recipe to determine the number of classes (k) is
the “2 to the k rule”: select the smallest k such that \(2^k > n\).
There were 80 vehicles sold, so n = 80. If we try k = 6, which
means we would use 6 classes, then \(2^6 = 64\), somewhat less
than 80. Hence, 6 is not enough classes. If we let k = 7, then
\(2^7 = 128\), which is greater than 80. So the recommended number of
classes is 7.

Step 2: Determine the class interval or width.
The formula is: \(i \geq (H - L)/k\), where i is the class interval, H is
the highest observed value, L is the lowest observed value,
and k is the number of classes.
($35,925 - $15,546)/7 = $2,911
Round up to some convenient number, such as a multiple of 10
or 100. Use a class width of $3,000.
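Steps 1 and 2 can be checked with a short Python sketch. The function names and the choice to round up to the next multiple of 1,000 are assumptions made for illustration; the slides only say to round up to a convenient number.

```python
import math

def number_of_classes(n: int) -> int:
    """Smallest k such that 2**k > n (the "2 to the k rule")."""
    k = 1
    while 2 ** k <= n:
        k += 1
    return k

def class_width(high: float, low: float, k: int, round_to: int = 1000) -> int:
    """Raw width (H - L) / k, rounded up to a multiple of round_to.

    round_to=1000 is an assumed convenience value, not part of the rule.
    """
    raw = (high - low) / k
    return math.ceil(raw / round_to) * round_to

n = 80                                   # vehicles sold
k = number_of_classes(n)                 # 7, since 2**6 = 64 and 2**7 = 128
raw_width = (35_925 - 15_546) / k        # about 2,911
width = class_width(35_925, 15_546, k)   # 3,000

print(k, round(raw_width), width)
```

Changing round_to to 100 or 500 gives other acceptable widths; the point is only that the chosen width must be at least as large as the raw value.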
Constructing a Frequency Table Example

Step 3: Set the individual class limits
Constructing a Frequency Table

Step 4: Tally the
vehicle selling prices
into the classes.

Step 5: Count the
number of items in
each class.
Practice

There were 200 tables sold. The lowest value was 10,000 baht
and the highest value was 100,000 baht (use the 2 to the k rule).

Construct the class limits.
Class Intervals and Midpoints
Class midpoint: A point that divides a class
into two equal parts. This is the average
of the upper and lower class limits.
Class frequency: The number of
observations in each class.
Class interval: The class interval is
obtained by subtracting the lower limit of
a class from the lower limit of the next
class.
Class Intervals and Midpoints Example
Referring to the AutoUSA example

Class midpoint: For the first class the lower class
limit is $15,000 and the next limit is $18,000. The class
midpoint is $16,500, found by:
($15,000 + $18,000)/2

Class interval:
The class interval of the vehicle
selling price data is $3,000. It is found by subtracting the
lower limit of the first class, $15,000, from the lower limit of
the next class:
($18,000 - $15,000)
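The class limits and midpoints for the AutoUSA example can be generated from the starting limit, the class interval, and the number of classes. This is a minimal sketch; the variable names are illustrative, and the values ($15,000, $3,000, 7 classes) are the ones used in the example.

```python
lower_start = 15_000   # lower limit of the first class
width = 3_000          # class interval
k = 7                  # number of classes from the 2 to the k rule

classes = []
for j in range(k):
    lower = lower_start + j * width
    upper = lower + width            # also the lower limit of the next class
    midpoint = (lower + upper) / 2   # average of the two limits
    classes.append((lower, upper, midpoint))

for lower, upper, midpoint in classes:
    print(f"${lower:,} up to ${upper:,}   midpoint ${midpoint:,.0f}")
```

The last class runs from $33,000 up to $36,000, which covers the highest observed price of $35,925.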
Relative Frequency Distribution
To convert a frequency distribution to a relative frequency
distribution, each of the class frequencies is divided by the
total number of observations.
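A minimal Python sketch of the conversion follows. The class counts are illustrative placeholders that sum to 80, because the frequency table itself is not reproduced in this transcript; substitute the tallied counts from Step 5.

```python
# Illustrative class counts (NOT taken from the slides); they sum to 80.
lower_limits = [15_000, 18_000, 21_000, 24_000, 27_000, 30_000, 33_000]
counts = [8, 23, 17, 18, 8, 4, 2]

total = sum(counts)
relative = [c / total for c in counts]   # each class frequency / total

for low, c, r in zip(lower_limits, counts, relative):
    print(f"${low:,} up to ${low + 3_000:,}: {c:>2}  {r:.4f}")

print("sum of relative frequencies:", round(sum(relative), 4))
```

The relative frequencies always sum to 1, which is a quick check that no observation was missed or counted twice.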
Graphic Presentation of a
Frequency Distribution
The three commonly used graphic forms are:
• Histograms
• Frequency polygons
• Cumulative frequency distributions
Histogram (For Quantitative data)
Histogram
• Displays a frequency distribution based on quantitative data.
• A graph in which the classes are marked on the horizontal axis
and the class frequencies on the vertical axis.
• The class frequencies are represented by the heights of the
bars, and the bars are drawn adjacent to each other.
SPSS guide

Use “employee data.sav”
Graphs → Histogram
– Variable: Current salary
– Panel by row: Gender

Describe the output
Frequency Polygon

A frequency polygon
also shows the shape
of a distribution and is
similar to a histogram.

It consists of line
segments connecting
the points formed by
the intersections of the
class midpoints and the
class frequencies.
Cumulative Frequency Distribution
Class Practice

Show how to apply the descriptive statistics that we
studied today
– Frequency table (Analyze → Descriptive Statistics → Frequencies)
– Bar chart or pie chart (Graphs → Bar/Pie)
– Histograms (Graphs → Histogram)

Interpret the results in one paragraph for each table
or chart
The End
Describing Data:
Numerical Measures
GOALS
1. Calculate the arithmetic mean, weighted mean,
median, mode, and geometric mean.
2. Explain the characteristics, uses, advantages, and
disadvantages of each measure of location.
3. Identify the position of the mean, median, and mode
for both symmetric and skewed distributions.
4. Compute and interpret the range, mean deviation,
variance, and standard deviation.
5. Understand the characteristics, uses, advantages, and
disadvantages of each measure of dispersion.
Numerical Descriptive Measures
Measures of Location
• Arithmetic Mean
• Weighted Mean
• Median
• Mode
• Geometric Mean
Measures of Dispersion
• Range
• Mean Deviation
• Variance
• Standard Deviation
Population Mean
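For ungrouped data, the population mean is the
sum of all the population values divided by the
total number of population values:
\[ \mu = \frac{\sum X}{N} \]
where \(\mu\) is the population mean, \(\sum X\) is the sum of the population values, and N is the number of values in the population.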
EXAMPLE – Population Mean
Sample Mean

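For ungrouped data, the sample mean
is the sum of all the sample values
divided by the number of sample
values:
\[ \bar{X} = \frac{\sum X}{n} \]
where \(\bar{X}\) is the sample mean and n is the number of values in the sample.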
EXAMPLE – Sample Mean
Properties of the Arithmetic Mean
1. Every set of interval-level and ratio-level data has a mean.
2. All the values are included in computing the mean.
3. The mean is unique.
4. The sum of the deviations of each value from the mean is zero.
Weighted Mean

The weighted mean of a set of numbers X1,
X2, ..., Xn, with corresponding weights w1,
w2, ..., wn, is computed from the following
formula:
\[ \bar{X}_w = \frac{w_1 X_1 + w_2 X_2 + \cdots + w_n X_n}{w_1 + w_2 + \cdots + w_n} \]
EXAMPLE – Weighted Mean
The Carter Construction Company pays its hourly
employees $16.50, $19.00, or $25.00 per hour.
There are 26 hourly employees, 14 of whom are paid
at the $16.50 rate, 10 at the $19.00 rate, and 2 at the
$25.00 rate. What is the mean hourly rate paid to the
26 employees?
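The Carter Construction question can be worked in a few lines of Python. This is a sketch with illustrative variable names; the weights are the numbers of employees at each rate.

```python
rates = [16.50, 19.00, 25.00]   # hourly rates
weights = [14, 10, 2]           # employees paid at each rate

# Weighted mean: sum of (weight * value) divided by the sum of the weights.
weighted_mean = sum(w * x for w, x in zip(weights, rates)) / sum(weights)

print(sum(weights))              # 26
print(round(weighted_mean, 3))   # 18.115
```

The result, about $18.12 per hour, is lower than the simple average of the three rates ($20.17) because most employees are paid at the lowest rate.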
The Median
The median is the midpoint of the values
after they have been ordered from the
smallest to the largest.

• There are as many values above the median as below
it in the data array.
• For an even set of values, the median will be the
arithmetic average of the two middle numbers.
Properties of the Median
1. There is a unique median for each data set.
2. It is not affected by extremely large or small
values and is therefore a valuable measure of
central tendency when such values occur.
3. It can be computed for ratio-level, interval-level, and ordinal-level data.
4. It can be computed for an open-ended
frequency distribution if the median does not
lie in an open-ended class.
EXAMPLES - Median
The ages for a sample of five college students are:
21, 25, 19, 20, 22
Arranging the data in ascending order gives:
19, 20, 21, 22, 25.
Thus the median is 21.

The heights of four basketball players, in inches, are:
76, 73, 80, 75
Arranging the data in ascending order gives:
73, 75, 76, 80.
Thus the median is 75.5.
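Both median examples can be verified with Python's standard statistics module, which sorts the data and averages the two middle values when the count is even.

```python
from statistics import median

ages = [21, 25, 19, 20, 22]      # odd number of values
heights = [76, 73, 80, 75]       # even number of values

print(sorted(ages), median(ages))         # [19, 20, 21, 22, 25] 21
print(sorted(heights), median(heights))   # [73, 75, 76, 80] 75.5
```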
The Mode

The mode is the value of the observation
that appears most frequently.
Example - Mode
The Relative Positions of the Mean,
Median and the Mode
Skewness



Measures of central location for a set of
observations (the mean, median, and mode) and
measures of data dispersion (e.g., the range and the
standard deviation) were introduced.
Another characteristic of a set of data is the shape.
There are four shapes commonly observed:
1. symmetric,
2. positively skewed,
3. negatively skewed,
4. bimodal.
Skewness - Formulas for Computing
The coefficient of skewness can range from -3 up to 3.
– A value near -3, such as -2.57, indicates considerable
negative skewness.
– A value such as 1.63 indicates moderate positive skewness.
– A value of 0, which will occur when the mean and median
are equal, indicates the distribution is symmetrical and that
there is no skewness present.
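The formula slide itself is not reproduced in this transcript. The properties listed above (a range of -3 to 3, and a value of 0 when the mean equals the median) are consistent with Pearson's coefficient of skewness, sk = 3(mean − median)/s, so the sketch below assumes that formula. The data are the part-time hourly wages ($12, $20, $16, $18, $19) from the sample variance example in these slides.

```python
from statistics import mean, median, stdev

def pearson_skewness(values):
    """Pearson's coefficient of skewness: 3 * (mean - median) / s.

    Assumed formula; s is the sample standard deviation (n - 1 denominator).
    """
    s = stdev(values)
    return 3 * (mean(values) - median(values)) / s

wages = [12, 20, 16, 18, 19]
sk = pearson_skewness(wages)

print(mean(wages), median(wages))   # 17 18
print(round(sk, 3))                 # -0.949
```

The mean (17) is below the median (18), so the coefficient is negative, indicating mild negative skewness for this small sample.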
Commonly Observed Shapes
Skewness – SPSS example
• From “employee data.sav”
• Calculate skewness, mean, median, maximum, and minimum
• Analyze → Descriptive Statistics → Frequencies → Statistics
• Select “beginning salary” and “current salary”, then
discuss: which one is more skewed?
The Geometric Mean

• Useful in finding the average change of percentages,
ratios, indexes, or growth rates over time.
• Has a wide application in business and economics
because we are often interested in finding the percentage
changes in sales, salaries, or economic figures, such as
the GDP, which compound or build on each other.
• Will always be less than or equal to the arithmetic mean.
• Defined as the nth root of the product of n values.
• The formula for the geometric mean is written:
\[ GM = \sqrt[n]{(X_1)(X_2)\cdots(X_n)} \]
EXAMPLE – Geometric Mean
The return on investment earned by Atkins
Construction Company for four successive years
was: 30 percent, 20 percent, -40 percent, and
200 percent. What is the geometric mean rate of
return on investment?
\[ GM = \sqrt[4]{(1.3)(1.2)(0.6)(3.0)} = \sqrt[4]{2.808} = 1.294 \]
The geometric mean rate of return is 29.4 percent, found by 1.294 - 1.
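The same calculation in Python, as a minimal sketch: each percentage return is first converted to a growth factor (1 + rate), because the geometric mean is taken over the factors, not the raw percentages.

```python
import math

returns = [0.30, 0.20, -0.40, 2.00]      # yearly returns as decimals
growth = [1 + r for r in returns]        # 1.3, 1.2, 0.6, 3.0

# nth root of the product of the n growth factors
gm = math.prod(growth) ** (1 / len(growth))
arithmetic = sum(growth) / len(growth)   # 1.525, for comparison

print(round(math.prod(growth), 3))   # 2.808
print(round(gm, 3))                  # 1.294
print(round(gm - 1, 3))              # 0.294, i.e. 29.4 percent per year
```

The arithmetic mean of the growth factors is 1.525 (a 52.5 percent average), which overstates the compound growth; the geometric mean is never larger than the arithmetic mean.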
Dispersion
Why Study Dispersion?
• A measure of location, such as the mean or the median,
only describes the center of the data, but it does not tell
us anything about the spread of the data.
• For example, if your nature guide told you that the river
ahead averaged 3 feet in depth, would you want to
wade across on foot without additional information?
Probably not. You would want to know something about
the variation in the depth.
• A second reason for studying the dispersion in a set of
data is to compare the spread in two or more
distributions.
Samples of Dispersions
Measures of Dispersion
• Range
• Mean Deviation
• Variance and Standard Deviation
EXAMPLE – Range
The number of cappuccinos sold at the Starbucks location in the
Orange County Airport between 4 and 7 p.m. for a sample of 5
days last year were 20, 40, 50, 60, and 80. Determine the range
for the number of cappuccinos sold.
Range = Largest value – Smallest value
= 80 – 20 = 60
EXAMPLE – Mean Deviation
The number of cappuccinos sold at the Starbucks
location in the Orange County Airport between 4
and 7 p.m. for a sample of 5 days last year were
20, 40, 50, 60, and 80. Determine the mean
deviation for the number of cappuccinos sold.
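The worked solution is not reproduced in this transcript. The sketch below assumes the usual definition of the mean deviation, the arithmetic mean of the absolute deviations from the mean, and also recomputes the range for the same data.

```python
sold = [20, 40, 50, 60, 80]    # cappuccinos sold on 5 sample days

data_range = max(sold) - min(sold)
mean_sold = sum(sold) / len(sold)

# Absolute deviations from the mean: 30, 10, 0, 10, 30
abs_deviations = [abs(x - mean_sold) for x in sold]
mean_deviation = sum(abs_deviations) / len(sold)

print(data_range)       # 60
print(mean_sold)        # 50.0
print(mean_deviation)   # 16.0
```

On average, the number of cappuccinos sold deviates from the mean of 50 by 16 per day.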
EXAMPLE – Variance and Standard
Deviation
The numbers of traffic citations issued in each of the last five months in
Beaufort County, South Carolina, were 38, 26, 13, 41, and 22. What
is the population variance?
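The worked solution is not reproduced in this transcript. The sketch below assumes the standard population variance, the mean of the squared deviations from the population mean (dividing by N), and checks the hand calculation against Python's statistics module.

```python
from statistics import pvariance, pstdev

citations = [38, 26, 13, 41, 22]

mu = sum(citations) / len(citations)                  # population mean
squared_devs = [(x - mu) ** 2 for x in citations]     # 100, 4, 225, 169, 36
pop_variance = sum(squared_devs) / len(citations)     # divide by N

print(mu)                            # 28.0
print(sum(squared_devs))             # 534.0
print(pop_variance)                  # 106.8
print(round(pstdev(citations), 2))   # 10.33
```

The population standard deviation is the square root of the variance, about 10.33 citations.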
EXAMPLE – Sample Variance
The hourly wages for a sample of part-time employees at
Home Depot are: $12, $20, $16, $18, and $19. What is
the sample variance?
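The worked solution is not reproduced in this transcript. The sketch below assumes the standard sample variance, which divides the sum of squared deviations by n - 1 rather than n.

```python
from statistics import variance, stdev

wages = [12, 20, 16, 18, 19]    # hourly wages in dollars

x_bar = sum(wages) / len(wages)                       # sample mean
squared_devs = [(x - x_bar) ** 2 for x in wages]      # 25, 9, 1, 1, 4
sample_variance = sum(squared_devs) / (len(wages) - 1)

print(x_bar)                     # 17.0
print(sample_variance)           # 10.0
print(round(stdev(wages), 2))    # 3.16
```

Dividing by n would give 8 instead of 10; the n - 1 denominator is what distinguishes the sample formula from the population formula.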
Describing Data:
Displaying and Exploring Data
GOALS
1. Develop and interpret a dot plot.
2. Construct and interpret box plots.
3. Compute and understand the coefficient of skewness.
4. Draw and interpret a scatter diagram.
5. Construct and interpret a contingency table.
Dot Plots

• A dot plot groups the data as little as possible, and
the identity of an individual observation is not lost.
• To develop a dot plot, each observation is simply
displayed as a dot along a horizontal number line
indicating the possible values of the data.
• If there are identical observations or the
observations are too close to be shown individually,
the dots are “piled” on top of each other.
• Dot plots are most useful for smaller data sets,
whereas histograms tend to be most useful for large
data sets.
Dot Plot – SPSS Example
• Use employee data.sav
• Graphs → Scatter/Dot…
• Simple Dot
• Define
• From employee data.sav, select “current salary”
for the X-Axis variable
Boxplot

• In descriptive statistics, a box plot or boxplot is a
convenient way of graphically depicting groups of
numerical data through their five-number summaries:
the smallest observation (sample minimum), lower
quartile (Q1), median (Q2), upper quartile (Q3), and
largest observation (sample maximum).
• A boxplot may also indicate which observations, if any,
might be considered outliers.
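A five-number summary can be sketched in Python as follows, using the cappuccino data from the dispersion examples. Two assumptions are made here that the slides do not state: quartiles are located with the (n + 1) position method (method="exclusive" in the statistics module; other software may use a different convention), and outliers are flagged with the common 1.5 × IQR fences.

```python
from statistics import quantiles

data = sorted([20, 40, 50, 60, 80])

# Quartiles by the (n + 1) position method (assumed convention).
q1, q2, q3 = quantiles(data, n=4, method="exclusive")
summary = (min(data), q1, q2, q3, max(data))

# Assumed outlier rule: beyond 1.5 * IQR from the quartiles.
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print(summary)    # (20, 30.0, 50.0, 70.0, 80)
print(outliers)   # []
```

With these data the box spans 30 to 70, the median line sits at 50, and no observation falls outside the fences.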
Boxplot - Example
Boxplot Example
Describing Relationship between Two
Variables

• One graphical technique we use
to show the relationship between
variables is called a scatter
diagram.
• To draw a scatter diagram we
need two variables.
• We scale one variable along the
horizontal axis (X-axis) of a
graph and the other variable
along the vertical axis (Y-axis).
Describing Relationship between Two
Variables – Scatter Diagram Examples
Scatter Diagram - SPSS
• From employee data.sav
• Graphs → Scatter/Dot…
• Simple Scatter
• Define
• Select “current salary” for the Y-Axis variable
• Select “months since hired” for the X-Axis variable
Contingency Tables

• A scatter diagram requires that both of the variables
be at least interval scale.
• What if we wish to study the relationship between
two variables when one or both are nominal or
ordinal scale? In this case we tally the results in a
contingency table.
Contingency Tables – An Example
A manufacturer of preassembled windows produced 50 windows
yesterday. This morning the quality assurance inspector reviewed
each window for all quality aspects. Each was classified as
acceptable or unacceptable and by the shift on which it was
produced. The two variables are shift and quality. The results are
reported in the following table.
Contingency Tables – An Example
Usefulness of the Contingency Table:
By organizing the information into a contingency table we can
compare the quality on the three shifts.
For example, on the day shift, 3 out of 20 windows or 15 percent are
defective. On the afternoon shift, 2 of 15 or 13 percent are
defective and on the night shift 1 out of 15 or 7 percent are
defective.
Overall 12 percent of the windows are defective. Observe also that
40 percent of the windows are produced on the day shift, found by
(20/50)(100).
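The percentages quoted above can be recomputed from the counts given in the text (3 of 20, 2 of 15, and 1 of 15 defective). The dictionary layout below is only an illustrative way to hold the contingency table.

```python
produced = {"day": 20, "afternoon": 15, "night": 15}
defective = {"day": 3, "afternoon": 2, "night": 1}

total = sum(produced.values())    # 50 windows

# Percent defective within each shift (row percentages).
pct_defective = {
    shift: 100 * defective[shift] / produced[shift] for shift in produced
}
overall = 100 * sum(defective.values()) / total
day_share = 100 * produced["day"] / total

for shift, pct in pct_defective.items():
    print(f"{shift}: {pct:.0f} percent defective")
print(f"overall: {overall:.0f} percent defective")
print(f"day shift share of production: {day_share:.0f} percent")
```

Comparing within-shift percentages rather than raw counts is what makes the shifts comparable, since the day shift produced more windows than the other two.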
Contingency table- SPSS
• From employee data.sav
• Analyze → Descriptive Statistics → Crosstabs…
• Row: Gender
• Column: Employment Category
• In “Cells”, Percentages: Row
The End