Frequency Distribution and Variation
Prepared by E.G. Gascon
Frequency Distributions
Frequency distribution (quantitative data): a table that shows classes or
intervals of data entries together with the frequency of each class.
The frequency f of a class is the number of data entries in the class.
Lower class limit = least number that can belong to the class
Upper class limit = greatest number that can belong to the class
Class width = distance between the lower (or upper) limits of
consecutive classes (not the upper limit minus the lower limit within a class)
Range = difference between the maximum and minimum data entries
Class boundaries = the numbers that separate classes without
forming gaps between them
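As a small illustration of the definitions above, the sketch below derives class boundaries and class width from a list of class limits. The limits are made up for the example, and the helper name class_boundaries is hypothetical; it assumes the classes are evenly spaced so the gap between consecutive classes is the same throughout.

```python
def class_boundaries(limits):
    """Return (lower_boundary, upper_boundary) pairs for a list of class limits."""
    # Gap between the upper limit of one class and the lower limit of the next.
    gap = limits[1][0] - limits[0][1]
    half = gap / 2
    # Splitting the gap evenly closes it, so neighboring classes share a boundary.
    return [(lower - half, upper + half) for lower, upper in limits]


limits = [(1, 5), (6, 10), (11, 15)]  # made-up class limits

boundaries = class_boundaries(limits)
# Class width: distance between lower limits of consecutive classes.
class_width = limits[1][0] - limits[0][0]

print(boundaries)   # [(0.5, 5.5), (5.5, 10.5), (10.5, 15.5)]
print(class_width)  # 5
```

Note that the class width is 5 even though the upper limit minus the lower limit within a class is only 4.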
Constructing a Frequency Distribution
Decide on the number of classes (the choice can be arbitrary)
Find the range = highest value – lowest value
Find the class width = range divided by the number of classes
(round up to the next whole number if the result is a decimal)
Decide the class limits
Tally the data entries in each class
Count the tally marks to find the frequency of each class
Total the frequencies
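The steps above can be sketched in code. This is a minimal illustration for whole-number data, not a definitive procedure: the data set is made up, the function name frequency_distribution is hypothetical, and two choices are assumptions not stated in the slides (the first lower limit is the minimum value, and the width is widened by one when the range divides evenly so the maximum still fits in the last class).

```python
import math


def frequency_distribution(data, num_classes):
    """Build (lower_limit, upper_limit, frequency) rows for whole-number data."""
    lowest, highest = min(data), max(data)
    data_range = highest - lowest                 # range = highest - lowest
    width = math.ceil(data_range / num_classes)   # round up if decimal
    # Assumption: if the classes would stop short of the maximum, widen by one.
    if lowest + width * num_classes - 1 < highest:
        width += 1
    rows = []
    for i in range(num_classes):
        lower = lowest + i * width                # lower class limit
        upper = lower + width - 1                 # upper class limit
        frequency = sum(1 for x in data if lower <= x <= upper)  # count the tally
        rows.append((lower, upper, frequency))
    return rows


data = [12, 15, 7, 22, 30, 18, 25, 9, 14, 28, 21, 16]  # made-up data
rows = frequency_distribution(data, 5)
for lower, upper, frequency in rows:
    print(f"{lower}-{upper}: {frequency}")
total = sum(frequency for _, _, frequency in rows)
print("Total frequency:", total)
```

The total frequency should equal the number of data entries, which is a quick check that no entry fell outside the classes.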
Creating a Histogram in Excel
There are several ways, depending upon the version
Household Income Example
Midpoint    Frequency (in thousands)
2500        814
7500        1389
12500       1268
20000       2203
30000       1722
42500       2243
62500       2030
87500       868
Enter the data (enter each midpoint as text by typing a ' in front, ex: '250)
Select the data and create a column chart
[Chart: column chart titled "Frequency in Thousands"; x-axis: Midpoint Income (2500 to 87500); y-axis: Frequency (0 to 2500)]
Creating a Histogram in Excel-p2
Make the bars touch by setting the gap width to 0: right-click on the
bars, select “Format Data Series”, and change “Gap Width” to 0
[Chart: column chart titled "Frequency in Thousands"; x-axis: Midpoint Income (2500 to 87500); y-axis: Frequency (0 to 2500)]
Measures of Central Tendency
Mean: the sum of the data entries divided by the number of entries.
Affected by outliers (values that lie far from the majority of
entries).
Median: the middle of the data when the data set is ordered.
If the data set has an odd number of entries, the median is the middle data
entry.
If the data set has an even number of entries, the median is the mean of the two
middle entries.
Mode: the data entry that occurs with the greatest frequency.
If no entry is repeated, the data set has no mode.
If two entries occur with the same greatest frequency, each entry is a mode
and the data set is called bimodal.
The mode is the only measure of central tendency that can be used to describe
non-numeric data; when working with quantitative data, it is rarely used.
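The three measures can be compared on a small data set. The sketch below uses made-up numbers and Python's standard statistics module; the helper modes is hypothetical and written to follow the rule stated here that a data set with no repeated entry has no mode.

```python
from collections import Counter
from statistics import mean, median


def modes(data):
    """Return every entry tied for the greatest frequency, or [] if nothing repeats."""
    counts = Counter(data)
    top = max(counts.values())
    if top == 1:
        return []  # no entry is repeated, so there is no mode
    return [value for value, count in counts.items() if count == top]


without_outlier = [2, 3, 3, 5, 7, 7]   # made-up data, even number of entries
with_outlier = without_outlier + [100]  # same data plus one outlier

print(mean(without_outlier), median(without_outlier))  # 4.5 4.0
print(median(with_outlier))                            # 5
print(round(mean(with_outlier), 2))                    # 18.14
print(modes(with_outlier))                             # [3, 7]
print(modes(["red", "blue", "red"]))                   # ['red']
```

Adding the single outlier moves the mean from 4.5 to about 18.14 while the median only moves from 4 to 5. The last line shows the mode applied to non-numeric data.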
Measures of Variation
Range: the difference between the maximum and
minimum data entries in the set.
Deviation: for an entry x in a population data set, the
difference between the entry and the mean of the data set.
Variance: the average of the squared deviations.
(This is not easily calculated for a large data set, so a sample is used instead.)
Sample variance: $s^2 = \dfrac{\sum (x - \bar{x})^2}{n - 1}$
Sample standard deviation: $s = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n - 1}}$
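A short worked example may make the sample formulas concrete. The sketch below uses a made-up data set and hypothetical function names; it sums the squared deviations from the sample mean and divides by n - 1, then cross-checks the result against statistics.variance from the Python standard library.

```python
import math
import statistics


def sample_variance(data):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(data)
    x_bar = sum(data) / n
    return sum((x - x_bar) ** 2 for x in data) / (n - 1)


def sample_standard_deviation(data):
    """Square root of the sample variance."""
    return math.sqrt(sample_variance(data))


data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up sample, mean is 5
x_bar = sum(data) / len(data)
deviations = [x - x_bar for x in data]

print(sum(deviations))  # 0.0, which is why the deviations are squared first
s2 = sample_variance(data)
s = sample_standard_deviation(data)
print(round(s2, 4), round(s, 4))
print(math.isclose(s2, statistics.variance(data)))  # True
```

Here the squared deviations sum to 32, so the sample variance is 32 / 7.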
Interpretation of the Standard Deviation
The size of the standard deviation tells us something
about how spread out the data are from the mean.
For data with a roughly bell-shaped distribution:
~68% of the data lie within 1 standard deviation of the mean
(1 times the size of the SD on either side of the mean)
~95% of the data lie within 2 standard deviations of the mean
(2 times the size of the SD on either side of the mean)
~99.7% of the data lie within 3 standard deviations of the mean
(3 times the size of the SD on either side of the mean)
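To see what "on either side of the mean" means in numbers, the sketch below computes the three intervals for a made-up mean of 100 and standard deviation of 15. The function name is hypothetical, and the percentages are the approximate ones listed above, which apply to roughly bell-shaped data.

```python
def empirical_rule_intervals(mean, sd):
    """Map k = 1, 2, 3 to the interval (mean - k*sd, mean + k*sd)."""
    return {k: (mean - k * sd, mean + k * sd) for k in (1, 2, 3)}


approx_share = {1: "~68%", 2: "~95%", 3: "~99.7%"}  # shares from the list above

intervals = empirical_rule_intervals(100, 15)  # made-up mean and SD
for k, (low, high) in intervals.items():
    print(f"{approx_share[k]} of the data lie between {low} and {high}")
```

With these numbers, about 95% of the data would be expected between 70 and 130.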
Standard score (z-score): the number of
standard deviations a given value x falls from the
mean.
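In formula form, the z-score is the value minus the mean, divided by the standard deviation. The sketch below applies that to made-up numbers (mean 100, standard deviation 15); the function name z_score is hypothetical.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / sd


print(z_score(130, 100, 15))  # 2.0, two SDs above the mean
print(z_score(85, 100, 15))   # -1.0, one SD below the mean
print(z_score(100, 100, 15))  # 0.0, exactly at the mean
```

A positive z-score means the value is above the mean and a negative one means it is below.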