Statistics in Applied Science and Technology
Chapter 4 Summarizing Data
July, 2000
Guang Jin
Key Concepts in This Chapter
Mean
Median
Mode
Range
Standard Deviation
Variance
Coefficient of Variation
Measures of Central Tendency
Central tendency is the tendency of a set of data to cluster around certain values.
The three most common measures of central tendency are the mean, the median, and the mode.
The Mean
The arithmetic mean (or simply, mean) is
computed by summing all the observations in
the sample and dividing the sum by the
number of observations.
Symbolically, the mean is
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$
where $x_1$ is the first and $x_i$ is the $i$th in a series of observations, and $n$ is the total number of observations.
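As a minimal sketch of this definition, the Python function below sums the observations and divides by $n$. The helper name `arithmetic_mean` and the sample values are hypothetical, chosen only for illustration.

```python
def arithmetic_mean(xs):
    """Sum all observations and divide the sum by n, the number of observations."""
    if len(xs) == 0:
        raise ValueError("the mean needs at least one observation")
    return sum(xs) / len(xs)


# Hypothetical sample of n = 5 observations
sample = [4.0, 8.0, 6.0, 5.0, 7.0]
print(arithmetic_mean(sample))  # 6.0
```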
The Mean (Continued)
The arithmetic mean may be considered the
balance point, or fulcrum, in a distribution.
The arithmetic mean is the point that
balances the positive and negative
deviations from the fulcrum.
The mean is affected by the value of every observation in the distribution and may be distorted when extreme values exist.
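Both properties can be checked numerically. The sketch below uses Python's standard `statistics` module on a hypothetical sample: the deviations from the mean sum to zero (the balance-point property), and adding one extreme value shifts the mean far more than the median. With less convenient floating-point data the sum of deviations may come out as a tiny nonzero number rather than exactly zero.

```python
from statistics import mean, median

# Hypothetical sample; its mean is 6.0
sample = [4.0, 8.0, 6.0, 5.0, 7.0]
xbar = mean(sample)

# Balance point: positive and negative deviations from the mean cancel out
deviations = [x - xbar for x in sample]
print(sum(deviations))  # 0.0

# One extreme value pulls the mean much further than the median
with_outlier = sample + [100.0]
print(mean(sample), median(sample))  # 6.0 6.0
print(round(mean(with_outlier), 2), median(with_outlier))  # 21.67 6.5
```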
The Median
The median is defined as the middle value when the observations are ordered.
The median is the value with the same number of observations above it as below it.
For an even number of observations, the
median is the average of the two
middlemost values.
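The ordering rule, including the even-count case, can be sketched in a few lines of Python. The function name `median_of` is a hypothetical helper; the standard library's `statistics.median` follows the same rule.

```python
def median_of(xs):
    """Middle value of the ordered observations; for even n, average the two middlemost values."""
    if len(xs) == 0:
        raise ValueError("the median needs at least one observation")
    ordered = sorted(xs)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]  # odd n: a single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even n: mean of the two middle values


print(median_of([7, 3, 9, 1, 5]))      # 5
print(median_of([7, 3, 9, 1, 5, 11]))  # 6.0
```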
The Mode
The mode is the observation that occurs
most frequently.
The mode can be read from a graph as the value on the horizontal axis that corresponds to the peak of the distribution.
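A frequency count makes the definition concrete. The sketch below, with the hypothetical helper name `modes`, returns every value tied for the highest count, since a distribution can have more than one peak.

```python
from collections import Counter


def modes(xs):
    """Return every value that occurs most frequently (there may be more than one)."""
    if len(xs) == 0:
        raise ValueError("the mode needs at least one observation")
    counts = Counter(xs)
    top = max(counts.values())
    return [value for value, c in counts.items() if c == top]


print(modes([2, 3, 3, 5, 7, 3, 2]))  # [3]
print(modes([1, 1, 2, 2, 4]))        # [1, 2]
```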
Which Average Should You Use
for Quantitative Data?
When a distribution of observations is normal or only mildly skewed, the mode, the median, and the mean are equal or similar, and any of them can be used to describe central tendency.
When a distribution is skewed, the mean and the median can differ appreciably, so both the mean and the median should be reported.
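For illustration, the sketch below computes all three averages for a hypothetical right-skewed sample using Python's `statistics` module. The long right tail drags the mean well above the median and the mode, which is why reporting the mean alone would be misleading here.

```python
from statistics import mean, median, mode

# Hypothetical right-skewed sample: most values are small, with a long right tail
skewed = [1, 1, 2, 2, 2, 3, 3, 4, 9, 23]

print(mode(skewed), median(skewed), float(mean(skewed)))  # 2 2.5 5.0
```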
Measures of Central Tendency for
Qualitative Data
The mode can always be used with qualitative data.
The median can be used whenever the qualitative data are ordinal.
The mean is not appropriate for qualitative data.
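A minimal Python sketch of these rules follows, with hypothetical data. For nominal categories only the mode is computed. For ordinal categories an assumed ordering (`LEVELS`) maps each category to a rank, and `statistics.median_low` is used so that an even count returns an actual category instead of averaging two ranks. No mean is taken, since category ranks are not measured quantities.

```python
from statistics import median_low, mode

# Nominal data (no natural order): only the mode is meaningful
blood_types = ["A", "O", "B", "O", "AB", "O", "A"]
print(mode(blood_types))  # O

# Ordinal data: an assumed ordering, lowest to highest
LEVELS = ["poor", "fair", "good", "excellent"]
ratings = ["good", "fair", "excellent", "good", "poor"]
ranks = [LEVELS.index(r) for r in ratings]  # category -> rank
print(LEVELS[median_low(ranks)])  # good
```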
Measures of Variation
A measure of variation (or variability) tells us whether observations tend to be quite similar (homogeneous) or vary considerably (heterogeneous).
The three most common measures of variation are the range, the standard deviation, and the variance.
Range
The range is defined as the difference in
value between the highest (maximum) and
lowest (minimum) observation:
$$\text{Range} = X_{\max} - X_{\min}$$
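The range needs only the largest and smallest observations. A one-line Python sketch, with the hypothetical helper name `data_range`:

```python
def data_range(xs):
    """Range = largest observation minus smallest observation."""
    if len(xs) == 0:
        raise ValueError("the range needs at least one observation")
    return max(xs) - min(xs)


print(data_range([4.0, 8.0, 6.0, 5.0, 7.0]))  # 4.0
```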
Standard Deviation and Variance
By far the most widely used measure of variation is the standard deviation, represented by the symbol $s$.
The standard deviation is the square root of the variance (represented by the symbol $s^2$) of the observations.
The larger the standard deviation and variance, the
more heterogeneous the distribution.
Variance
The variance ($s^2$) is computed by squaring each deviation from the mean, adding the squares, and dividing their sum by one less than $n$, the sample size:
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$
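The three steps (square each deviation, add them, divide by $n - 1$) translate directly into Python. The sketch below uses a hypothetical helper `sample_variance` and cross-checks it against `statistics.variance`, which also divides by $n - 1$.

```python
from statistics import variance


def sample_variance(xs):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(xs)
    if n < 2:
        raise ValueError("the sample variance needs at least two observations")
    xbar = sum(xs) / n
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)


sample = [4.0, 8.0, 6.0, 5.0, 7.0]
print(sample_variance(sample))  # 2.5
print(variance(sample))         # 2.5 (standard library, same n - 1 divisor)
```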
Standard Deviation
The standard deviation ($s$, sometimes represented by SD) is computed by taking the square root of the variance:
$$s = \sqrt{s^2}$$
The standard deviation is expressed in the same units as the raw data.
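The sketch below, on hypothetical heights in centimetres, shows why the square root matters for units: the variance comes out in cm², while the standard deviation returns to cm. It cross-checks the hand computation with `statistics.stdev`.

```python
import math
from statistics import stdev

# Hypothetical heights in centimetres, so s is also in centimetres
heights_cm = [160.0, 164.0, 162.0, 161.0, 163.0]

n = len(heights_cm)
xbar = sum(heights_cm) / n
s_squared = sum((x - xbar) ** 2 for x in heights_cm) / (n - 1)  # variance, in cm^2
s = math.sqrt(s_squared)                                        # standard deviation, in cm

print(f"s^2 = {s_squared} cm^2, s = {s:.3f} cm")  # s^2 = 2.5 cm^2, s = 1.581 cm
print(math.isclose(s, stdev(heights_cm)))         # True
```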
Important Generalizations
For most frequency distributions, a majority of all observations (often about 68%, as in a normal distribution) lie within one standard deviation on either side of the mean.
For most frequency distributions, a small minority of all observations (often about 5%) deviate by more than two standard deviations on either side of the mean.
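These percentages can be checked empirically. The sketch below assumes approximately normal data, simulated with Python's `random` module (the seed and parameters are arbitrary), and counts the share of observations within $k$ standard deviations of the mean using a hypothetical helper `fraction_within`. For strongly skewed or heavy-tailed data the shares can differ noticeably.

```python
import random
from statistics import mean, stdev


def fraction_within(xs, k):
    """Share of observations lying within k standard deviations of the mean."""
    xbar, s = mean(xs), stdev(xs)
    return sum(1 for x in xs if abs(x - xbar) <= k * s) / len(xs)


# Simulated, approximately normal data (an assumption made for illustration)
rng = random.Random(2000)
data = [rng.gauss(50.0, 10.0) for _ in range(10_000)]

print(f"within 1 SD: {fraction_within(data, 1):.3f}")      # close to 0.68
print(f"beyond 2 SD: {1 - fraction_within(data, 2):.3f}")  # close to 0.05
```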
Variability for Qualitative Data
For qualitative data that cannot be ordered, measures of variability are nonexistent.
For qualitative data that can be ordered, it is appropriate to describe variability by identifying the extreme observations.
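For ordered categories, the extremes can be found by sorting on each category's position in an assumed ordering. In this Python sketch the list `LEVELS` and the ratings are hypothetical.

```python
# Hypothetical ordered categories, from lowest to highest
LEVELS = ["poor", "fair", "good", "excellent"]
ratings = ["good", "fair", "excellent", "good", "fair"]

# Use each category's position in LEVELS as its sort key
lowest = min(ratings, key=LEVELS.index)
highest = max(ratings, key=LEVELS.index)
print(lowest, highest)  # fair excellent
```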
Coefficient of Variation
The coefficient of variation (represented by CV) is defined as the ratio of the standard deviation to the absolute value of the mean, expressed as a percentage:
$$CV = \frac{s}{|\bar{x}|} \times 100\%$$
CV depicts the size of the standard deviation relative to the mean and can be used to compare the relative variation of even unrelated quantities.
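As an illustration of comparing unrelated quantities, the sketch below computes CV for hypothetical heights (cm) and weights (kg) of the same five people. Because CV is unitless, the two percentages can be compared directly even though the raw standard deviations cannot. The helper name `coefficient_of_variation` is hypothetical.

```python
from statistics import mean, stdev


def coefficient_of_variation(xs):
    """Sample standard deviation divided by |mean|, as a percentage."""
    xbar = mean(xs)
    if xbar == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return stdev(xs) / abs(xbar) * 100


heights_cm = [160.0, 164.0, 162.0, 161.0, 163.0]
weights_kg = [55.0, 70.0, 62.0, 58.0, 65.0]

print(f"height CV: {coefficient_of_variation(heights_cm):.2f}%")  # height CV: 0.98%
print(f"weight CV: {coefficient_of_variation(weights_kg):.2f}%")  # weight CV: 9.47%
```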
Equations for Population and Sample
Means and Standard Deviation
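| Quantity | Population | Sample |
|---|---|---|
| Mean | $\mu = \frac{\sum_{i=1}^{N} x_i}{N}$ | $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$ |
| Variance | $\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$ | $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$ |
| Standard deviation | $\sigma = \sqrt{\sigma^2}$ | $s = \sqrt{s^2}$ |

$N$ is the population size and $n$ is the sample size.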
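Python's standard `statistics` module provides both sets of formulas: `pvariance` and `pstdev` divide by $N$ (population), while `variance` and `stdev` divide by $n - 1$ (sample). The sketch below applies both to the same hypothetical data so the difference in divisor is visible.

```python
from statistics import pstdev, pvariance, stdev, variance

data = [4.0, 8.0, 6.0, 5.0, 7.0]

# Population formulas divide by N; sample formulas divide by n - 1
print(pvariance(data), variance(data))          # 2.0 2.5
print(f"{pstdev(data):.3f} {stdev(data):.3f}")  # 1.414 1.581
```

Treating the data as a whole population gives a slightly smaller spread, because the divisor $N$ is larger than $n - 1$.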