Central tendency, dispersion
Tools for Science
Observation
Measurement
Hypothesis generation
Hypothesis testing
Must know “what” something is before you can
ask questions about “how” or “why” something
happens
Descriptive Statistics
Measures of Central Tendency
Describing the middle of a distribution
Arithmetic mean
Sample mean ("X bar"):
$\bar{X} = \frac{\sum X_i}{n}$
where $\sum X_i$ = sum of the values of all observations and $n$ = total number of observations
The sample mean provides an estimate of the population mean, $\mu$.
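A minimal sketch of the sample-mean calculation, in Python. The sample values are hypothetical and chosen only for illustration; they do not come from the slides.

```python
# Hypothetical sample values, chosen only for illustration.
sample = [4.0, 7.0, 9.0, 10.0, 15.0]

n = len(sample)          # total number of observations
total = sum(sample)      # sum of the values of all observations
x_bar = total / n        # sample mean ("X bar")

print(x_bar)
```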
Excel demo
Median = the middle measurement in a ranked list of
values; half the values are below the median and half
are above (same as the mean for a symmetric distribution)
Mode = the most frequent value (most fashionable)
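As a rough illustration, the median and mode can be found with Python's standard `statistics` module. The data set below is hypothetical; it is deliberately right-skewed so that the three measures differ.

```python
import statistics

# Hypothetical right-skewed sample (one large value pulls the mean upward).
values = [2, 3, 3, 5, 7, 8, 14]

median = statistics.median(values)  # middle value of the ranked list
mode = statistics.mode(values)      # most frequent value
mean = statistics.mean(values)      # arithmetic mean, for comparison

print(mode, median, mean)
```

In this sample the mode is smaller than the median, which is smaller than the mean, the ordering expected for a positively skewed distribution.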
How measures of central tendency relate to each other
[Figure: frequency of occurrence plotted against IQ of college professors for four distribution shapes: symmetric unimodal (mean, median, and mode coincide), bimodal (two modes), positively skewed, and negatively skewed, with the mode, median, and mean marked on each.]
Geometric mean
Geometric mean used when quantities multiply together to produce a product
Ex. An investment earns 10% yr 1, 50% yr 2, and 30% yr 3. What is its
average rate of return?
Not the arithmetic mean, because in year 1 your investment was multiplied by
(not added to) 1.10, and so on.
Ask "by what constant factor would your investment need to be multiplied
each year in order to achieve the same effect as multiplying by 1.10 one year,
1.50 the next, and 1.30 the third?"
The answer is the geometric mean: $(1.10 \times 1.50 \times 1.30)^{1/3}$.
In this case geometric mean ≈ 1.290, with average rate of return ~29% (not
30%, which is what the arithmetic mean of 10%, 50%, and 30% would give
you)
See formula pg 28 Zar
Adapted from: http://www.math.toronto.edu/mathnet/
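The investment example can be checked with a short Python sketch. It uses the three yearly growth factors from the example; the variable names are illustrative only.

```python
import math

# Yearly growth factors from the example: +10%, +50%, +30%.
growth_factors = [1.10, 1.50, 1.30]
n = len(growth_factors)

# Geometric mean: n-th root of the product of the factors.
product = math.prod(growth_factors)
geometric_mean = product ** (1 / n)

# Arithmetic mean of the same factors, for comparison.
arithmetic_mean = sum(growth_factors) / n

print(round(geometric_mean, 3), round(arithmetic_mean, 3))
```

Multiplying the starting amount by the geometric mean three times gives the same final value as applying the three yearly factors in turn; the arithmetic mean overstates it.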
Geometric mean & log transformed variables
Geometric mean = back-transformed mean of a
logarithmically transformed variable
(more on transformation later)
- Log transform the variable and then take the mean of the transformed values ($\bar{x}_t$)
- Take the antilog of $\bar{x}_t$
- This mean will not equal the simple mean of the original variable
$GM = \mathrm{antilog}\left(\frac{1}{n}\sum \log Y_i\right)$
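The log-transform route to the geometric mean can be sketched in Python as follows. Base-10 logs are an assumption made here for illustration (any base works, provided the antilog uses the same base), and the data values are simply reused from the investment example.

```python
import math

# Values reused from the investment example (yearly growth factors).
y = [1.10, 1.50, 1.30]
n = len(y)

# Step 1: log-transform each value and take the mean of the logs.
mean_log = sum(math.log10(v) for v in y) / n

# Step 2: back-transform (antilog) the mean of the logs.
gm_from_logs = 10 ** mean_log

# Direct calculation for comparison: n-th root of the product.
gm_direct = math.prod(y) ** (1 / n)

# Simple (arithmetic) mean of the untransformed values.
simple_mean = sum(y) / n

print(gm_from_logs, gm_direct, simple_mean)
```

The two geometric-mean calculations agree up to floating-point rounding, and both are smaller than the simple mean of the original values.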
Harmonic mean = reciprocal of the simple mean of
reciprocals
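A small Python sketch of the harmonic mean, following the definition above. The two values are hypothetical (for instance, two speeds over equal distances) and are not from the slides.

```python
import statistics

# Hypothetical values, chosen only for illustration.
values = [40.0, 60.0]
n = len(values)

# Simple mean of the reciprocals, then the reciprocal of that mean.
mean_of_reciprocals = sum(1 / v for v in values) / n
harmonic = 1 / mean_of_reciprocals

# The standard library offers the same calculation.
harmonic_stdlib = statistics.harmonic_mean(values)

print(harmonic, harmonic_stdlib)
```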
Measures of Dispersion
Range = difference between minimum and maximum
value, crude measure of dispersion
[Figure: samples A and B plotted on a 0 to 40 scale; they have the same mean and range, but different patterns of dispersion (variability).]
Deviation from the mean = difference between each point
and the mean (sums to zero)
Sum of squares = the squared deviations of each data point
from the mean, added together
$SS = \sum (X_i - \bar{X})^2$
for the above example:
(A) 225+1+0+0+0+1+225= 452
(B) 225+100+25+0+25+100+225= 700
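The two sums can be reproduced with a short Python sketch. The raw values for A and B are an assumption: they are reconstructed from the squared deviations listed above by assuming a common mean of 20, and may not match the original figure exactly.

```python
def sum_of_squares(values):
    """Sum of squared deviations of each value from the sample mean."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values)

# Assumed data: reconstructed from the squared deviations, with mean 20.
sample_a = [5, 19, 20, 20, 20, 21, 35]
sample_b = [5, 10, 15, 20, 25, 30, 35]

# Deviations from the mean sum to zero for any sample.
mean_a = sum(sample_a) / len(sample_a)
deviation_total_a = sum(x - mean_a for x in sample_a)

ss_a = sum_of_squares(sample_a)
ss_b = sum_of_squares(sample_b)

print(deviation_total_a, ss_a, ss_b)
```

Both assumed samples share the same mean and range, yet B has the larger sum of squares because its points lie farther from the mean on average.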
Sample SS
Variance of a sample = mean sum of squares of the sample
$s^2 = \frac{\sum (X_i - \bar{X})^2}{n - 1}$
where $n - 1$ = degrees of freedom; dividing by $n - 1$ gives an unbiased estimate
-If all observations equal, then s2 = 0
-Can never be negative
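A minimal Python sketch of the sample variance, dividing the sum of squares by n - 1. The data for sample B are an assumption (values with mean 20 whose squared deviations sum to 700); the constant sample is hypothetical and shows the zero-variance case.

```python
import statistics

def sample_variance(values):
    """Sum of squared deviations divided by the degrees of freedom (n - 1)."""
    n = len(values)
    mean = sum(values) / n
    ss = sum((x - mean) ** 2 for x in values)
    return ss / (n - 1)

# Assumed data: mean 20, sum of squares 700.
sample_b = [5, 10, 15, 20, 25, 30, 35]
# Hypothetical sample in which all observations are equal.
constant = [7, 7, 7, 7]

var_b = sample_variance(sample_b)
var_b_stdlib = statistics.variance(sample_b)  # also uses n - 1
var_constant = sample_variance(constant)

print(var_b, var_b_stdlib, var_constant)
```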
Excel demo
Standard deviation = positive square root of the variance of a
sample
$s = \sqrt{s^2}$
-same units as the original data
- it does not vary with sample size
Excel demo
Coefficient of variation = standard deviation relative to the
mean
$CV = \frac{s}{\bar{X}}$
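The standard deviation and coefficient of variation can be sketched together in Python. The data are an assumption (seven values with mean 20), used only to make the arithmetic concrete.

```python
import math
import statistics

# Assumed data, for illustration only (mean 20).
sample = [5, 10, 15, 20, 25, 30, 35]

x_bar = statistics.mean(sample)
s2 = statistics.variance(sample)   # sample variance (n - 1 denominator)
s = math.sqrt(s2)                  # standard deviation, same units as the data
cv = s / x_bar                     # coefficient of variation (unitless)

print(round(s, 3), round(cv, 3))
```

Because the CV divides by the mean, it is only meaningful when the mean is non-zero, and it is usually reported for data measured on a scale with a true zero.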
Population parameters: Greek letters
Statistics (sample estimates): Latin letters
Unbiased: does not systematically over- or underestimate the true parameter value
Efficient: estimates fall close to the true parameter value (small variance)
Consistent: the estimate becomes better as sample size (n) increases
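One way to see what "unbiased" means is a small simulation, sketched here in Python. Everything in it is an assumption for illustration: a normal population with true variance 1, samples of size 5, 5000 repeated samples, and a fixed random seed. It compares the average of the variance estimates obtained by dividing the sum of squares by n - 1 with the average obtained by dividing by n.

```python
import random

random.seed(0)   # fixed seed so the sketch is repeatable

n = 5            # assumed sample size
trials = 5000    # assumed number of repeated samples
total_n_minus_1 = 0.0
total_n = 0.0

for _ in range(trials):
    # Draw a sample from a population with true variance 1.
    sample = [random.gauss(0.0, 1.0) for _ in range(n)]
    mean = sum(sample) / n
    ss = sum((x - mean) ** 2 for x in sample)
    total_n_minus_1 += ss / (n - 1)   # divides by degrees of freedom
    total_n += ss / n                 # divides by sample size

avg_unbiased = total_n_minus_1 / trials
avg_biased = total_n / trials

print(avg_unbiased, avg_biased)
```

Averaged over many samples, the n - 1 version lands near the true variance of 1, while the n version comes out too small by a factor of about (n - 1)/n.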
Coding data
Adding, subtracting, multiplying, or dividing ALL numbers
by a constant value to simplify computations
Addition & subtraction affect the mean, but not SS,
variance, or SD
Multiplication & division affect the mean, and also SS,
variance, and SD
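The effect of coding can be checked with a short Python sketch. The data and the constants (add 100, multiply by 10) are hypothetical. Adding a constant shifts the mean by that constant and leaves the variance and SD unchanged; multiplying by a constant c multiplies the mean and SD by c and the variance by c squared.

```python
import statistics

# Hypothetical data and coding constants, for illustration only.
data = [5, 10, 15, 20, 25, 30, 35]
shift = 100
scale = 10

shifted = [x + shift for x in data]   # coding by addition
scaled = [x * scale for x in data]    # coding by multiplication

mean_data = statistics.mean(data)
var_data = statistics.variance(data)
sd_data = statistics.stdev(data)

# Addition: the mean moves, the dispersion measures do not.
mean_shifted = statistics.mean(shifted)
var_shifted = statistics.variance(shifted)

# Multiplication: the mean and SD scale by c, the variance by c squared.
mean_scaled = statistics.mean(scaled)
var_scaled = statistics.variance(scaled)
sd_scaled = statistics.stdev(scaled)

print(mean_shifted, var_shifted, mean_scaled, var_scaled, sd_scaled)
```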