Mean Median

Chapter 11
Univariate Data Analysis; Descriptive Statistics
These are summary measurements of a single variable.
I. Averages, or measures of central tendency – describe the center of a dataset.
A. Three kinds: mean, median, mode.
1. Mean: the most common average. Sum all the values in a group and divide by the total number of values in that group (hint: start by listing them in columns/headings).
$\bar{X} = \frac{\sum X}{n}$
where $\bar{X}$ is the mean,
$\sum X$ is the sum of all the values of $X$, and
$n$ is the number of values (or cases).
Weighted mean (for a frequency distribution): multiply each value by its frequency, sum these products, and divide by the total frequency: $\bar{X} = \frac{\sum fX}{\sum f}$.
2. Median: the mean is very sensitive to outlier scores that skew the distribution; the median is not. It is the midpoint value.
Instructions: order all values and find the middle-most score. That is the median (if there is an even number of cases, find the two middle-most values, add them, and divide by two).
Percentiles: the 50th percentile is the median. A score at the 75th percentile is at or above 75% of the other scores.
3. Mode: most frequent value.
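The three averages in A can be checked on a small data set. This is a minimal Python sketch, assuming the scores from section IV (5, 8, 5, 4, 6, 7, 8, 8, 3, 6); the helper names (`weighted_mean`, `median_by_hand`, `percentile_rank`), the frequency table, the incomes, and the regions are hypothetical illustrations, and "percent at or below" is only one of several percentile-rank conventions in use.

```python
from collections import Counter
from statistics import mean, multimode


def weighted_mean(values, freqs):
    """Mean of a frequency distribution: sum(f * X) / sum(f)."""
    return sum(f * x for x, f in zip(values, freqs)) / sum(freqs)


def median_by_hand(scores):
    """Order the scores; take the middle one, or average the middle two if n is even."""
    ordered = sorted(scores)
    mid = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def percentile_rank(scores, x):
    """Percent of scores at or below x (one convention among several)."""
    return 100 * sum(s <= x for s in scores) / len(scores)


scores = [5, 8, 5, 4, 6, 7, 8, 8, 3, 6]
print(mean(scores), median_by_hand(scores), multimode(scores))  # 6 6.0 [8]
print(percentile_rank(scores, 7))                                # 70.0
print(weighted_mean([1, 2, 3], [4, 3, 3]))                       # 1.9

# Hypothetical incomes (in $1,000s) with one outlier: the mean is pulled up, the median is not.
incomes = [30, 32, 35, 38, 40, 1000]
print(round(mean(incomes), 2), median_by_hand(incomes))          # 195.83 36.5

# Mode for nominal data (hypothetical regions): the most frequent category.
regions = ["South", "West", "South", "Northeast"]
print(Counter(regions).most_common(1))                           # [('South', 2)]
```

The incomes line shows why the median is preferred for interval data with outliers: one extreme value moves the mean far more than the midpoint.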
B. When to use what.
1. Three kinds of data
a. Nominal – categorical data (race, region).
b. Ordinal – values are ranked, but not necessarily equal in distance (e.g., a 7-point scale of GOP support).
c. Interval – values are equal in distance (income).
2. Use the mean for interval data (and sometimes ordinal). Use the mode for nominal data (and sometimes ordinal), especially when reporting percentages. Use the median for interval data if you think there are outliers.
II. Variability – how much scores differ from one another.
Which set of scores has greater variability?
Set 1: 8,9,5,2,1,3,1,9
Set 2: 3,4,3,5,4,6,2,3
The means are 4.75 (Set 1) and 3.75 (Set 2), but means tell us nothing about variability.
More precisely, variability is how different (how far) scores are from the mean.
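To see that the two sets differ in spread even though the means say nothing about it, here is a minimal Python sketch using the standard library's `statistics.stdev` (the sample standard deviation, dividing by n - 1, as in section IV).

```python
from statistics import mean, stdev

set1 = [8, 9, 5, 2, 1, 3, 1, 9]
set2 = [3, 4, 3, 5, 4, 6, 2, 3]

for label, data in (("Set 1", set1), ("Set 2", set2)):
    # The mean locates the center; stdev measures how far scores sit from it.
    print(f"{label}: mean = {mean(data)}, s = {stdev(data):.2f}")
```

Set 1's s (about 3.5) is well over twice Set 2's (about 1.3), so Set 1 has the greater variability.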
III. Computing the Range
Subtract the lowest score from the highest (r=h-l)
What is the range of these scores? 98,86,77,56,48
Answer: 50 (98-48=50)
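The range is a one-line computation. A minimal Python sketch (the function name `score_range` is a hypothetical choice), applied to the example above and to the two sets from section II:

```python
def score_range(scores):
    """Range: highest score minus lowest score (r = h - l)."""
    return max(scores) - min(scores)


print(score_range([98, 86, 77, 56, 48]))        # 50
print(score_range([8, 9, 5, 2, 1, 3, 1, 9]))    # 8
print(score_range([3, 4, 3, 5, 4, 6, 2, 3]))    # 4
```

Because it uses only the two extreme scores, the range is easily distorted by a single outlier.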
IV. Computing the Standard Deviation
The standard deviation (s) is the average amount of variability in a set of scores (roughly, the average distance of scores from the mean).
A. Formula:
$s = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}$
Compute s for the following:
5,8,5,4,6,7,8,8,3,6
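So, an s of 1.76 tells us that scores differ from the mean (6) by roughly 1.76 points on average. (Strictly, s is the square root of the summed squared deviations divided by n - 1, so it is close to, but not exactly, the average absolute distance.)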
B. Purpose: to compare scores between different distributions, even when their means and standard deviations differ (e.g., men and women); s is the unit used for that comparison (see z-scores).
The larger the s, the greater the variability.
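The exercise in A (5, 8, 5, 4, 6, 7, 8, 8, 3, 6) can be worked through step by step. This is a minimal Python sketch of the n - 1 formula; `sample_sd` is a hypothetical helper, cross-checked against the standard library's `statistics.stdev`. It also computes the mean absolute deviation, which is the literal "average distance from the mean", to show how it differs from s.

```python
import math
from statistics import stdev


def sample_sd(scores):
    """s = sqrt( sum((X - mean)^2) / (n - 1) ), following the formula step by step."""
    n = len(scores)
    xbar = sum(scores) / n                             # step 1: the mean
    squared_devs = [(x - xbar) ** 2 for x in scores]   # step 2: squared deviations
    return math.sqrt(sum(squared_devs) / (n - 1))      # step 3: divide by n - 1, take the root


scores = [5, 8, 5, 4, 6, 7, 8, 8, 3, 6]
s = sample_sd(scores)
print(round(s, 2))                     # 1.76
print(math.isclose(s, stdev(scores)))  # True

# Mean absolute deviation: average of |X - mean|, for comparison with s.
xbar = sum(scores) / len(scores)
mad = sum(abs(x - xbar) for x in scores) / len(scores)
print(mad)                             # 1.4
```

Here the squared deviations sum to 28, and 28 / 9 = 3.11, whose square root is 1.76. Squaring gives extra weight to the scores farthest from the mean, which is why s comes out a little larger than 1.4.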
V. Graphing and Tables. Why? They describe data visually and more clearly.
Frequency Distribution (Table 11-4)
A. Class Interval Column – divides the scores into categories (0-4, 5-9, etc.). Intervals usually span 2, 5, 10, or 25 data points. Main thing: be consistent!
B. Frequency Column – the number of scores within each range or category.
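A frequency distribution like the one described above can be built by assigning each score to an equal-width class interval and counting. This is a minimal Python sketch assuming integer scores; `frequency_table` and its `width`/`start` parameters are hypothetical names, and the data are the five scores from section III.

```python
from collections import Counter


def frequency_table(scores, width, start=0):
    """Group integer scores into equal-width class intervals.

    Intervals are start..start+width-1, start+width..start+2*width-1, and so on.
    Empty intervals between the lowest and highest score are kept (count 0),
    so every row has the same width.
    """
    counts = Counter((s - start) // width for s in scores)
    rows = []
    for k in range(min(counts), max(counts) + 1):
        low = start + k * width
        rows.append((f"{low}-{low + width - 1}", counts[k]))
    return rows


for interval, freq in frequency_table([98, 86, 77, 56, 48], width=10):
    print(f"{interval:>7}  {freq}")
```

Keeping the empty 60-69 row is one way of staying consistent: every interval has the same width, so gaps in the data stay visible.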
VI. Graphs
A. Histogram – shows the distribution of scores by class interval. Different distributions can be compared on the same histogram. Shows:
1. Variability
2. Skewness – if the mean is greater than the median, the skew is positive; if the median is greater than the mean, the skew is negative.
[Figure: relative-frequency curves illustrating central tendency and variability (centre, spread) and skewness. Symmetric data: the mean equals the median. Skewed to the right: the mean is greater than the median. Skewed to the left: the mean is less than the median.]
B. Column Charts – simply show the quantity in each category against some scale. SCALE IS IMPORTANT (CSPAN-drug use story).
C. Bar Charts – same as a column chart, but with the axes reversed.
D. Line Chart – used to show trends (e.g., the rise and fall of presidential popularity; see the line chart on page 317).
E. Pie Charts – great for proportions (e.g., the percent of the MS budget going to each budget category).
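The mean-versus-median rule of thumb for skewness in VI.A can be checked numerically. This is a minimal Python sketch; `skew_direction` and the three small data sets are hypothetical illustrations. It is only a rule of thumb and can mislead for some shapes (for example, distributions with several peaks).

```python
from statistics import mean, median


def skew_direction(scores, tol=1e-9):
    """Classify skew by comparing the mean with the median."""
    m, md = mean(scores), median(scores)
    if abs(m - md) <= tol:
        return "symmetric"
    return "positive" if m > md else "negative"


print(skew_direction([1, 2, 3, 4, 5]))              # symmetric
print(skew_direction([1, 2, 2, 3, 3, 3, 20]))       # positive
print(skew_direction([1, 18, 18, 19, 19, 19, 20]))  # negative
```

In the second set the single high score pulls the mean (about 4.86) above the median (3). In the third set the single low score pulls the mean (about 16.29) below the median (19).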
VII. The Normal Curve and Probability Theory
A. Tells us the likelihood of an outcome.
B. Tells us our degree of confidence in a finding or outcome (i.e., how sure are we that the observed outcome is due to X rather than random chance? And how much support do the data give our research hypothesis?).
VIII. Normal Curve or Bell-Shaped Curve Properties (Fig. 11-6)
A. The mean, median, and mode are the same; the curve is NOT skewed.
B. Perfectly symmetrical about the mean
(i.e., two halves fit perfectly together).
C. The tails of the normal curve are asymptotic: the curve comes close to, but never touches, the horizontal axis.
Are curves usually normal? Many real-world variables are approximately normal: most scores are concentrated in the center and few at the ends (height, intelligence, coin flipping). The "more than 30 cases" rule of thumb applies to the distribution of sample means, not to every large data set.
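The coin-flipping example can be simulated to show scores piling up in the center. This is a minimal Python sketch under stated assumptions: 100 fair flips per trial, 10,000 trials, and a fixed seed so the run is reproducible (all three are arbitrary choices, and `simulate_heads` is a hypothetical helper). For 100 fair flips the number of heads has mean 50 and standard deviation 5.

```python
import random
from collections import Counter


def simulate_heads(n_flips=100, n_trials=10_000, seed=1):
    """Number of heads in n_flips fair coin flips, repeated n_trials times."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    return [sum(rng.random() < 0.5 for _ in range(n_flips)) for _ in range(n_trials)]


heads = simulate_heads()
counts = Counter(heads)

# Text histogram in class intervals of 5 heads; one '#' per 100 trials.
for low in range(30, 70, 5):
    n = sum(counts[h] for h in range(low, low + 5))
    print(f"{low}-{low + 4}: {'#' * (n // 100)}")

# Mean 50, sd 5, so within 1 sd means 45 to 55 heads.
within_1sd = sum(45 <= h <= 55 for h in heads) / len(heads)
print(f"share within 1 sd: {within_1sd:.1%}")
```

The share within 1 sd comes out somewhat above 68% because head counts are whole numbers and both endpoints (45 and 55) are included.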
IX. Divisions of the Normal Curve (Fig. 11-9)
A. The mean is at the center.
B. Scores along the x-axis correspond to standard deviations from the mean.
C. Sections under the bell curve represent the % of cases expected to fall within them. This is geometrically true: the percentages are shares of the entire area under the normal distribution.
D. For normal distributions, practically all scores fall between -3 and +3 sd's (about 99.7%; the 99.74% figure comes from adding the rounded section areas). Look at the probabilities of falling in between: 34.13% × 2 = 68.26% of cases fall within 1 sd of the mean (between -1 and +1 sd).
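The section percentages in Fig. 11-9 are areas under the standard normal curve, so they can be recomputed directly. This is a minimal sketch using Python's standard-library `statistics.NormalDist` (Python 3.8+). Exact areas can differ slightly from figures built by adding rounded sections: the exact area within 3 sd is about 99.73%.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

# Area between -k and +k standard deviations of the mean.
for k in (1, 2, 3):
    area = std_normal.cdf(k) - std_normal.cdf(-k)
    print(f"within +/-{k} sd: {area:.2%}")   # 68.27%, 95.45%, 99.73%

# Area between the mean and +1 sd (one of the two 34.13% sections).
print(f"{std_normal.cdf(1) - 0.5:.2%}")      # 34.13%
```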
X. Z-scores (standard scores, i.e., the number of standard deviations a score lies from the mean)
A. Allow us to compare distributions with one another because they are standardized in units of standard deviations. Raw scores measured on different scales can't be compared directly (that would be nonsensical): different variables or groups have different means and standard deviations. But z-scores from different groups can be compared because they are on the same scale (e.g., z = 1 means one standard deviation above the mean in either group).
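B. Formula and interpretation: $z = \frac{X - \bar{X}}{s}$, the number of standard deviations a score X lies above (positive z) or below (negative z) the mean.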
XI. Comparing z-scores from different distributions.
- The raw scores of 12.8 and 64.8 in our data are the same distance from their respective means in standard-deviation units (z = .4 for both).
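Converting raw scores to z-scores is a single subtraction and division. This is a minimal Python sketch; `z_score` is a hypothetical helper, and the two "tests" on different scales (mean 70 / sd 8 and mean 500 / sd 100) are made-up numbers chosen only to show that very different raw scores can sit at the same z. The means and sds behind the 12.8 and 64.8 example are not given here, so they are not reproduced.

```python
from statistics import mean, stdev


def z_score(x, mu, sd):
    """z = (X - mean) / s: how many standard deviations x lies from the mean."""
    return (x - mu) / sd


# A score of 110 in a distribution with mean 100 and sd 10 is one sd above the mean.
print(z_score(110, 100, 10))                        # 1.0

# Hypothetical tests on very different scales: the raw scores differ, the z-scores do not.
print(z_score(78, 70, 8), z_score(600, 500, 100))   # 1.0 1.0

# Standardizing the data set from section IV (mean 6, s about 1.76).
scores = [5, 8, 5, 4, 6, 7, 8, 8, 3, 6]
xbar, s = mean(scores), stdev(scores)
zs = [round(z_score(x, xbar, s), 2) for x in scores]
print(zs)
```

A useful check: the z-scores of any data set standardized with its own mean sum to zero.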
XII. What z-scores represent
A. Z-scores correspond to sections under the curve (percentages of the area under the curve).
B. These percentages can be read as the probability of a score occurring; they are given in the Z-score table.
Example of what we are saying:
“In a distribution with a mean of 100 and standard
deviation of 10, what is the probability that any
score in the data set will be 110 or above?”
The answer = _________.
C. What about a z-score of 1.38? What are the chances that a score will fall between the mean and a z-score of 1.38? _______
• What about above a z-score of 1.38? ______
• What about at or below 1.38? ______
• What about between a z-score of 1 and 2.5? Answer: ______
Again, we are asking: what is the probability that a score will fall between 1 and 2.5 standard deviations (z's) above the mean? What about between -1 and 2.5?
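The areas asked about above are the kind usually read from the Z-score table. This is a minimal sketch using Python's standard-library `statistics.NormalDist` (Python 3.8+) to compute them directly. Results may differ from a printed table in the fourth decimal because tables round. For a continuous curve, "at or above" and "above" give the same area.

```python
from statistics import NormalDist

std_normal = NormalDist()                        # z-scores: mean 0, sd 1
example_dist = NormalDist(mu=100, sigma=10)      # the mean-100, sd-10 example

p_110_or_above = 1 - example_dist.cdf(110)                 # P(score >= 110), i.e. z >= 1
p_mean_to_1_38 = std_normal.cdf(1.38) - 0.5                # between the mean and z = 1.38
p_above_1_38 = 1 - std_normal.cdf(1.38)                    # above z = 1.38
p_at_or_below_1_38 = std_normal.cdf(1.38)                  # at or below z = 1.38
p_1_to_2_5 = std_normal.cdf(2.5) - std_normal.cdf(1)       # between z = 1 and z = 2.5
p_neg1_to_2_5 = std_normal.cdf(2.5) - std_normal.cdf(-1)   # between z = -1 and z = 2.5

for label, p in [
    ("110 or above", p_110_or_above),
    ("mean to z = 1.38", p_mean_to_1_38),
    ("above z = 1.38", p_above_1_38),
    ("at or below z = 1.38", p_at_or_below_1_38),
    ("z = 1 to 2.5", p_1_to_2_5),
    ("z = -1 to 2.5", p_neg1_to_2_5),
]:
    print(f"{label}: {p:.4f}")
```

Each line mirrors the table method: take the area from the mean to each z, then add the areas (z's on opposite sides of the mean) or subtract them (z's on the same side).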