PPT Lecture Notes

Descriptive Statistics
Outline of Today’s Discussion
1. Central Tendency
2. Dispersion
3. Graphs
4. Excel Practice: Computing the S.D.
5. SPSS: Existing Files
6. SPSS: Entering Data
Part 1
Central Tendency
The Research Cycle
[Figure: research cycle diagram. Stages: Real World, Research Representation, Research Results, Research Conclusions. Steps between stages: Abstraction, Methodology, Data Analysis (marked ***), Generalization.]
Central Tendency
1. One of the themes in our course will be contrasting
so-called “inferential statistics” with “descriptive
statistics”.
2. Inferential statistics are used to determine (“infer”)
whether two populations (or conditions) are
significantly different from each other.
3. By contrast, descriptive statistics are simply used to
depict (“describe”) the data in a study. We’ll focus
on three descriptive measures of central tendency…
Central Tendency
1. The crudest measure of central tendency is the mode
- the most frequently occurring score.
2. Here are some examples:
The modal number of eyes is two.
The modal number of fingers per hand is five.
The modal number of years to graduate is four.
Other examples?
3. A frequency distribution can have one mode, or two
modes (bi-modal), or more (multi-modal).
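As a quick illustration of one mode versus two, here is a minimal Python sketch. The scores are hypothetical (invented for this example), and `multimode` from Python's standard `statistics` module is just one convenient way to list every mode.

```python
from statistics import multimode

# Hypothetical scores, invented for illustration only.
one_mode = [2, 3, 3, 3, 4, 5]
two_modes = [1, 1, 1, 4, 6, 6, 6]

# multimode returns every value tied for the highest frequency,
# in the order each value first appears in the data.
print(multimode(one_mode))   # [3]
print(multimode(two_modes))  # [1, 6]
```

The second list is bi-modal: the scores 1 and 6 each occur three times.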
Central Tendency
1. The next most precise measure of central tendency is
the median - the middle score. It is the 50th
percentile, i.e., the point at which half of the scores
are greater and half are less.
2. The median is equal to the middle value when the
data set contains an odd number of items.
Median of [3, 4, 5, 6, 7, 8, 8] = 6.
3. The median is equal to the average of the two
middle values when the data set contains an even
number of items.
Median of [3, 5, 5, 7, 8, 8] = (5+7)/2 = 6.
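The two data sets from this slide can be checked with a short Python sketch. It assumes Python's standard `statistics` module; the variable names are only for illustration.

```python
from statistics import median

odd_scores = [3, 4, 5, 6, 7, 8, 8]   # 7 items: the 4th (middle) value is the median
even_scores = [3, 5, 5, 7, 8, 8]     # 6 items: average the 3rd and 4th values

print(median(odd_scores))   # 6
print(median(even_scores))  # 6.0
```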
Central Tendency
1. The median is the best measure of central
tendency when one side of a frequency
distribution contains a few, extreme scores.
2. Would someone give us some examples of
“skewed” distributions (ones having a few
extreme scores)?
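To see why the median holds up better in a skewed distribution, here is a small Python sketch. The scores are hypothetical (invented for this example), with one extreme value on the high side.

```python
from statistics import mean, median

# Hypothetical scores: five typical values plus one extreme score.
skewed_scores = [30, 32, 35, 38, 40, 400]

skewed_mean = mean(skewed_scores)      # pulled upward by the extreme score
skewed_median = median(skewed_scores)  # average of the two middle values, 35 and 38

print(skewed_median)                 # 36.5
print(skewed_mean > skewed_median)   # True
```

The mean lands far above every typical score, while the median stays in the middle of the bulk of the data.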
Central Tendency
1. The most commonly used measure of central
tendency is the mean - the arithmetic average.
2. There are many symbols for the mean.
For a population, we use μ (mu), pronounced “myoo”.
For a sample, we simply use “M”.
In computations, we use “X bar”.
3. Mean = ΣX / N (i.e., the sum of X over N).
Central Tendency
1. Here’s some sample syntax for the measures of
central tendency in Excel….
2. “=mode(a1:a9)”
3. “=median(a1:a9)”
4. “=average(a1:a9)”
5. Questions on measures of central tendency?
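For anyone working outside Excel, the same three measures can be sketched in Python. This is only a rough counterpart to the Excel syntax above: the nine scores are hypothetical stand-ins for cells a1:a9, and the functions come from Python's standard `statistics` module.

```python
from statistics import mean, median, mode

# Hypothetical scores standing in for cells a1:a9.
scores = [2, 4, 4, 5, 7, 8, 9, 9, 9]

print(mode(scores))            # 9  (most frequently occurring score)
print(median(scores))          # 7  (middle score of the nine)
print(f"{mean(scores):.2f}")   # 6.33  (sum of 57 divided by 9)
```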
Part 2
Dispersion
Dispersion
1. When describing the data (i.e., when
generating DESCRIPTIVE STATISTICS), we
want to know how the scores are distributed
(“dispersed”) around the center.
2. There are several measures of dispersion.
3. We’ll consider two (for now), the range, and
the standard deviation.
Dispersion
1. The range is a crude measure of dispersion. It is
computed as Range = Max - Min.
2. In the set of scores [2, 4, 5, 9], the range of the scores
would be Max - Min = 9 - 2 = 7 units.
3. Sometimes, rather than reporting the range,
researchers will simply report the Max & Min scores.
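The range example above can be reproduced in a couple of lines of Python. This is a minimal sketch using only built-in functions.

```python
scores = [2, 4, 5, 9]

# Range = Max - Min
score_range = max(scores) - min(scores)

print(score_range)                # 7
print(min(scores), max(scores))   # 2 9  (the alternative: report Min and Max directly)
```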
Dispersion
1. Now, let’s consider the standard deviation, which is
the most commonly used measure of dispersion (i.e.,
the counterpart to the mean).
2. Potential Pop Quiz Question: What information does
the standard deviation provide? Answer in your own words
(no equations here).
3. To compute the standard deviation, we first need to
compute a few important quantities. One of these is
called the Sum of Squares or SS…
Dispersion
1. The first step is to get the deviation of each
score from the mean. Here Mean = 8.
2. Then, we square the deviations, and sum
them…
Dispersion
1. So, the SS is the sum of the squared deviations
from the mean.
2. In this case SS = 44.
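The steps for the SS can be sketched in Python. The scores below are hypothetical: they are not the slide's data, only a set chosen so that the mean is 8 and the SS is 44, matching the values quoted above.

```python
# Hypothetical scores (not the slide's data), chosen so that mean = 8 and SS = 44.
scores = [3, 7, 9, 9, 12]

mean = sum(scores) / len(scores)                  # 40 / 5 = 8.0
deviations = [x - mean for x in scores]           # [-5.0, -1.0, 1.0, 1.0, 4.0]
squared_deviations = [d ** 2 for d in deviations] # [25.0, 1.0, 1.0, 1.0, 16.0]
ss = sum(squared_deviations)                      # sum of squared deviations

print(mean, ss)  # 8.0 44.0
```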
Dispersion
1. Here are two ways to look at the SS.
2. Often, we have a definitional formula, and an
equivalent computational formula.
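For reference, the standard definitional and computational forms of the SS are written out below. These are assumed to be the pair this slide refers to; M is the mean and N is the number of scores.

```latex
% Definitional formula: sum of squared deviations from the mean
SS = \sum (X - M)^2

% Computational formula: algebraically equivalent, no deviations needed
SS = \sum X^2 - \frac{\left(\sum X\right)^2}{N}
```

The two forms always give the same value; the computational one is simply easier to work by hand.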
Dispersion
1. The next step in computing the standard
deviation is to determine the “variance”.
2. Variance - the average squared deviation from
the mean (so, the variance, itself, is a mean).
3. We can compute the variance of either a
population (i.e., every member of a group) or a
sample (i.e., just a subset of a group)…
Dispersion
Population variance (“sigma squared”): σ² = SS / N
Sample variance (“s squared”): s² = SS / (n - 1)
1. Note: The only difference is whether we divide
by N (for population), or n-1 (for sample).
2. Assuming SS is constant, which formula will
generate a larger variance?
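A small Python sketch makes the comparison concrete. The scores are hypothetical (chosen so that SS = 44), and the only thing that changes between the two lines is the divisor.

```python
# Hypothetical scores, chosen so that SS = 44.
scores = [3, 7, 9, 9, 12]
n = len(scores)
mean = sum(scores) / n
ss = sum((x - mean) ** 2 for x in scores)   # 44.0

population_variance = ss / n        # divide by N
sample_variance = ss / (n - 1)      # divide by n - 1

print(population_variance)  # 8.8
print(sample_variance)      # 11.0
```

With the SS held constant, dividing by the smaller number (n - 1) gives the larger variance.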
Dispersion
1. Finally, to get from squared units to “regular”
units, we need to take the square root of the
variance.
2. The standard deviation is the square root of
the variance. (Say it with me.)
3. This is true whether we’re talking about the
SD of a population, or a sample…
Dispersion
Population SD (“sigma”): σ = √(SS / N)
Sample SD (“s”): s = √(SS / (n - 1))
The standard deviation is the
square root of the variance!!
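Here is a minimal Python sketch of the last step, using hypothetical scores (chosen so that SS = 44). It takes the square root of each variance by hand and cross-checks the results against `pstdev` and `stdev` from Python's standard `statistics` module.

```python
import math
from statistics import pstdev, stdev

# Hypothetical scores, chosen so that SS = 44.
scores = [3, 7, 9, 9, 12]
n = len(scores)
mean = sum(scores) / n
ss = sum((x - mean) ** 2 for x in scores)

population_sd = math.sqrt(ss / n)         # square root of the population variance
sample_sd = math.sqrt(ss / (n - 1))       # square root of the sample variance

print(round(population_sd, 3), round(sample_sd, 3))   # 2.966 3.317

# Cross-check against the library functions.
print(math.isclose(population_sd, pstdev(scores)))    # True
print(math.isclose(sample_sd, stdev(scores)))         # True
```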
Dispersion
Another way to express the Standard Deviation
(we’ve substituted the SS formula in the numerator)
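Written out with the definitional SS in the numerator, the standard forms are shown below. They are assumed to match what this slide displays; μ and M are the population and sample means.

```latex
% Population standard deviation
\sigma = \sqrt{\frac{\sum (X - \mu)^2}{N}}

% Sample standard deviation
s = \sqrt{\frac{\sum (X - M)^2}{n - 1}}
```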
Dispersion
1. Phew!! That was a lot!!! Let’s review the
concepts in dispersion.
2. We have focused on two measures of
dispersion: Range and Standard Deviation
(SD).
3. Range is simple. It’s the Max - Min.
Dispersion
1. The standard deviation indicates, approximately,
how far, on average, a score departs from the mean.
2. The standard deviation depends on some important
quantities - the SS and the Variance.
3. The formula for the population variance (based on
N) is slightly different from that for the sample
variance (based on n-1).
4. The standard deviation is the square root of the
variance!
Part 3
Graphs
Graphs
1. Let’s learn some terminology about graphs.
2. The x-axis (the horizontal axis) is called the abscissa.
3. The y-axis (the vertical axis) is called the ordinate.
4. By convention, which axis contains the IV, and
which axis contains the DV?
5. When describing a graph verbally, we typically state
“…in this graph, (DV) is plotted as a function of (IV).”
Graphs
Describe how the variables are plotted in this graph.
How many “levels” does the IV have, and what are they?
Graphs
1. Sometimes an experiment has more than one IV.
This is called a factorial experiment.
2. Graphs from factorial experiments typically plot one
of the IVs on the abscissa, and the other IV by using
different symbols (sometimes called parameters) in
the legend.
3. Verbally, we state, “…in this graph, (DV) is plotted as
a function of IV#1, with IV#2 as parameters.”
Graphs
Describe how the variables are plotted in this graph.
How many IVs, and how many levels of each?
Graphs
Describe how the variables are plotted in this graph.
How many IVs, and how many levels of each?
Graphs
Here are the data from the preceding graph.
Graphs are interpreted more quickly than tables are.
Graphs
1. Another important point about graphs is that
the ordinate should start at zero.
2. If not, the graph will lose proportionality, and
become very misleading.
3. Small differences could appear huge…
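The loss of proportionality can be shown with simple arithmetic. In this Python sketch the two group means are hypothetical, and `apparent_ratio` is an illustrative helper (not part of any plotting library) that compares the drawn bar heights for a given starting point of the ordinate.

```python
# Hypothetical group means, invented for illustration.
mean_a, mean_b = 50.0, 52.0

def apparent_ratio(a, b, axis_start):
    """Ratio of the drawn bar heights when the ordinate begins at axis_start."""
    return (b - axis_start) / (a - axis_start)

print(apparent_ratio(mean_a, mean_b, 0))    # 1.04  (true proportion: B is 4% taller)
print(apparent_ratio(mean_a, mean_b, 48))   # 2.0   (truncated axis: B looks twice as tall)
```

The data are identical in both cases; only the starting point of the ordinate changed.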
Graphs
Which graph is “messed up”?
Part 4
Excel Practice:
Computing the S.D.
Part 5
SPSS Practice:
Existing Files
SPSS: Existing Files
1. SPSS data files have an extension of “.sav”.
2. In variable view, each row corresponds to a new
variable. The columns indicate how the variable is
‘defined’ to SPSS.
3. In data view, each row corresponds to a different
participant (called a ‘case’ in SPSS). Each column
pertains to a different variable.
4. Note that variables are in rows in variable view, but
in columns in data view. (“It rotates by 90 degrees.”)
Part 6
SPSS Practice:
Entering Data
SPSS: Entering Our Own Data
1. When entering your data in SPSS, always begin in the
“VARIABLE VIEW”.
2. Each variable is in a separate row in variable view.
3. Here’s a good habit… make your first variable one that
identifies the participant in some way.
4. In the type column, you should use ‘numeric’ unless you want
to directly enter words in the data view. If so, you should
change the type column to ‘string’.
5. In the measure column, you have three choices: Nominal,
Ordinal, or Scale (which covers interval and ratio).
SPSS: Entering Our Own Data
1. Value labels can facilitate data coding! These are
defined in variable view.
2. Example on Religion:
1=Buddhist;
2=Hindu;
3=Islamic;
4=Jewish;
5=Christian.
3. In data view, under the View menu, the value labels
can be ‘turned on’ or ‘off’!
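The idea behind value labels can be sketched outside SPSS as a simple lookup table. This Python example is only an analogy, not SPSS syntax: the codes and labels come from the Religion example above, while the coded responses and the `display` helper are hypothetical.

```python
# Codes and labels from the Religion example above.
value_labels = {1: "Buddhist", 2: "Hindu", 3: "Islamic", 4: "Jewish", 5: "Christian"}

# Hypothetical coded responses, one per participant (invented for illustration).
religion_codes = [5, 2, 1, 5, 4]

def display(codes, labels_on):
    """Mimic toggling value labels: show labels when on, raw codes when off."""
    return [value_labels[c] if labels_on else c for c in codes]

print(display(religion_codes, labels_on=False))  # [5, 2, 1, 5, 4]
print(display(religion_codes, labels_on=True))   # ['Christian', 'Hindu', 'Buddhist', 'Christian', 'Jewish']
```

The stored data stay numeric either way; the labels only change what is shown.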