Class 11 Data Analysis
Class Meeting #11
Data Analysis
Types of Statistics
Descriptive
Statistics used to describe a set of data, frequently data about a group of people.
Central Tendency
Variability
Relative Standing
Relationship
Inferential
Statistics used to make inferences and draw conclusions about a population from a sample.
Parametric (t-test, ANOVA, multiple regression)
Non-Parametric (chi-square)
Types of Analysis
Univariate – looks at one variable at a time.
Bivariate – looks at variables two at a time.
Multivariate – looks at three or more variables at a time.
Types of Variables
Independent (or Predictor)
The variable measured first in time and from which a prediction is made; the “cause” variable.
Dependent (or Predicted)
The variable measured later in time and which we want to predict; the “effect” variable.
Dayton, C. M., & Stunkard, C. L. (1971). Statistics for problem solving. New York: McGraw-Hill.
Charles, C. M., & Mertler, C. A. (2002). Introduction to educational research (4th ed.). Boston: Allyn and Bacon.
Measurement Scales
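Nominal – A scale that measures data by name only, such as gender, hair color, or race.
Ordinal – A scale that measures data by rank order only, such as medical condition, military rank, or socioeconomic status.
Interval – A scale that measures data using equal intervals but no true zero, such as temperature in degrees Fahrenheit or Celsius, or percentage correct on a test.
Ratio – A scale that measures data using equal intervals and a true zero, such as height, weight, or reaction time.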
Nominal Scales
A number is used to represent a category.
The number has no meaning beyond serving as a label.
Categories are mutually exclusive and differ in kind (qualitatively) rather than in amount.
Ordinal Scales
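A number is used to represent a category.
The number serves as a label and also indicates the category's position in the order; it says nothing about how far apart the categories are.
Categories are mutually exclusive and differ qualitatively.
The categories are ordered in a meaningful way.
Differences between consecutive units of measurement can be unequal (for example, the gap between 1st and 2nd place need not equal the gap between 2nd and 3rd).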
Interval Scales
A number is used to represent a specific amount.
The numbers are meaningful in that they represent equal-sized units that correspond to equal increases in amounts of the underlying attribute.
The scale may include a zero value, but the zero is arbitrary: it does not mean the attribute is absent and is only a convenient starting point for measurement.
Ratio Scales
A number is used to represent a specific amount.
The numbers are meaningful in that they represent equal-sized units that correspond to equal increases in amounts of the underlying attribute.
In addition, there is a true zero on the scale that represents a true absence of the attribute being measured.
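To see why a true zero matters, the short Python sketch below compares two temperatures on Celsius and Fahrenheit (interval scales) with the same temperatures in Kelvin (a ratio scale whose zero is absolute zero). The temperatures are illustrative values, not data from these slides, and the helper function names are hypothetical.

```python
# Celsius and Fahrenheit are interval scales: their zero points are arbitrary.
# Kelvin is a ratio scale: 0 K is a true absence of thermal energy.

def celsius_to_fahrenheit(c: float) -> float:
    """Convert a Celsius reading to Fahrenheit."""
    return c * 9 / 5 + 32


def celsius_to_kelvin(c: float) -> float:
    """Convert a Celsius reading to Kelvin."""
    return c + 273.15


cool, warm = 10.0, 20.0  # illustrative temperatures in degrees Celsius

# Differences survive a change of interval scale: a 10-degree Celsius gap
# is always an 18-degree Fahrenheit gap.
diff_f = celsius_to_fahrenheit(warm) - celsius_to_fahrenheit(cool)

# Ratios do not: twice the Celsius reading is not twice the Fahrenheit reading,
# so "twice as hot" is not a meaningful statement on an interval scale.
ratio_c = warm / cool
ratio_f = celsius_to_fahrenheit(warm) / celsius_to_fahrenheit(cool)

# On the ratio (Kelvin) scale, a ratio compares actual amounts of the attribute.
ratio_k = celsius_to_kelvin(warm) / celsius_to_kelvin(cool)

print(f"difference in F: {diff_f:.1f}")
print(f"ratio in C: {ratio_c:.2f}, in F: {ratio_f:.2f}, in K: {ratio_k:.3f}")
```

Here 20 °C is numerically twice 10 °C, yet the same two temperatures give a ratio of 1.36 in Fahrenheit and about 1.035 in Kelvin; only the Kelvin ratio is a meaningful statement about temperature itself.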
Organizing Data
Frequency Distribution
A table showing the number of test takers who received each of the possible scores (simple frequency distribution), or the number of test takers who scored within each specified interval (grouped frequency distribution).
[Table: simple frequency distribution listing each score X (9 down to 1) with its frequency f]
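The simple and grouped frequency distributions described above can be sketched in a few lines of Python. The scores are hypothetical (not the values from the slide's table), the function names are illustrative, and the interval width of 3 is an arbitrary choice for the grouped version.

```python
from collections import Counter


def simple_frequency(scores):
    """Return (score, frequency) pairs for every score from the highest down to the lowest.

    Scores that nobody received are listed with a frequency of 0.
    """
    counts = Counter(scores)
    return [(x, counts[x]) for x in range(max(scores), min(scores) - 1, -1)]


def grouped_frequency(scores, width):
    """Count scores in intervals of `width` (e.g. 7-9, 4-6, 1-3 for width 3), highest interval first."""
    lowest = min(scores)
    counts = Counter((s - lowest) // width for s in scores)  # interval index for each score
    table = []
    for k in range(max(counts), -1, -1):
        start = lowest + k * width
        table.append((f"{start}-{start + width - 1}", counts[k]))
    return table


# Hypothetical test scores on a 1-9 scale
scores = [9, 8, 8, 7, 7, 7, 6, 6, 5, 5, 5, 5, 4, 4, 3, 2, 1]

for x, f in simple_frequency(scores):
    print(x, f)
print(grouped_frequency(scores, 3))
```

For these scores the grouped table is `[("7-9", 6), ("4-6", 8), ("1-3", 3)]`; grouping hides the detail of individual scores but makes the overall shape easier to see.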
Displaying Data
Histogram (bar graph)
Frequency Polygon (line graph)
Scatter Plots
Bar Graph
Sometimes referred to as a “column graph”
Useful in presenting or comparing differences between groups
Sometimes used to show how groups differ over time
Nicol & Pexman
[Figure: example bar graph comparing East, West, and North across the 1st to 4th quarters (vertical axis 0 to 90)]
Effective Elements for Bar Graphs
Dependent variable is on the vertical axis.
Independent variable is on the horizontal axis.
Length of vertical axis should be 2/3 to 3/4 the length of the horizontal axis.
Positive values increase to the right (horizontal axis) or up (vertical axis).
Negative values increase in magnitude to the left (horizontal axis) or down (vertical axis).
Highest value on either scale is larger than the highest data value.
Bars are clearly differentiated from one another.
Bars are of the same width.
Nicol & Pexman
Line Graph
Used to present a change in one or more dependent variables as a function of an independent variable
Particularly useful in demonstrating a trend or an interaction
Must have at least 3 data points
Nicol & Pexman
[Figure: example line graph]
Effective Elements for Line Graphs
Dependent variable is on the vertical axis.
Independent variable is on the horizontal axis.
Length of vertical axis should be 2/3 to 3/4 that of the horizontal axis.
Positive values increase to the right (horizontal) or up (vertical).
Negative values increase in magnitude to the left (horizontal) or down (vertical).
No more than four lines or curves per graph.
Lines within the graph can be clearly differentiated from one another.
Nicol & Pexman
Scatter Plot
Presents values of individual observations as a function of two variables scaled along the vertical and horizontal axes.
Purpose is usually to explore the relationship between two variables.
A strong linear relationship (high correlation) is suggested when the data points cluster tightly around a straight line, such as a diagonal running across the plot.
Nicol & Pexman
[Figure: example scatter plot]
Effective Elements for Plots
Length of vertical axis should be 2/3 to 3/4 the length of the horizontal axis.
Zero points are indicated on the axes.
Data points are represented by symbols that are approximately the same size as the lowercase letters used in text on the figure.
Nicol & Pexman
Measures of Central Tendency
Mean (arithmetic average)
Median (middle score in the distribution, also known as the 50th percentile)
Mode (most frequently occurring score)
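The three measures listed above can be computed with Python's standard `statistics` module, as in this minimal sketch; the scores are hypothetical.

```python
import statistics

# Hypothetical scores for ten students
scores = [2, 3, 3, 4, 5, 5, 5, 6, 7, 10]

mean = statistics.mean(scores)      # arithmetic average: sum of scores / number of scores
median = statistics.median(scores)  # middle score; with an even count, the average of the two middle scores
mode = statistics.mode(scores)      # most frequently occurring score

print(f"mean={mean}, median={median}, mode={mode}")
```

For this symmetric-looking set all three measures equal 5; they begin to disagree once the distribution is skewed or contains extreme scores.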
Comparing Measures of Central Tendency
The mean is more stable from sample to sample because every score in the distribution enters into its computation. It is, however, more affected by extreme scores.
The median is less affected by extreme scores.
The mode is the easiest to determine but is the least stable.
Extreme Scores
Extreme scores, or “outliers,” are individual low or high values in a group (or distribution), or scores that greatly affect the value of the mean.
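A quick way to see why the median resists outliers while the mean does not is to add one extreme score to a small group. This sketch uses hypothetical scores and the standard `statistics` module.

```python
import statistics

scores = [70, 72, 75, 78, 80]   # hypothetical test scores
with_outlier = scores + [10]    # the same group plus one extremely low score

mean_before, mean_after = statistics.mean(scores), statistics.mean(with_outlier)
median_before, median_after = statistics.median(scores), statistics.median(with_outlier)

print(f"mean:   {mean_before:.2f} -> {mean_after:.2f}")
print(f"median: {median_before:.2f} -> {median_after:.2f}")
```

The single low score pulls the mean from 75 down to about 64.17, while the median only moves from 75 to 73.5.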
Measures of Variability
Range (R)
The difference between the highest and lowest scores in a distribution.
Standard Deviation (SD)
A measure of how far scores typically fall from the mean; it is the estimate of variability that accompanies the mean in describing a distribution.
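Both measures can be computed directly from their definitions. In this sketch the scores are hypothetical, and the `sample` flag is an assumption added for illustration, since the slide does not say whether to divide by n (describing only these scores) or by n - 1 (estimating a population SD from a sample).

```python
import math


def score_range(scores):
    """Range: highest score minus lowest score."""
    return max(scores) - min(scores)


def standard_deviation(scores, sample=True):
    """Square root of the average squared deviation from the mean.

    sample=True divides by n - 1 (estimating a population SD from a sample);
    sample=False divides by n (describing only the scores at hand).
    """
    n = len(scores)
    mean = sum(scores) / n
    squared_deviations = sum((x - mean) ** 2 for x in scores)
    return math.sqrt(squared_deviations / (n - 1 if sample else n))


scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical scores
print(score_range(scores), standard_deviation(scores, sample=False), standard_deviation(scores))
```

For these scores the range is 7, the SD dividing by n is 2.0, and the SD dividing by n - 1 is about 2.14.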
Comparing Measures of Variability
Standard deviation is more reliable than range, because the range depends on only the two most extreme scores.
Standard deviation is used in calculating other statistics, such as standard scores and error scores.
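As one example of a statistic built on the standard deviation, a standard (z) score re-expresses a raw score as a number of SDs above or below the mean. This sketch uses hypothetical scores and treats them as the whole group, so it uses the population SD (`statistics.pstdev`).

```python
import statistics

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical scores
mean = statistics.mean(scores)
sd = statistics.pstdev(scores)     # population SD: these scores are treated as the whole group

# z = (raw score - mean) / SD: positive above the mean, negative below it
z_scores = [(x - mean) / sd for x in scores]
print(z_scores)
```

Here the mean is 5 and the SD is 2, so a raw score of 9 becomes z = 2.0 (two standard deviations above the mean) and a raw score of 2 becomes z = -1.5.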
Measures of Relationship
Paired Samples t-test compares the means of two variables measured on the same cases (for example, pretest and posttest scores for the same students). It computes the difference between the two variables for each case and tests whether the average difference is significantly different from zero.
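The paired procedure described above (difference per case, then test the mean difference against zero) can be sketched with the standard library. The data are hypothetical, and the sketch stops at the t statistic and its degrees of freedom; the p-value would come from a t table or a library routine.

```python
import math
import statistics


def paired_t(first, second):
    """Paired-samples t statistic for two measurements on the same cases.

    Computes each case's difference (second - first), then divides the mean
    difference by its standard error. Returns (t, degrees_of_freedom).
    """
    diffs = [b - a for a, b in zip(first, second)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    standard_error = statistics.stdev(diffs) / math.sqrt(n)  # sample SD of differences / sqrt(n)
    return mean_diff / standard_error, n - 1


pretest = [10, 12, 9, 14, 11]    # hypothetical scores for five students
posttest = [12, 14, 10, 15, 14]  # the same five students after instruction
t, df = paired_t(pretest, posttest)
print(f"t = {t:.2f}, df = {df}")
```

If SciPy is available, `scipy.stats.ttest_rel(posttest, pretest)` should report the same t (about 4.81 here) together with a p-value.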
t-test for Independent Samples compares the mean scores of two independent groups (different people in each group) on a given variable.
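A minimal sketch of the independent-samples t statistic follows. It assumes the classic Student's version, which pools the two groups' variances (equal population variances); Welch's version, which drops that assumption, gives somewhat different results. The data are hypothetical.

```python
import math
import statistics


def independent_t(group1, group2):
    """Student's t for two independent groups, assuming equal population variances.

    Returns (t, degrees_of_freedom).
    """
    n1, n2 = len(group1), len(group2)
    mean1, mean2 = statistics.mean(group1), statistics.mean(group2)
    var1, var2 = statistics.variance(group1), statistics.variance(group2)  # sample variances (n - 1)
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / standard_error, n1 + n2 - 2


group_a = [85, 90, 78, 92, 88]  # hypothetical scores, method A
group_b = [80, 75, 83, 79, 78]  # hypothetical scores, method B
t, df = independent_t(group_a, group_b)
print(f"t = {t:.2f}, df = {df}")
```

With SciPy installed, `scipy.stats.ttest_ind(group_a, group_b)` uses the same pooled-variance formula by default and should report t of about 2.75 with a p-value.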
Measures of Relationship
One-Way ANOVA*
Used to test for differences among the means of two or more independent groups.
* Analysis of Variance
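The logic of a one-way ANOVA is to compare variability between group means with variability within groups. This sketch computes the F ratio from those two sums of squares; the group data and the function names are hypothetical, and the p-value would still come from an F table or a library such as `scipy.stats.f_oneway`.

```python
def mean(values):
    return sum(values) / len(values)


def one_way_anova(*groups):
    """One-way ANOVA F statistic for two or more independent groups.

    Returns (F, df_between, df_within).
    """
    all_scores = [x for group in groups for x in group]
    grand_mean = mean(all_scores)
    # Between-groups sum of squares: how far each group mean lies from the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-groups sum of squares: how far scores lie from their own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Hypothetical scores for three independent teaching-method groups
method_a = [4, 5, 6]
method_b = [6, 7, 8]
method_c = [8, 9, 10]
print(one_way_anova(method_a, method_b, method_c))
```

Here the group means (5, 7, 9) differ far more than scores vary inside each group, giving F = 12.0 with 2 and 6 degrees of freedom.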
Measures of Relationship
Pearson’s Chi-Square
A general test for the existence of a relationship between two or more nominal-level variables.
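For the common case of two nominal variables arranged in a contingency table, the chi-square statistic compares each observed count with the count expected if the variables were unrelated. The table below is hypothetical, and the sketch returns only the statistic and its degrees of freedom.

```python
def chi_square(observed):
    """Pearson chi-square statistic for a two-way contingency table of counts.

    Returns (chi2, degrees_of_freedom).
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, count in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # count expected if the variables were unrelated
            chi2 += (count - expected) ** 2 / expected
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, df


# Hypothetical counts: rows = two student groups, columns = preferred class format (online, in person)
table = [[30, 20],
         [20, 30]]
print(chi_square(table))
```

Note that `scipy.stats.chi2_contingency` applies Yates's continuity correction to 2 x 2 tables by default; passing `correction=False` should reproduce the uncorrected value of 4.0 computed here.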
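Coefficient of Correlation (r)
Expresses the degree and direction of the relationship between two sets of scores, ranging from -1 (perfect negative) through 0 (no linear relationship) to +1 (perfect positive).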
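Pearson's r can be computed from deviations around each variable's mean, as in this sketch with hypothetical paired scores.

```python
import math


def pearson_r(x, y):
    """Pearson correlation: co-variation of x and y scaled by both spreads (range -1 to +1)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sum_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sum_xx = sum((a - mean_x) ** 2 for a in x)
    sum_yy = sum((b - mean_y) ** 2 for b in y)
    return sum_xy / math.sqrt(sum_xx * sum_yy)


hours_studied = [1, 2, 3, 4, 5]  # hypothetical
test_score = [2, 4, 5, 4, 5]
r = pearson_r(hours_studied, test_score)
print(f"r = {r:.2f}")
```

On Python 3.10 or later, `statistics.correlation(hours_studied, test_score)` should give the same value (about 0.77, a fairly strong positive relationship).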
Statistical Significance
p > .05 means that, if there were no real difference, differences this large could have occurred by chance alone 5 or more times in 100 samples. (NOT significant)
p < .05 means that differences this large would occur by chance alone fewer than 5 times in 100 samples. (significant)
p < .01 means that differences this large would occur by chance alone less than once in 100 samples. (significant at the stricter .01 level)
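The "how many times in 100 samples" reading of p can be made concrete with a permutation test, a technique not named on these slides but used here because it needs nothing beyond the standard library: shuffle the group labels many times and count how often chance alone produces a difference in means at least as large as the observed one. The data, function name, and shuffle count are hypothetical choices.

```python
import random


def permutation_p_value(group1, group2, n_shuffles=10_000, seed=0):
    """Approximate two-sided p-value for a difference in means by random shuffling.

    If the group labels do not matter (no real difference), reshuffled groups
    should often show a difference as large as the observed one.
    """
    rng = random.Random(seed)
    observed = abs(sum(group1) / len(group1) - sum(group2) / len(group2))
    pooled = list(group1) + list(group2)
    n1 = len(group1)
    at_least_as_extreme = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)
        first, second = pooled[:n1], pooled[n1:]
        diff = abs(sum(first) / len(first) - sum(second) / len(second))
        if diff >= observed - 1e-12:  # small tolerance for floating-point rounding
            at_least_as_extreme += 1
    return at_least_as_extreme / n_shuffles


group_a = [12, 14, 15, 16, 18]  # hypothetical scores
group_b = [8, 9, 10, 11, 13]
p = permutation_p_value(group_a, group_b)
print(f"p is about {p:.3f}")
```

Only 4 of the 252 ways to split these ten scores into two groups of five give a difference this large, so the estimate should land near 4/252 (about 0.016): significant at .05 but not at .01.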
Error
Type I – You conclude that a relationship exists between variables when in reality there is none (a “false positive”).
Type II – You conclude that a relationship does not exist between variables when in reality there is one (a “false negative”).
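A small simulation shows why the .05 level is also the Type I error rate: when there is truly no difference, a test at that level still declares one in roughly 5 of every 100 experiments. The sketch assumes normally distributed scores with a known SD of 1 and a two-sided z-test with critical value 1.96; all settings and the function name are hypothetical.

```python
import math
import random


def type_i_error_rate(n_experiments=2000, n=30, z_critical=1.96, seed=1):
    """Simulate experiments where both groups come from the same population
    and return the share in which a two-sided z-test wrongly finds a difference."""
    rng = random.Random(seed)
    sigma = 1.0  # assumed known population SD
    false_positives = 0
    for _ in range(n_experiments):
        a = [rng.gauss(0.0, sigma) for _ in range(n)]
        b = [rng.gauss(0.0, sigma) for _ in range(n)]
        z = (sum(a) / n - sum(b) / n) / (sigma * math.sqrt(2 / n))
        if abs(z) > z_critical:
            false_positives += 1  # "significant" even though no real difference exists
    return false_positives / n_experiments


rate = type_i_error_rate()
print(f"Type I error rate is about {rate:.3f}")
```

The printed rate should be close to 0.05. Tightening the criterion to .01 lowers this rate but, for a real effect of a given size, raises the chance of a Type II error.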