Statistics - University of Oregon
Introduction to Educational Statistics
Joseph Stevens, Ph.D., University of Oregon
(541) 346-2445, [email protected]
WHAT IS STATISTICS?
Statistics is a group of methods used
to collect, analyze, present, and
interpret data and to make decisions.
POPULATION VERSUS SAMPLE
A population consists of all elements –
individuals, items, or objects – whose
characteristics are being studied. The
population that is being studied is also
called the target population.
The portion of the population selected for
study is referred to as a sample.
A study that includes every member of the
population is called a census. The
technique of collecting information from a
portion of the population is called
sampling.
A sample drawn in such a way that each
element of the population has an equal
chance of being selected is called a simple
random sample.
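A simple random sample can be drawn with Python's standard `random` module, whose `sample` method selects without replacement so that every element has the same chance of being chosen. This is a minimal sketch: the population of 500 student ID numbers, the sample size of 10, and the helper name `simple_random_sample` are hypothetical choices for illustration.

```python
import random

# Hypothetical population: ID numbers for 500 students
population = list(range(1, 501))

def simple_random_sample(pop, n, seed=None):
    """Draw n elements without replacement; each element is equally likely to be chosen."""
    rng = random.Random(seed)  # seeded generator so the same draw can be reproduced
    return rng.sample(pop, n)

sample = simple_random_sample(population, 10, seed=1)
print(sorted(sample))
```

Passing a seed is optional; it only makes the draw repeatable, which is useful when documenting how a study sample was selected.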
TYPES OF STATISTICS
Descriptive Statistics consists of
methods for organizing, displaying, and
describing data by using tables, graphs,
and summary measures.
Inferential Statistics consists of
methods that use information from
samples to make predictions, decisions
or inferences about a population.
Basic Definitions
A variable is a characteristic under study
that assumes different values for different
elements. A characteristic on which every element has
exactly the same value is a constant.
The value of a variable for an element is
called an observation or measurement.
A data set is a collection of observations
on one or more variables.
A distribution is a collection of
observations or measurements on a
particular variable.
TYPES OF VARIABLES
Quantitative Variables
  Discrete Variables
  Continuous Variables
Qualitative or Categorical Variables
Quantitative Variables cont.
A variable whose values are countable is
called a discrete variable. In other words, a
discrete variable can assume only separate,
countable values (such as 0, 1, 2, and so on)
with no intermediate values between them.
A variable that can assume any numerical
value over a certain interval or intervals is
called a continuous variable.
Categorical Variables
A variable that cannot assume a numerical
value but can be classified into two or more
categories is called a categorical variable.
Scales of Measurement
How much information is contained in the
numbers?
Operational Definitions and measurement
procedures
Types of Scales
Nominal
Ordinal
Interval
Ratio
Descriptive Statistics
Variables can be summarized and displayed
using:
Tables
Graphs and figures
Statistical summaries:
Measures of Central Tendency
Measures of Dispersion
Measures of Skew and Kurtosis
Measures of Central Tendency
Mode – The most frequent score in a
distribution
Median – The score that divides the
distribution into two groups of equal size
Mean – The center of gravity or balance
point of the distribution
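The three measures can disagree on the same data. This sketch uses Python's standard `statistics` module on a small, hypothetical set of quiz scores that includes one unusually high value.

```python
import statistics

# Hypothetical quiz scores with one unusually high value (10)
scores = [2, 3, 3, 5, 7, 10]

print(statistics.mode(scores))    # most frequent score: 3
print(statistics.median(scores))  # middle of 2, 3, 3, 5, 7, 10: (3 + 5) / 2 = 4.0
print(statistics.mean(scores))    # balance point: 30 / 6 = 5
```

The high score of 10 pulls the mean upward more than the median, while the mode ignores it entirely.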
Median
The calculation of the median consists of
the following two steps:
Rank the data set in increasing order
Find the middle number in the ranked data set
such that half of the scores are above it and
half below. The value of this middle
number is the median. If the number of scores
is even, the median is the average of the two
middle numbers.
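The two steps above can be sketched directly in Python. When the number of scores is even there is no single middle score, so the sketch follows the usual convention of averaging the two middle scores. The function name `median` is just an illustrative choice.

```python
def median(scores):
    """Median via the two steps: rank the scores, then take the middle value."""
    if not scores:
        raise ValueError("median requires at least one score")
    ranked = sorted(scores)          # Step 1: rank in increasing order
    n = len(ranked)
    mid = n // 2
    if n % 2 == 1:                   # odd n: a single middle score
        return ranked[mid]
    # even n: no single middle score, so average the two middle scores
    return (ranked[mid - 1] + ranked[mid]) / 2

print(median([82, 95, 67, 92]))  # ranked 67, 82, 92, 95 -> (82 + 92) / 2 = 87.0
print(median([82, 95, 67]))      # ranked 67, 82, 95 -> 82
```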
Arithmetic Mean
The mean is obtained by dividing the sum of all
values by the number of values in the data set.
Mean for sample data:
$\bar{X} = \frac{\sum X}{n}$
Example: Calculation of the mean
Four scores: 82, 95, 67, 92
$\bar{X} = \frac{\sum X}{n} = \frac{336}{4} = 84$
The Mean is the Center of Gravity
[Figure: the four scores (67, 82, 92, 95) placed on a number line]
X     X − X̄
82    82 − 84 = −2
95    95 − 84 = +11
67    67 − 84 = −17
92    92 − 84 = +8
$\sum (X - \bar{X}) = 0$
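The same arithmetic can be checked in a few lines of Python, using the four scores from the example: compute the mean, subtract it from each score, and confirm that the deviations cancel out.

```python
scores = [82, 95, 67, 92]

mean = sum(scores) / len(scores)          # sum of X divided by n: 336 / 4
deviations = [x - mean for x in scores]   # X minus the mean, for each score

print(mean)             # 84.0
print(deviations)       # [-2.0, 11.0, -17.0, 8.0]
print(sum(deviations))  # 0.0: the deviations balance around the mean
```

With means that are not whole numbers, floating-point rounding can leave a tiny nonzero sum such as 1e-15 rather than an exact 0.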
Comparison of Measures of Central Tendency
Measures of Dispersion
Range
Variance
Standard Deviation
Range
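Highest value in the distribution minus the
lowest value, plus 1: $X_{max} - X_{min} + 1$
(this inclusive range suits whole-number scores;
many texts define the range simply as $X_{max} - X_{min}$)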
Variance
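The average squared deviation of the scores
from the mean, in squared units:
$\frac{\sum (X - \bar{X})^2}{N}$
(Dividing by N describes the scores at hand; when a
sample is used to estimate the population variance,
the sum of squares is usually divided by n − 1 instead.)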
Standard Deviation
Returns the variance to the original units of the scores
Square root of the variance: $sd = \sqrt{\text{variance}}$
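The three dispersion measures can be computed for the four example scores (82, 95, 67, 92). This is a sketch assuming the definitions on these slides: the range with the + 1, and the variance divided by N. The plain highest-minus-lowest range is shown alongside for comparison.

```python
import math

scores = [82, 95, 67, 92]
N = len(scores)
mean = sum(scores) / N                             # 84.0

# Range: highest - lowest + 1, and the plain highest - lowest version
inclusive_range = max(scores) - min(scores) + 1    # 95 - 67 + 1 = 29
simple_range = max(scores) - min(scores)           # 95 - 67 = 28

# Variance: average squared deviation from the mean (divide by N)
variance = sum((x - mean) ** 2 for x in scores) / N    # (4 + 121 + 289 + 64) / 4 = 119.5

# Standard deviation: square root of the variance, back in the original score units
sd = math.sqrt(variance)                               # about 10.93

print(inclusive_range, simple_range, variance, round(sd, 2))
```

Python's `statistics.pvariance` and `statistics.pstdev` also divide by N, whereas `statistics.variance` and `statistics.stdev` divide by n − 1.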
Other Descriptors of Distributions
Skew – how symmetrical the distribution is
Kurtosis – how flat or peaked the
distribution is
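Skew and kurtosis have several competing formulas. The sketch below assumes one common moment-based definition: skewness $g_1 = m_3 / m_2^{3/2}$ and excess kurtosis $g_2 = m_4 / m_2^2 - 3$, where $m_k$ is the average of the k-th power of the deviations from the mean. Statistical packages often apply small-sample corrections, so their values can differ slightly. The data sets are hypothetical.

```python
def moment_skew_kurtosis(data):
    """Moment-based skewness (m3 / m2**1.5) and excess kurtosis (m4 / m2**2 - 3).

    One common definition among several; no small-sample correction is applied.
    """
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n   # variance (divide by n)
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    if m2 == 0:
        raise ValueError("skew and kurtosis are undefined when all scores are equal")
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3

# Symmetric and flat: skew 0, negative excess kurtosis (about -1.3)
print(moment_skew_kurtosis([1, 2, 3, 4, 5]))
# Long right tail: positive skew
print(moment_skew_kurtosis([1, 1, 1, 2, 2, 3, 9]))
```

A normal distribution has skew 0 and excess kurtosis 0 under these definitions, which is why 3 is subtracted.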
Kinds of Distributions
Uniform
Skewed
Bell-shaped or Normal
Ogive or S-shaped
[Figure: Normal distribution with mean μ and standard deviation σ]
[Figure: Total area under a normal curve. The shaded area is 1.0 or 100%.]
[Figure: A normal curve is symmetric about the mean. Each of the two shaded areas is .5 or 50%.]
[Figure: Areas of the normal curve beyond μ ± 3σ. Each of the two shaded areas is very close to zero (about .0013, or 0.13%).]
[Figure: Three normal distribution curves with the same mean (μ = 50) but different standard deviations (σ = 5, 10, and 16)]
[Figure: Three normal distributions with different means (μ = 20, 30, and 40) but the same standard deviation (σ = 5)]
Areas under a normal curve
For a normal distribution, approximately:
1. 68% of the observations lie within one
standard deviation of the mean
2. 95% of the observations lie within two
standard deviations of the mean
3. 99.7% of the observations lie within three
standard deviations of the mean
[Figure: the 68%, 95%, and 99.7% areas under the normal curve between μ ± σ, μ ± 2σ, and μ ± 3σ]
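These percentages can be reproduced with `statistics.NormalDist` from Python's standard library (Python 3.8 or later). The sketch uses the standard normal curve; because the areas depend only on distance from the mean in standard deviation units, the same proportions hold for any μ and σ.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)  # the proportions are the same for any mu and sigma

within = {}
for k in (1, 2, 3):
    # proportion of observations between mu - k*sigma and mu + k*sigma
    within[k] = std_normal.cdf(k) - std_normal.cdf(-k)
    print(f"within {k} SD: {within[k]:.4f}")

# area in a single tail beyond mu + 3*sigma
print(f"beyond +3 SD: {1 - std_normal.cdf(3):.5f}")
```

This prints about 0.6827, 0.9545, and 0.9973, with about 0.00135 in each tail beyond 3σ. The "95%" figure is a rounding of 95.45%; exactly 95% of a normal distribution lies within about 1.96 standard deviations of the mean.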
Score Scales
Raw Scores
Percentile Ranks
Grade Equivalents (GE)
Standard Scores
Normal Curve Equivalents (NCE)
Z-scores
T-scores
College Board Scores
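T-scores, NCEs, and College Board scores are linear transformations of z with a fixed mean and standard deviation. This sketch assumes the conventional values (T: mean 50, SD 10; NCE: mean 50, SD 21.06; College Board: mean 500, SD 100) and, for the percentile rank only, assumes the scores are normally distributed. The helper name `convert_z` is hypothetical.

```python
from statistics import NormalDist

def convert_z(z):
    """Express a z-score on several standard-score scales (conventional means and SDs assumed)."""
    return {
        "z": z,
        "T": 50 + 10 * z,                              # mean 50, SD 10
        "NCE": 50 + 21.06 * z,                         # mean 50, SD 21.06
        "College Board": 500 + 100 * z,                # mean 500, SD 100
        "Percentile rank": 100 * NormalDist().cdf(z),  # valid only if scores are normal
    }

print(convert_z(1.0))  # T 60, NCE 71.06, College Board 600, percentile rank about 84
```

Every standard score keeps the same relative position, so a student one SD above the mean sits at the same point on each scale.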
Converting an X Value to a z Value
For a normal random variable X, a particular value
of x can be converted to its corresponding z value
by using the formula
$z = \frac{X - \mu}{\sigma}$
where μ and σ are the mean and standard deviation
of the normal distribution of x, respectively.
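The formula translates directly into code. In this sketch the function name `z_score` and the example distribution (μ = 50, σ = 10) are hypothetical.

```python
def z_score(x, mu, sigma):
    """Convert a value x to z: how many standard deviations x lies from the mean."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (x - mu) / sigma

# Hypothetical normal distribution with mu = 50 and sigma = 10
print(z_score(65, 50, 10))  # 1.5: 65 lies 1.5 SD above the mean
print(z_score(40, 50, 10))  # -1.0: 40 lies 1 SD below the mean
```

A positive z falls above the mean and a negative z below it; z = 0 is the mean itself.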
The Logic of Inferential Statistics
Population: the entire universe of
individuals we are interested in studying
Sample: the selected subgroup that is
actually observed and measured (with
sample size N)
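Sampling Distribution of a Statistic: the
distribution of the values a statistic (such as
the mean) takes across all possible samples of
size N drawn from the population, that is, the
results we would get from many samples like ours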
The Three Distributions Used in
Inferential Statistics
I. Population
II. Sample
III. Sampling Distribution of the Statistic
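The three distributions can be made concrete with a small simulation. This is a sketch under stated assumptions: the population is a hypothetical, right-skewed set of 10,000 scores generated with `random.expovariate`, the sample size is N = 25, and 2,000 repeated samples stand in for "all possible samples".

```python
import random
import statistics

rng = random.Random(2024)  # fixed seed so the run is reproducible

# I. Population: a hypothetical, right-skewed set of 10,000 scores (mean near 20)
population = [rng.expovariate(1 / 20) for _ in range(10_000)]
mu = statistics.mean(population)

# II. Sample and III. Sampling distribution: draw many samples of size N
# and keep the statistic (here the mean) from each one
N = 25
sample_means = [statistics.mean(rng.sample(population, N)) for _ in range(2_000)]

print(f"population mean:          {mu:.2f}")
print(f"mean of the sample means: {statistics.mean(sample_means):.2f}")
print(f"population SD / sqrt(N):  {statistics.pstdev(population) / N ** 0.5:.2f}")
print(f"SD of the sample means:   {statistics.stdev(sample_means):.2f}")
```

The sample means center on the population mean, and their spread is close to the population SD divided by the square root of N, even though the population itself is skewed.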