Descriptive Statistics

Descriptive & Inferential Statistics
Adapted from Merryellen Towey Schulz, Ph.D.
College of Saint Mary
EDU 496
The Meaning of Statistics
Several Meanings
• Collections of numerical data
– Example: last year’s enrollment figures
• Summary measures calculated from a collection of data
– Example: average enrollment per month last year
• Activity of using and interpreting a collection of numerical data
– Example: evaluators made a projection of next year’s enrollments
Descriptive Statistics
• Use of numerical information to summarize, simplify, and present data
• Data are organized and summarized for clear presentation
• For ease of communication
• Data may come from studies of populations or samples
Descriptive Statistics Associated with Methods and Designs
Design | Descriptive Statistics
Survey studies | Percentages, measures of central tendency and variation
Meta-analysis | Effect sizes
Causal comparative studies | Measures of central tendency and variation, percentages, standard scores
Experimental | Measures of central tendency and variation, percentages, standard scores, effect sizes
Descriptive Stats Vocabulary
• Central tendency
• Mode
• Median
• Mean
• Variation
• Range
• Standard deviation
• Normal distribution
Descriptive Stats Vocabulary (cont’d)
• Standard score
• Effect size
• Correlation
• Regression
Inferential Statistics
• Generalizing or predicting how a large group will behave, based on information taken from part of the group, is called INFERENCE
• Techniques that tell us how much confidence we can have when we GENERALIZE from a sample to a population
Inferential Stats Vocabulary
• Hypothesis
• Null hypothesis
• Alternative hypothesis
• ANOVA
• Level of significance
• Type I error
• Type II error
Examples of Descriptive and Inferential Statistics
Descriptive Statistics
• Graphical
– Arrange data in tables
– Bar graphs and pie charts
• Numerical
– Percentages
– Averages
– Range
• Relationships
– Correlation coefficient
– Regression analysis
Inferential Statistics
• Confidence interval
• Margin of error
• Compare means of two samples
– Pre/post scores
– t test
• Compare means of three or more samples
– Pre/post and follow-up
– ANOVA (analysis of variance)
Problems With Samples
• Sampling error
– Inherent variation between sample and population
– Source is “chance or luck”
– Makes sample statistics differ from population values by chance (random error, not systematic bias)
• Sample statistic: a number or figure
– Single measure: how sure are we that it is accurate?
– Comparing measures: we see differences
• How much is due to chance?
• How much is due to the intervention?
What Is Meant by a Meaningful (Significant) Statistic?
• Statistics, descriptive or inferential, are NOT a substitute for good judgment
– Decide what level or value of a statistic is meaningful
– State that judgment before gathering and analyzing data
• Examples:
– A score of 80% on a performance test is passing
– Pre/post comparison: rules instruction reduces incidents by 50%
Interpretation of Meaning
• Population measure (statistic)
– There is no sampling error
– The number you have is “real”
– Judge it against a pre-set standard
• Inferential measure (statistic)
– Tells you how sure (confident) you can be that the number you have is real
– Judge it against a pre-set standard and state how certain the measure is
Descriptive Statistics for One Variable
Statistics has two major branches:
• Descriptive statistics
• Inferential statistics
Descriptive Statistics
• Gives numerical and graphic procedures to summarize a collection of data in a clear and understandable way
Inferential Statistics
• Provides procedures to draw inferences about a population from a sample
Descriptive Measures
• Central tendency measures: computed to give a “center” around which the measurements in the data are distributed.
• Variation or variability measures: describe the “data spread”, or how far away the measurements are from the center.
• Relative standing measures: describe the relative position of specific measurements in the data.
Measures of Central Tendency
• Mean: the sum of all measurements divided by the number of measurements.
• Median: a number such that at most half of the measurements are below it and at most half of the measurements are above it.
• Mode: the most frequent measurement in the data.
Example of Mean
Measurements (x) | Deviation (x - mean)
3 | -1
5 | 1
5 | 1
1 | -3
7 | 3
2 | -2
6 | 2
7 | 3
0 | -4
4 | 0
Sum: 40 | 0
• Mean = 40/10 = 4
• Notice that the sum of the “deviations” is 0.
• Notice that every single observation enters into the computation of the mean.
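A minimal Python sketch of the mean example above, using the ten measurements from the table. The function name `mean_and_deviations` is illustrative only; it simply applies the definition (sum divided by count) and lists each deviation so you can confirm they sum to 0.

```python
def mean_and_deviations(data):
    """Return the mean of `data` and each measurement's deviation from it."""
    mean = sum(data) / len(data)           # sum of all measurements / number of measurements
    deviations = [x - mean for x in data]  # x - mean for every observation
    return mean, deviations

measurements = [3, 5, 5, 1, 7, 2, 6, 7, 0, 4]
mean, deviations = mean_and_deviations(measurements)
print(mean)             # 4.0
print(sum(deviations))  # 0.0
```

Because every observation enters the sum, changing any single measurement changes the mean.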
Example of Median
Measurements (x) | Measurements ranked (x)
3 | 0
5 | 1
5 | 2
1 | 3
7 | 4
2 | 5
6 | 5
7 | 6
0 | 7
4 | 7
Sum: 40 | 40
• Median = (4 + 5)/2 = 4.5
• Notice that only the two central values are used in the computation.
• The median is not sensitive to extreme values.
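The ranking procedure above can be sketched in Python as follows. This is an illustrative helper (the name `median` and the "extreme" data set are assumptions for demonstration, not from the slides); the standard library's `statistics.median` gives the same results.

```python
def median(data):
    """Median by ranking: the middle value, or the average of the two middle values."""
    ranked = sorted(data)
    n = len(ranked)
    mid = n // 2
    if n % 2 == 1:
        return ranked[mid]                      # odd n: the single central value
    return (ranked[mid - 1] + ranked[mid]) / 2  # even n: average the two central values

measurements = [3, 5, 5, 1, 7, 2, 6, 7, 0, 4]
print(median(measurements))  # 4.5

# Hypothetical data: replace one 7 with an extreme value of 100.
extreme = [3, 5, 5, 1, 100, 2, 6, 7, 0, 4]
print(median(extreme))               # 4.5 (unchanged)
print(sum(extreme) / len(extreme))   # 13.3 (the mean jumps from 4.0)
```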
Example of Mode
Measurements (x): 3, 5, 5, 1, 7, 2, 6, 7, 0, 4
• In this case the data have two modes: 5 and 7
• Both measurements are repeated twice
Example of Mode
Measurements (x): 3, 5, 1, 1, 4, 7, 3, 8, 3
• Mode: 3
• Notice that it is possible for a data set not to have any mode.
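A short sketch covering both mode examples, assuming the convention that a data set in which no value repeats has no mode (textbooks differ on this; Python's `statistics.multimode` would instead return every value in that case). The helper name `modes` is hypothetical.

```python
from collections import Counter

def modes(data):
    """Return every value that occurs most often; [] if no value is repeated."""
    counts = Counter(data)
    top = max(counts.values())
    if top == 1:
        return []  # every value occurs once: treat the data as having no mode
    return sorted(value for value, count in counts.items() if count == top)

print(modes([3, 5, 5, 1, 7, 2, 6, 7, 0, 4]))  # [5, 7]
print(modes([3, 5, 1, 1, 4, 7, 3, 8, 3]))     # [3]
print(modes([1, 2, 3]))                       # []
```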
Variance (for a sample)
• Steps:
– Compute each deviation
– Square each deviation
– Sum all the squares
– Divide by the sample size minus one (n - 1)
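The four steps above translate almost line for line into Python. This is a minimal sketch (the function name `sample_variance` is illustrative); `statistics.variance` in the standard library computes the same n - 1 sample variance.

```python
def sample_variance(data):
    """Sample variance following the listed steps (divide by n - 1)."""
    n = len(data)
    if n < 2:
        raise ValueError("sample variance needs at least two measurements")
    mean = sum(data) / n
    deviations = [x - mean for x in data]  # step 1: compute each deviation
    squares = [d * d for d in deviations]  # step 2: square each deviation
    total = sum(squares)                   # step 3: sum all the squares
    return total / (n - 1)                 # step 4: divide by n - 1

print(sample_variance([3, 5, 5, 1, 7, 2, 6, 7, 0, 4]))  # 6.0
```

Dividing by n - 1 rather than n is what makes this the sample (not population) variance.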
Example of Variance
Measurements (x) | Deviations (x - mean) | Squares of deviations (x - mean)²
3 | -1 | 1
5 | 1 | 1
5 | 1 | 1
1 | -3 | 9
7 | 3 | 9
2 | -2 | 4
6 | 2 | 4
7 | 3 | 9
0 | -4 | 16
4 | 0 | 0
Sum: 40 | 0 | 54
• Variance = 54/9 = 6
• It is a measure of “spread”.
• Notice that the larger the deviations (positive or negative), the larger the variance.
The Standard Deviation
• It is defined as the square root of the variance
• In the previous example:
• Variance = 6
• Standard deviation = square root of the variance = √6 ≈ 2.45
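As a quick check of the standard deviation example, this sketch uses Python's standard `statistics` module on the same ten measurements and takes the square root of the sample variance explicitly.

```python
import math
import statistics

measurements = [3, 5, 5, 1, 7, 2, 6, 7, 0, 4]
variance = statistics.variance(measurements)  # sample variance (n - 1 denominator)
std_dev = math.sqrt(variance)                 # standard deviation = sqrt(variance)

print(float(variance))                                        # 6.0
print(round(std_dev, 2))                                      # 2.45
print(math.isclose(std_dev, statistics.stdev(measurements)))  # True
```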
Percentiles
• The pth percentile is a number such that at most p% of the measurements are below it and at most (100 - p)% of the measurements are above it.
• For example, if the 85th percentile of a certain data set is 340, then at most 15% of the measurements in the data are above 340 and at most 85% are below 340.
• Notice that the median is the 50th percentile.
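The percentile definition can be tested directly: count how many measurements fall strictly below and strictly above a candidate value. The sketch below (hypothetical helper `is_pth_percentile`) only checks whether a value satisfies the definition; it does not pick a unique percentile, since several values may qualify and software such as `statistics.quantiles` uses its own interpolation rules.

```python
def is_pth_percentile(data, value, p):
    """Check the definition: at most p% of the data lie below `value`
    and at most (100 - p)% lie above it."""
    n = len(data)
    below = sum(1 for x in data if x < value)
    above = sum(1 for x in data if x > value)
    return below <= p / 100 * n and above <= (100 - p) / 100 * n

measurements = [3, 5, 5, 1, 7, 2, 6, 7, 0, 4]
print(is_pth_percentile(measurements, 4.5, 50))  # True: the median is the 50th percentile
print(is_pth_percentile(measurements, 6, 50))    # False: 7 of 10 values lie below 6
print(is_pth_percentile(measurements, 7, 85))    # True
```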
For any data set:
• At least 75% of the measurements differ from the mean by less than twice the standard deviation.
• At least 8/9 (about 88.9%) of the measurements differ from the mean by less than three times the standard deviation.
Note:
This is a general property called Tchebichev’s (Chebyshev’s) Rule: for any k > 1, at least $1 - 1/k^2$ of the observations fall within k standard deviations of the mean. It is true for every data set.
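To see Tchebichev’s Rule on real numbers, this sketch measures the fraction of the ten example measurements that lie within k sample standard deviations of the mean and compares it with the guaranteed bound $1 - 1/k^2$. The helper name `fraction_within` is illustrative. The rule only gives a lower bound; for this small data set the actual fraction is 100%.

```python
import statistics

def fraction_within(data, k):
    """Fraction of measurements differing from the mean by less than k standard deviations."""
    mean = statistics.mean(data)
    sd = statistics.stdev(data)  # sample standard deviation, as in the slides
    return sum(1 for x in data if abs(x - mean) < k * sd) / len(data)

measurements = [3, 5, 5, 1, 7, 2, 6, 7, 0, 4]
for k in (2, 3):
    bound = 1 - 1 / k**2
    assert fraction_within(measurements, k) >= bound  # the rule must hold
    print(k, fraction_within(measurements, k), ">=", round(bound, 3))
# prints: 2 1.0 >= 0.75
#         3 1.0 >= 0.889
```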
Example of Tchebichev’s Rule
Suppose that a certain data set has:
• Mean = 20
• Standard deviation = 3
Then:
• At least 75% of the measurements are between 14 and 26 (20 ± 2 × 3)
• At least 8/9 (about 88.9%) of the measurements are between 11 and 29 (20 ± 3 × 3)
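The arithmetic in this example (mean ± k standard deviations, with the guaranteed fraction $1 - 1/k^2$) can be sketched as a small helper. The name `chebyshev_interval` is hypothetical.

```python
def chebyshev_interval(mean, sd, k):
    """Interval mean ± k*sd and the guaranteed minimum fraction inside it (k > 1)."""
    if k <= 1:
        raise ValueError("the rule is informative only for k > 1")
    return mean - k * sd, mean + k * sd, 1 - 1 / k**2

for k in (2, 3):
    low, high, fraction = chebyshev_interval(20, 3, k)
    print(f"at least {fraction:.1%} of measurements lie between {low} and {high}")
# at least 75.0% of measurements lie between 14 and 26
# at least 88.9% of measurements lie between 11 and 29
```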
Further Notes
• When the mean is greater than the median, the data distribution is typically skewed to the right.
• When the median is greater than the mean, the data distribution is typically skewed to the left.
• When the mean and median are very close to each other, the data distribution is approximately symmetric.
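The mean-versus-median comparison in these notes can be sketched as a rule-of-thumb classifier. The `tolerance` for “very close” is a hypothetical cut-off in the data’s own units, not something the slides specify; a scale-free variant might divide the difference by the standard deviation instead. The second and third data sets are made up for illustration.

```python
import statistics

def skew_direction(data, tolerance=0.1):
    """Rule of thumb: compare mean and median.
    `tolerance` is a hypothetical cut-off (in data units) for 'very close'."""
    mean = statistics.mean(data)
    median = statistics.median(data)
    if abs(mean - median) <= tolerance:
        return "approximately symmetric"
    return "skewed right" if mean > median else "skewed left"

print(skew_direction([3, 5, 5, 1, 7, 2, 6, 7, 0, 4]))  # skewed left (mean 4 < median 4.5)
print(skew_direction([1, 2, 2, 3, 3, 3, 4, 20]))       # skewed right (mean 4.75 > median 3)
print(skew_direction([1, 2, 3, 4, 5]))                 # approximately symmetric
```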