Chapter 9: Descriptive Statistics
Experimental Research
Methods in Language
Learning
Chapter 9
Descriptive Statistics
Leading Questions
• Do you think statistics is difficult to
understand? Will it be difficult to learn? Why
do you think so?
• What do you know is involved in performing a
statistical analysis of experimental data?
• Can you give an example of descriptive
statistics? What does it tell us about language
learners or research participants?
Stages in Statistical Analysis
Checking and Organizing Data
• Check whether all participants’ data are
complete
• Some participants may not have answered
some questionnaire or test items.
• Incomplete data are missing data and we
need to make a decision on how to deal with
them.
• The best strategy to organize data is to assign
an identity number (ID) to each participant.
Coding Data
• The process of classifying or grouping data sets.
• In some sense, coding data is closely related to
organizing data so that we know how to
statistically analyze them meaningfully.
• Quantitative data are coded through scales
(nominal, ordinal, interval and ratio).
• How a test or measure is scored needs to be
clearly stated/described.
• Some qualitative data, such as standardized think-aloud,
performance assessment, or interview data, can be coded
for quantitative data analysis.
Entering Data
• Once the data have been coded and
numerical values have been assigned to
each participant, we can key them into a
statistical software program (e.g., SPSS, Excel).
• In some cases, we can code data as missing.
In other cases, we may have to remove the
participants who have too many data
missing.
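The decision described above can be sketched in Python. This is only an illustration: the participant IDs, the item scores, the use of `None` as the missing-data code, and the cut-off `MAX_MISSING` are all assumptions made for the example, not values from the chapter.

```python
# Hypothetical coded responses: participant ID -> item scores.
# None is used here as the code for a missing answer (an assumption).
responses = {
    101: [4, 5, None, 3, 4],
    102: [None, None, 2, None, 3],
    103: [5, 4, 4, 5, 5],
}

MAX_MISSING = 2  # assumed cut-off for "too many data missing"


def count_missing(items):
    """Count how many answers are coded as missing."""
    return sum(1 for value in items if value is None)


# Number of missing answers per participant.
missing_counts = {pid: count_missing(items) for pid, items in responses.items()}

# Keep only participants whose missing answers do not exceed the cut-off.
retained = {
    pid: items
    for pid, items in responses.items()
    if missing_counts[pid] <= MAX_MISSING
}

print(missing_counts)
print(sorted(retained))
```

The cut-off is a research decision rather than a fixed rule, so it is kept as a named constant that can be changed and reported.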
Screening and Cleaning Data
• Checking for accuracy in data entry.
• Use of descriptive statistics to check for
incorrectly-entered data.
• Examine abnormal or impossible values in the
data set (e.g., by looking at the minimum and
maximum scores; using visual diagrams such
as histograms and pie charts).
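A minimal Python sketch of the minimum/maximum check follows. The scores, the participant IDs, and the assumed valid range of 0 to 100 are hypothetical and serve only to illustrate the idea.

```python
# Hypothetical test scores keyed by participant ID.
scores = {1: 72, 2: 85, 3: 640, 4: 58, 5: -3, 6: 91}

# Assumed valid score range for this hypothetical test.
VALID_MIN, VALID_MAX = 0, 100


def find_impossible_values(data, low, high):
    """Return {participant_id: value} for values outside [low, high]."""
    return {pid: value for pid, value in data.items() if not low <= value <= high}


# The observed minimum and maximum already hint at entry errors.
observed_min = min(scores.values())
observed_max = max(scores.values())

# Flag the specific participants whose values need to be re-checked.
flagged = find_impossible_values(scores, VALID_MIN, VALID_MAX)

print(observed_min, observed_max)
print(flagged)
```

Flagged values would normally be checked against the original test papers or questionnaires before being corrected or coded as missing.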
Computing Descriptive Statistics
• Descriptive statistics provide basic information
about the data (e.g., mean scores, minimum
and maximum scores, standard deviations).
• They can tell us whether we need to employ
a parametric test for normally distributed
data, or a non-parametric test for non-normally
distributed data.
Estimating Data Reliability
• To check that the data to be analyzed are
reliable and valid.
• The reliability of a research instrument is
related to its consistency of measurement.
• The validity of a research instrument refers to
the extent to which the instrument actually measures
what it is intended to measure.
Reducing Data
• To summarize the score for each test section
(or sometimes for an overall test) for data
entry and statistical analysis.
• To compute a score for each sub-scale in a
questionnaire (e.g., Likert-scale), i.e., using
composites.
• To perform a reliability analysis to see whether
some items negatively affect the reliability of
the instruments and if so, they can be
removed.
• To perform a confirmatory factor analysis
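Computing a composite for each sub-scale can be sketched as below. The sub-scale names, the item labels (`q1` to `q5`), and the choice of the mean (rather than the sum) as the composite are assumptions made for illustration.

```python
import statistics

# Hypothetical mapping of sub-scales to their Likert-scale items.
SUBSCALES = {
    "planning": ["q1", "q2", "q3"],
    "monitoring": ["q4", "q5"],
}

# Hypothetical answers from one participant (1 = strongly disagree, 5 = strongly agree).
participant = {"q1": 4, "q2": 5, "q3": 3, "q4": 2, "q5": 4}


def composite_scores(answers, subscales):
    """Return the mean item score for each sub-scale."""
    return {
        name: statistics.mean([answers[item] for item in items])
        for name, items in subscales.items()
    }


composites = composite_scores(participant, SUBSCALES)
print(composites)
```

Using the mean keeps each composite on the original response scale, which makes sub-scales with different numbers of items easier to compare.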
Computing Inferential Statistics
• Inferential statistics are key statistical analyses
that can yield answers to research questions.
• Statistics are probabilistic.
• Inferential statistics involve testing
hypotheses, examining effect sizes and so on.
Addressing Research Questions
• Use of inferential statistics, such as a t-test to
answer a research question.
• We consider whether the statistical findings make
sense or are meaningful, and how best to
report and discuss them.
• It is strategic to answer the research
questions (informally) during data analysis
because doing so facilitates the task of writing
up the findings.
Descriptive Statistics
• Descriptive statistics provide the basic
characteristics of quantitative data (e.g.,
frequencies, average scores, most frequent
scores).
• Descriptive statistics provide measures of
quantitative data (e.g., measures of central
tendency, measures of variability, and
measures of relative position).
Measures of Central Tendency
• The Mean = simply the average of the
data/scores
• The Median = the value that divides the
dataset exactly into two sets: half the scores
are smaller than the median and half the
scores are larger.
• The Mode = the value that occurs most
frequently in the data
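The three measures can be computed with Python's standard `statistics` module. The scores below are hypothetical.

```python
import statistics

# Hypothetical test scores from eight participants.
scores = [12, 15, 15, 18, 20, 22, 15, 30]

# Mean: the sum of the scores divided by the number of scores.
mean_score = statistics.mean(scores)

# Median: the middle value of the sorted scores
# (the average of the two middle values when the count is even).
median_score = statistics.median(scores)

# Mode: the most frequently occurring score.
mode_score = statistics.mode(scores)

print(mean_score, median_score, mode_score)
```

In this example the single high score of 30 pulls the mean above the median, which illustrates why the median is often reported alongside the mean.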
The Normal Distribution
Skewness and Kurtosis Statistics
• Skewness statistics tell us the extent to which
the data set is symmetrical. A data set is
symmetrical if the skewness statistic is zero.
• Kurtosis statistics show the extent to which
the shape of the distribution is pointy (peaked). A
normally distributed data set has a kurtosis
value of zero (as reported by programs such as SPSS).
• Ideally, skewness and kurtosis statistics should
be within ± 1 for a data set to be considered
normally distributed.
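As a rough illustration, skewness and kurtosis can be computed from the central moments of the data. The sketch below uses the simple moment-based formulas, with kurtosis expressed as excess kurtosis so that a normal distribution gives zero. Statistical packages such as SPSS apply small-sample corrections, so their values will differ somewhat; the function names and data are hypothetical.

```python
def central_moment(data, k):
    """Return the k-th central moment (average of deviations to the power k)."""
    mean = sum(data) / len(data)
    return sum((x - mean) ** k for x in data) / len(data)


def skewness(data):
    """Moment-based skewness: zero for a perfectly symmetrical data set."""
    return central_moment(data, 3) / central_moment(data, 2) ** 1.5


def excess_kurtosis(data):
    """Moment-based kurtosis minus 3, so a normal distribution gives zero."""
    return central_moment(data, 4) / central_moment(data, 2) ** 2 - 3


def roughly_normal(data, limit=1.0):
    """Apply the rule of thumb: both statistics within plus or minus `limit`."""
    return abs(skewness(data)) <= limit and abs(excess_kurtosis(data)) <= limit


symmetric = [1, 2, 3, 4, 5]       # hypothetical symmetrical scores
right_skewed = [1, 1, 1, 2, 10]   # hypothetical scores with one high outlier

print(skewness(symmetric), excess_kurtosis(symmetric))
print(skewness(right_skewed), roughly_normal(right_skewed))
```

With very small samples like these the statistics are unstable, so the rule of thumb is best applied to realistically sized data sets and alongside a histogram.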
Measures of Dispersion
• Dispersion = the extent to which the data set is spread
out. Measures of dispersion are also
known as measures of variability.
• The range = simply the difference between the highest
and lowest scores in the data set.
• The variance and standard deviation are commonly
used measures of dispersion.
• The standard deviation indicates how much, on
average, the individual values differ from the mean
(see Table 9.4)
• The variance = the average of the squared deviations
from the mean. The standard deviation is the square
root of the variance.
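The relationship between the range, the variance, and the standard deviation can be sketched with Python's `statistics` module. The scores are hypothetical. The sketch shows both the population formulas (dividing by N) and the sample formulas (dividing by N − 1), since many statistical programs report the sample version.

```python
import statistics

# Hypothetical scores; their mean is exactly 5.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

# Range: the highest score minus the lowest score.
score_range = max(scores) - min(scores)

# Population variance: the average squared deviation from the mean (divide by N).
pop_variance = statistics.pvariance(scores)

# Population standard deviation: the square root of the population variance.
pop_sd = statistics.pstdev(scores)

# Sample versions divide by N - 1 instead of N.
sample_variance = statistics.variance(scores)
sample_sd = statistics.stdev(scores)

print(score_range, pop_variance, pop_sd)
print(sample_variance, sample_sd)
```

Because the variance is expressed in squared units, the standard deviation is usually the easier of the two to interpret: it is on the same scale as the original scores.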
The Standard Deviation and the
Normal Distribution
Measures of Relative Standing
• Percentile rank = a statistic that tells us the
percentage of scores in the distribution that
are below a given score.
• For example, a score with a percentile rank of 40
has 40% of scores below it. It is quite simple to
calculate a percentile rank as follows: (rank of
a score ÷ [total number of scores + 1]) × 100.
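The calculation can be sketched as follows. The scores are hypothetical, and two assumptions are made: the rank is counted from the lowest score (rank 1) upwards, and the ratio is multiplied by 100 to express it as a percentage. Tied scores are not handled specially in this sketch; a tied score simply takes the lowest of its tied ranks.

```python
def percentile_rank(score, scores):
    """Percentile rank = rank of the score / (number of scores + 1), as a percentage.

    The rank is the position of the score in ascending order (1 = lowest).
    """
    ordered = sorted(scores)
    rank = ordered.index(score) + 1  # raises ValueError if the score is absent
    return 100 * rank / (len(ordered) + 1)


# Hypothetical scores from nine participants.
scores = [55, 60, 65, 70, 75, 80, 85, 90, 95]

print(percentile_rank(70, scores))
print(percentile_rank(95, scores))
```

Here the score of 70 is ranked fourth out of nine, so its percentile rank is 4 ÷ 10, or 40.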
The z-scores
• The z-scores allow us to see how an
individual’s score can be placed in relation to
the rest of the participants’ scores.
• A z-score is basically a raw score that has
been converted to a standard deviation
format (see Figure 9.3 above).
• The T-score is an extension of the z-score
that allows us to avoid the use of negative
values. The T-score is calculated as follows:
[10 × z-score] + 50.
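A minimal sketch of both conversions is given below. A z-score is computed as (raw score − mean) ÷ standard deviation. The scores are hypothetical, and the population standard deviation is used here only to keep the numbers simple; statistical programs often use the sample standard deviation instead, which gives slightly different values.

```python
import statistics


def z_score(raw, scores):
    """Convert a raw score to standard deviation units: (raw - mean) / SD."""
    mean = statistics.mean(scores)
    sd = statistics.pstdev(scores)  # population SD (an assumption of this sketch)
    return (raw - mean) / sd


def t_score(z):
    """Convert a z-score to a T-score: (10 x z) + 50."""
    return 10 * z + 50


# Hypothetical scores with a mean of 5 and a population SD of 2.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

z_high = z_score(9, scores)   # two standard deviations above the mean
z_low = z_score(2, scores)    # below the mean, so the z-score is negative

print(z_high, t_score(z_high))
print(z_low, t_score(z_low))
```

The low score has a negative z-score but a positive T-score, which is the practical advantage of the T-score scale.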
Discussion
• What are the purposes of descriptive statistics in
experimental research?
• Can you think of an example of quantitative
data that are normally distributed?
• What are the common measures of central
tendency? Can you explain what they are
and how they are calculated?
• What is the most difficult concept of
descriptive statistics we have discussed in this
chapter?