Statistical Techniques for Analysing
Quantitative Data
• With statistics, we can summarise large bodies of data, make
predictions about future trends, and determine when different
experimental treatments have led to significantly different outcomes
• Thus, statistics are among the most powerful tools in the researcher’s
toolbox
• Statistics can be thought of as a group of computational procedures that
allow us to find patterns and meaning in numerical data
• In reality, statistics are invaluable and often indispensable tools in
research
• They provide a means through which numerical data can be made more
meaningful, so that the researcher may see their nature and better
understand their interrelationships
SAK 5090 MOHD HASAN SELAMAT- chapter 10
Slide 1
Exploring and Organising a Data Set
• Before employing any statistical procedure, develop the habit
of looking closely at your data and exploring various ways
of organising them – detect patterns
• Example: reading test scores of 11 children.
Ruth, 96; Robert, 60; Chuck, 68; Margaret, 88; Tom, 56;
Mary, 92; Ralph, 64; Bill, 72; Alice, 80; Adam, 76; Kathy,
84
• What do you see? Arrange and look for patterns
• Careful researchers discover everything possible about
their data, whether the information is immediately useful or
not
Slide 2
Arranging The Data
• Alphabetical arrangement – look for meaning or pattern, no
matter how trivial – refer to page 246 of the textbook
• Observable fact – the highest score was earned by a girl and
the lowest score by a boy – although silly and meaningless,
it is an observable fact, and it may come in handy at a
future time
• Symmetrical pattern – in alphabetical order, the sequence of
girls and boys reads the same from either end – refer to
page 246 of the textbook
• Separate girls from boys – refer to page 246 of the
textbook
• The graph shows dramatic trends – the girls’ scores
increase as we proceed through the alphabet, and the boys’
scores decrease
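The arrangements described above can be reproduced in a few lines of Python. This is a minimal sketch: the scores come from the example, but the girl/boy labels are an assumption inferred from the children's first names, matching the slides' references to girls and boys. The helper name `steps` is hypothetical, introduced only for this illustration.

```python
# Reading test scores of 11 children, from the example on the slides.
scores = {
    "Ruth": 96, "Robert": 60, "Chuck": 68, "Margaret": 88, "Tom": 56,
    "Mary": 92, "Ralph": 64, "Bill": 72, "Alice": 80, "Adam": 76, "Kathy": 84,
}
# Assumption: girl/boy labels inferred from the first names.
girls = {"Ruth", "Margaret", "Mary", "Alice", "Kathy"}

# Alphabetical arrangement by first name.
alphabetical = sorted(scores)

# Sequence of sexes in alphabetical order (G = girl, B = boy).
sex_pattern = "".join("G" if name in girls else "B" for name in alphabetical)
is_symmetrical = sex_pattern == sex_pattern[::-1]

# Separate the girls from the boys, keeping alphabetical order.
girl_scores = [scores[n] for n in alphabetical if n in girls]
boy_scores = [scores[n] for n in alphabetical if n not in girls]

def steps(values):
    """Hypothetical helper: difference between each score and the one before it."""
    return [b - a for a, b in zip(values, values[1:])]

print(sex_pattern, is_symmetrical)          # BGBBGGGBBGB True
print(girl_scores, steps(girl_scores))      # [80, 84, 88, 92, 96] [4, 4, 4, 4]
print(boy_scores, steps(boy_scores))        # [76, 72, 68, 64, 60, 56] [-4, -4, -4, -4, -4]
```

The girls' scores rise and the boys' scores fall in equal 4-point steps, which is exactly the kind of pattern that only becomes visible once the data are rearranged.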
Slide 3
Dramatic Trends
• Not only do the trends diverge, but the scores are also
equidistant from one another: each score is 4 points either
above or below the preceding one
• This may have no relevance to the research, but it
represents dynamics within the data
• The researcher should be aware of the dynamics, the
phenomena, that are active within the data, whether those
phenomena are important to the purpose of the research or
not
• The astute (clever at seeing quickly how to gain an
advantage) researcher overlooks nothing
• Another arrangement of the data would probably have
made a different meaning more apparent
Slide 4
Fundamental Guideline for Looking
at the Data
• Whatever the researcher does with the data to prepare them for
inspection or interpretation will affect the meaning that
those data reveal
• Therefore, every researcher should be able to provide a
clear, logical rationale for the procedure used to arrange
and organise the data
• There is no rationale whatever for arranging the test scores
according to the children’s first names
• Had their last names been used, which would have been
equally illogical, we would still have seen that the girls had
higher scores than the boys, but we would not necessarily
have seen the diverging trends depicted in Figure 11.2 on
page 247 of the textbook
Slide 5
Drawing Conclusions from the Data
• Questions from the example
• Why were all the scores of the girls higher than
those of the boys?
• Why were the intervals between each of the scores
equidistant for both boys and girls?
• Knowledge springs from questions like these
• But we must be careful not to make snap judgments
Slide 6
Drawing Conclusions from the Data
• Even the most thorough research can go astray at
the point of drawing conclusions
• In the example, a researcher who is not thinking carefully
might conclude that girls read better than boys
• Reading is a complex and multifaceted skill
• The data say only that, on a particular day, when a particular
test was given to these 11 children, the girls scored higher
than the boys on the reading skills measured by that test
Slide 7
Organising Data to Make Them Easier to
Think About and Interpret
• Another example: Joe’s daily grades in high school, listed in the order they
were earned – a simple linear sequence
92, 69, 91, 70, 90, 89, 72, 87, 73, 86, 85, 75, 84, 76, 83, 83, 77, 81, 78, 79
• Arrange the grades in a two-dimensional table – refer to page 248 of
the textbook
• Read the grades vertically: those on Monday, Wednesday, and Friday
are considerably higher than those on Tuesday and Thursday
• Look at successive scores in each column – they get progressively
worse on Mondays, Wednesdays, and Fridays, but progressively
better on Tuesdays and Thursdays
• The researcher should explore the data further – something may be
causing the data to behave that way
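A minimal Python sketch of the two-dimensional arrangement, assuming (as the weekday columns and the four weeks on the slides imply) that the 20 grades are one per school day, Monday to Friday, over four weeks. The helper name `direction` is hypothetical, introduced only for this illustration.

```python
# Joe's 20 daily grades in the order they were earned (from the slides).
grades = [92, 69, 91, 70, 90, 89, 72, 87, 73, 86,
          85, 75, 84, 76, 83, 83, 77, 81, 78, 79]
days = ["Mon", "Tue", "Wed", "Thu", "Fri"]

# Assumption: one grade per school day, so each row of five is one week.
weeks = [grades[i:i + 5] for i in range(0, len(grades), 5)]

# Print the two-dimensional table: weeks as rows, weekdays as columns.
print("Week  " + "  ".join(days))
for number, week in enumerate(weeks, start=1):
    print(f"{number:>4}  " + "  ".join(f"{g:>3}" for g in week))

# Reading vertically: each column holds one weekday across the four weeks.
columns = {day: [week[i] for week in weeks] for i, day in enumerate(days)}

def direction(values):
    """Hypothetical helper: 'rising', 'falling' or 'mixed' from week to week."""
    diffs = [b - a for a, b in zip(values, values[1:])]
    if all(d > 0 for d in diffs):
        return "rising"
    if all(d < 0 for d in diffs):
        return "falling"
    return "mixed"

for day, column in columns.items():
    print(day, column, direction(column))
```

Under this assumption, every Monday, Wednesday, and Friday grade is higher than every Tuesday and Thursday grade, and the two groups of columns move in opposite directions.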
Slide 8
Simple Line Graph
• Look at Figure 11.3 on page 249 of the textbook
• We see phenomena that were not readily apparent in
the two-dimensional table:
1) Wide disparity in grades during the first and second
weeks
2) The gradual leveling-out process in the third and
fourth weeks
• This profile should raise questions and instigate further
exploration in an attempt to explain the erratic
behaviour that the graph reflects
Slide 9
Choosing Appropriate Statistics
• Use a statistic to smooth out the jagged irregularities of Joe’s
daily academic performance
• Take the weekly mean, more commonly called an
average
• Refer to Figure 11.5 on page 252 of the textbook
• Instead of the zigzag pattern of the daily graph, the weekly means
show very little change in Joe’s average level of achievement
• The means presented in Figure 11.5 tell us nothing about
how consistent or inconsistent Joe’s grades are in any given
week – for that, we would need a range or a standard deviation
to summarise the variability we see each week
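The weekly means, and the weekly ranges that the means hide, can be computed with Python's standard `statistics` module. A sketch, again assuming the 20 grades split into four Monday-to-Friday weeks:

```python
from statistics import mean

# Joe's 20 daily grades in the order they were earned (from the slides).
grades = [92, 69, 91, 70, 90, 89, 72, 87, 73, 86,
          85, 75, 84, 76, 83, 83, 77, 81, 78, 79]
# Assumption: four weeks of five school days each.
weeks = [grades[i:i + 5] for i in range(0, len(grades), 5)]

weekly_means = [mean(week) for week in weeks]               # central tendency
weekly_ranges = [max(week) - min(week) for week in weeks]   # variability

for number, (m, r) in enumerate(zip(weekly_means, weekly_ranges), start=1):
    print(f"Week {number}: mean = {m:.1f}, range = {r}")
```

Under that assumption the means drift only slightly downwards (82.4, 81.4, 80.6, 79.6), while the weekly range shrinks from 23 to 6 points; a measure of variability is what reveals the leveling-out that the means alone conceal.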
Slide 10
Important Points
• Looking at data in only one way yields an incomplete view
of those data and, hence, provides only a small segment of
the full meaning that those data contain
• For that reason, there are many statistical techniques, each
of which is suitable for a different purpose
• Each technique extracts a somewhat different meaning
from a particular set of data
• Every time you apply a new statistical treatment to your
data, you derive new insights and see more clearly the
meaning of those data
• We have hardly exhausted the techniques used to organise,
summarise and better understand Joe’s grades
Slide 11
Functions of Statistics
1) Descriptive statistics – describe what the data look like
– where their centre or midpoint is, how broadly they
are spread, how closely the variables within the data
are correlated with one another
2) Inferential statistics – allow us to make inferences about large
populations by collecting data on relatively small
samples. They help us make reasonable guesses about a
large, unknown population by examining a small sample
that is known. In the process, they also allow us to test
hypotheses regarding what is true for that large population
Slide 12
Statistics as Estimates of Population
Parameters
• A parameter is a characteristic or quality of a population
that, in concept, is a constant; however, its value is variable
• Example – for a circle, the radius is a characteristic. In concept,
the radius is a constant: it is defined the same way for every circle,
as the distance from the centre of the circle to the perimeter.
In value, it varies, depending on the size of the circle. The
value is variable
• A parameter is a particular characteristic (e.g., a mean or
standard deviation) of the entire population – which is
sometimes called a universe – about which we want to
draw conclusions
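To make the distinction between a parameter and a statistic concrete, here is a small simulation sketch. The population (the integers 1 to 1000), the sample size of 100, and the random seed are all hypothetical choices made for illustration only.

```python
import random
from statistics import mean

# Hypothetical population: the integers 1 to 1000.
population = list(range(1, 1001))
mu = mean(population)              # parameter: the population mean (500.5)

rng = random.Random(42)            # fixed seed so the run is repeatable
sample = rng.sample(population, 100)
m = mean(sample)                   # statistic: the sample mean, an estimate of mu

print(f"parameter mu = {mu}, statistic M = {m:.1f}")

# Different samples give different estimates of the same fixed parameter.
estimates = [mean(rng.sample(population, 100)) for _ in range(5)]
print([round(e, 1) for e in estimates])
```

The population mean stays fixed at 500.5, while each sample mean lands near it but varies from sample to sample; inferential statistics is about reasoning from such sample values back to the parameter.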
Slide 13
Considering the Nature of the
Data
• We must consider whether the data
1) Have been collected for a single group or,
instead, for two or more groups
2) Involve continuous or discrete variables
3) Represent nominal, ordinal, interval, or
ratio scales
4) Reflect a normal or non-normal distribution
Slide 14
Single-Group Versus Multigroup Data
• In some cases, a research project yields data about
a single group of people or objects
• In other cases, it may yield parallel sets of data
about two or more groups
• Analysing characteristics of a single group will
often require different statistical techniques than
making comparisons among two or more groups
Slide 15
Continuous Versus Discrete
Variables
• A continuous variable reflects an infinite number
of possible values falling along a particular
continuum
• A simple example is chronological age
• A discrete variable has a finite and small number
of possible values
• A simple example is a student’s primary school
level
Slide 16
Nominal, Ordinal, Interval, and
Ratio Data
• Nominal data are those for which numbers are used only to identify different
categories of people, objects, or other entities; they do not reflect a particular
quantity or degree of something
• Ordinal data are those for which the assigned numbers reflect a particular
order or sequence. They tell us that people, objects, or other entities fall along
a continuum in terms of a particular variable; they do not, however, tell us
anything about how great the differences are between them
• Interval data reflect standard and equal units of measurement. As is true for
ordinal data, the numbers reflect differences in degree or quantity. But, in
addition, differences between the numbers tell us how much difference exists in
the characteristic being measured. The one limitation of interval data is that a
value of zero (0) does not necessarily reflect a complete lack of the
characteristic being measured
• Ratio data are similar to interval data but have an additional feature: a true zero
point. Not only do the numbers reflect equal intervals between values for the
characteristic being measured, but in addition, a value of 0 tells us that there is
a complete absence of that characteristic
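Temperature is a common illustration of the difference between interval and ratio data; it is not an example from the slides. The sketch below assumes the standard conversion K = °C + 273.15, and the two temperatures are hypothetical.

```python
# Illustrative only: Celsius is an interval scale (0 °C is the freezing
# point of water, not an absence of temperature); Kelvin is a ratio scale
# whose 0 K is absolute zero, a true zero point.
def celsius_to_kelvin(celsius):
    """Convert a Celsius temperature to kelvins."""
    return celsius + 273.15

warm_c, cool_c = 20.0, 10.0
warm_k, cool_k = celsius_to_kelvin(warm_c), celsius_to_kelvin(cool_c)

# Differences carry meaning on both scales: a 10-degree gap either way.
diff_c = warm_c - cool_c
diff_k = warm_k - cool_k

# Ratios carry meaning only on the ratio scale.
ratio_c = warm_c / cool_c   # 2.0, yet 20 °C is not "twice as hot" as 10 °C
ratio_k = warm_k / cool_k   # about 1.035

print(f"difference: {diff_c:.1f} C, {diff_k:.1f} K")
print(f"ratio: {ratio_c:.3f} on Celsius, {ratio_k:.3f} on Kelvin")
```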
Slide 17
Normal and Non-Normal
Distributions
• Theorists propose that many characteristics of human populations reflect a
particular pattern (refer to the graph on page 255 of the textbook)
• This pattern, commonly called the normal distribution or normal curve (also
known as the bell curve), has several distinguishing characteristics:
• It is horizontally symmetrical. One side is the mirror image of the other
• Its highest point is at its midpoint. More people are located at the midpoint
than at any other point along the curve. In statistical terms, three widely used
measures of central tendency – the mode, the median, and the mean – are the
same
• Predictable percentages of the population lie within any given portion of the
curve. If we divide the curve according to its standard deviation, we know that
certain percentages of the population lie within each portion. Refer to Figure
11.6 on page 255 of the textbook
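The predictable percentages can be computed directly. A minimal Python sketch, assuming the standard normal curve (mean 0, standard deviation 1) and using `statistics.NormalDist` (Python 3.8+) to find the share of the area within 1, 2, and 3 standard deviations of the mean:

```python
from statistics import NormalDist

# Assumption: the standard normal curve, with mean 0 and standard deviation 1.
standard = NormalDist(mu=0.0, sigma=1.0)

# Share of the area within k standard deviations on either side of the mean.
shares = {k: standard.cdf(k) - standard.cdf(-k) for k in (1, 2, 3)}
for k, share in shares.items():
    print(f"within ±{k} SD of the mean: {share:.2%}")

# By symmetry, half of the ±1 SD band lies between the mean and +1 SD.
mean_to_plus_one = standard.cdf(1) - standard.cdf(0)
print(f"from the mean to +1 SD: {mean_to_plus_one:.2%}")
```

The shares come out at about 68.27%, 95.45%, and 99.73%, with about 34.13% of the area between the mean and one standard deviation above it.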
Slide 18
Choosing Between Parametric and
Nonparametric Statistics
• Parametric statistics are based on two assumptions:
1) The data reflect an interval or ratio scale
2) The data fall in a normal distribution
• If either of these assumptions is violated, the statistical results one
obtains may be suspect
• Nonparametric statistics are not based on such assumptions – data
can be ordinal rather than interval in nature and/or highly skewed in
one direction or the other
• Nonparametric statistics are, by and large, appropriate only for
relatively simple analyses
• Most complex and powerful inferential statistical procedures are based
on parametric statistics
• However, some statistical procedures are robust with respect to certain
assumptions; that is, they yield generally valid results even when an
assumption is not met
Slide 19
Descriptive Statistics – Point of Central
Tendency
• Point of central tendency is a central point around which the data
revolve, a middle point around which the data regarding a particular
variable are equally distributed
• MODE – the single number or score that occurs most frequently; of
limited value because it does not always appear in the middle and is not
very stable from sample to sample. However, it is the only appropriate
measure of central tendency for nominal data
• MEDIAN – the numerical centre of a set of data, with exactly as many
scores above it as below it; appropriate for ordinal data and also for
highly skewed data
• MEAN – the fulcrum point for a set of data, balancing the weights of all
scores. Mathematically, the mean is the arithmetic average of the
scores within the data set; it is the most widely used measure but is
appropriate only for interval or ratio data
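A short sketch with Python's `statistics` module showing which measure suits which kind of data. The transport labels are hypothetical nominal data; the numeric series is the one used in the range example later in these slides.

```python
from statistics import mean, median, mode

# Nominal data (hypothetical labels): categories can only be counted,
# so the mode is the only meaningful measure of central tendency.
transport = ["bus", "car", "bus", "walk", "bus", "car"]
print(mode(transport))                               # bus

# A numeric series with one extreme score (from the range example).
scores = [1, 3, 3, 3, 4, 4, 5, 5, 6, 15]
print(mode(scores), median(scores), mean(scores))    # 3 4.0 4.9
```

The extreme score of 15 pulls the mean (4.9) above the median (4.0), which is why the median is preferred for highly skewed data.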
Slide 20
Curves Determine Means
• Not all phenomena fit a bell-shaped pattern. Growth
follows the S curve – refer to Figure 11.9 on page
259 of the textbook
• Growth is a function of geometric progression, so it is
summarised by the geometric mean, symbolised by $M_g$:
$$M_g = \sqrt[N]{(X_1)(X_2)\cdots(X_N)}$$
• It reflects the way things grow
• Refer to example on page 260 of the textbook
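A minimal sketch of the geometric mean formula in Python, using hypothetical values that double each period (not the textbook's example). `math.prod` and `statistics.geometric_mean` both exist in Python 3.8 and later.

```python
import math
from statistics import geometric_mean, mean

# Hypothetical growth data: a quantity that doubles in each period.
values = [100, 200, 400, 800]
N = len(values)

# Geometric mean: the N-th root of the product of the N values.
by_formula = math.prod(values) ** (1 / N)
by_library = geometric_mean(values)

# Compare with the ordinary arithmetic mean of the same values.
arithmetic = mean(values)

print(round(by_formula, 2), round(by_library, 2), arithmetic)
```

Both calculations give about 282.84, well below the arithmetic mean of 375.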
Slide 21
Measures of Central Tendency as
Predictors
• A measure of central tendency indicates where the data converge,
and therefore serves as a prediction of what is typical of the data
• When we speak of “the average citizen”, “the
average student”, and “the average wage earner”,
we are referring to those citizens, students, and
wage earners who are huddled around the point of
central tendency
• In the broad spectrum of possibilities, we are
betting on the average being the best guess as to
what is most characteristic of the total population
Slide 22
Measures of Variability: Dispersion and
Deviation
• So far, we have been discussing the question, What is the best guess?
• Now we turn to the opposite question: What are the worst odds?
• The more that the data cluster around the point of central tendency, the
greater the probability of making a correct guess about where any
particular data point lies
• The farther the data are dispersed from the central axis, the greater the
margin of predictive error becomes
• Refer to Figure 11.10 on page 261 of the textbook
• The data are more uniform when they cluster about the mean
• As specific data points recede farther from the mean, they lose more
and more of the quality that makes them “average”
• “Peaks and valleys” aid greatly in interpreting the data later on
Slide 23
How Great is the Spread?
• The simplest measure of variability is the range
• The range indicates the spread of the data from lowest to
highest value: Range = Highest score – Lowest score
• Although the range is easy to compute, it has limited
usefulness as a measure of variability and may even be
misleading if the extreme upper or lower limits are atypical
(not typical, abnormal) of the other values in the series –
e.g. 1, 3, 3, 3, 4, 4, 5, 5, 6, 15 (refer to example on page
262 of the textbook)
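• The interquartile range uses the median as a starting point: it is
the range of the middle 50% of the scores, from the first quartile
(Q1, the 25th percentile) to the third quartile (Q3, the 75th
percentile), so it is not distorted by extreme values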
• Refer to example on page 262 of the textbook
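The range and the interquartile range for the series above, sketched with Python's `statistics.quantiles`. Quartile conventions differ between textbooks and software, so the quartile values here (from Python's default "exclusive" method) may not match the textbook's example exactly.

```python
from statistics import median, quantiles

scores = [1, 3, 3, 3, 4, 4, 5, 5, 6, 15]   # series from the slide

# Range: highest score minus lowest score.
full_range = max(scores) - min(scores)     # 15 - 1 = 14

# The same series without its one extreme score.
without_extreme = [x for x in scores if x != 15]
trimmed_range = max(without_extreme) - min(without_extreme)   # 6 - 1 = 5

# Quartiles Q1, Q2 and Q3 (Q2 is the median), using Python's default
# "exclusive" method; other conventions can give slightly different values.
q1, q2, q3 = quantiles(scores, n=4)
iqr = q3 - q1                              # spread of the middle 50% of scores

print(full_range, trimmed_range, q1, q2, q3, iqr)
```

Removing the single extreme score of 15 shrinks the range from 14 to 5, while the interquartile range (2.25 with this quartile method) depends only on the middle half of the scores.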
Slide 24
How Great is the Spread?
• The standard deviation (σ or s) is the standard measure of
variability in most statistical procedures and the most
accepted index of dispersion
• The formula for a standard deviation is as follows, where X is each
score, M is the mean, and N is the number of scores:
$$s = \sqrt{\frac{\sum (X - M)^2}{N}}$$
• The formula for variance, which is another measure of
variability, is simply the standard deviation squared:
$$s^2 = \frac{\sum (X - M)^2}{N}$$
• Refer to Table 11.3 on page 263 of the textbook
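The two formulas can be applied step by step in Python. This sketch reuses the reading scores of the 11 children from earlier in the chapter; because the formulas divide by N, they match the standard library's `pstdev` and `pvariance` rather than `stdev`.

```python
import math
from statistics import pstdev, pvariance, stdev

# Reading test scores of the 11 children from the earlier example.
scores = [96, 60, 68, 88, 56, 92, 64, 72, 80, 76, 84]

N = len(scores)
M = sum(scores) / N                                  # the mean
squared_deviations = [(x - M) ** 2 for x in scores]  # (X - M)^2 for every score

variance = sum(squared_deviations) / N               # s^2: divide by N
s = math.sqrt(variance)                              # s: square root of s^2

print(M, variance, round(s, 2))                      # 76.0 160.0 12.65

# The standard library's population versions use the same N denominator.
assert math.isclose(pvariance(scores), variance)
assert math.isclose(pstdev(scores), s)

# statistics.stdev divides by N - 1 instead (the sample standard deviation).
sample_s = stdev(scores)                             # sqrt(1760 / 10), about 13.27
```

Many statistics packages report the sample standard deviation (dividing by N − 1) by default, so it is worth checking which denominator a tool uses before comparing results.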
Slide 25
Keeping Central Tendency and
Variability in Perspective
• Statistics related to central tendency and variability provide us with a
beginning point from which to view our data
• First, the data must be examined to observe their configuration and
fundamental characteristics; second, we must analyse them through statistical
means, seeing (through the eyes of statistical procedures) what the mass of
data appears to be like
• We should remind ourselves that statistical manipulation of the data is not
research
• Research goes one step further and demands interpretation of the data
• In finding medians, means, interquartile ranges, or standard deviations, we
have not interpreted the data or extracted any meaning from them
• We have merely described the centre and spread of the data
• We have attempted only to see what the data look like
• After learning their basic nature, we can then attempt to say what those data
mean
• That is, we can look for the conditions that are forcing the data to behave as
they do
Slide 26
Measures of Relationship: Correlation
• The statistics we have discussed so far – measures of central tendency
and variability – relate to only a single variable
• Oftentimes, however, we also want to know how two or more variables
are interrelated
• The statistical process by which we discover the nature of relationships
among different variables is called correlation
• The resulting statistic, called a correlation coefficient, is a number
between –1 and +1; most correlation coefficients are decimals (either
positive or negative) somewhere between these two extremes
• A correlation coefficient for two variables simultaneously tells us two
different things about the relationship between those variables:
Slide 27
Measures of Relationship: Correlation
1) Direction – The direction of the relationship is
indicated by the sign of the correlation coefficient
– in other words, by whether the number is a
positive or negative one. A positive number
indicates a positive correlation: As one variable
increases, the other variable also increases. In
contrast, a negative number indicates an inverse
relationship, or negative correlation. As one
variable increases, the other variable decreases.
Refer to example on page 265
Slide 28
Measures of Relationship: Correlation
2) Strength – The strength of the relationship is indicated by the size of
the correlation coefficient. A correlation of +1 or –1 indicates a perfect
correlation: If we know the degree to which one characteristic is
present, we know exactly how much of the other characteristic exists. A
number close to either +1 or –1 (e.g., +0.89 or –0.76) indicates a strong
correlation: The two variables are closely related, so knowing the level
of one variable allows us to predict the level of the other variable with
considerable accuracy. In contrast, a number close to 0 (e.g., +0.15 or –0.22)
indicates a weak correlation: Knowing the level of one variable
allows us to predict the level of the other variable, but we cannot
predict with much accuracy. Correlations in the middle range (for
example, those in the 0.40s and 0.50s, positive or negative) indicate a
moderate correlation. Refer to example on page 265. Refer to Table
11.4 on page 266 of the textbook for examples of correlational statistics
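The slides do not name a particular coefficient; the sketch below assumes the Pearson product-moment correlation coefficient, the most common choice for interval or ratio data, and uses hypothetical paired values. Python 3.10 and later also provide `statistics.correlation`, which computes the same coefficient by default.

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient for paired data."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))  # co-variation
    sxx = sum((x - mx) ** 2 for x in xs)                     # variation in x
    syy = sum((y - my) ** 2 for y in ys)                     # variation in y
    return sxy / math.sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]
print(pearson_r(x, [2, 4, 6, 8, 10]))   # 1.0   perfect positive
print(pearson_r(x, [10, 8, 6, 4, 2]))   # -1.0  perfect negative
print(pearson_r(x, [2, 1, 4, 3, 5]))    # 0.8   strong positive
```

The sign of the result gives the direction of the relationship, and its distance from 0 gives the strength.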
Slide 29
A Reminder About Correlation
• Whenever you find evidence of a correlation within your data, you must
remember one important point: Correlation DOES NOT necessarily indicate
causation
• For example, if you find a correlation between motivation and classroom
achievement, you cannot necessarily conclude that students’ motivation
influences their achievement
• Only experimental studies, such as those described in Chapter 10, allow you to
draw definitive conclusions about the extent to which one thing causes or
influences another
• Finding a correlation in a data set is equivalent to discovering a signpost
• That signpost points to the fact that two things are related, and it reveals the
nature of the relationship (positive or negative, strong or weak)
• It should then lead you to wonder, What is the underlying cause of the
relationship?
• But the statistics alone will not be able to answer that question
Slide 30
Interpreting the Data
• Interpreting the data means several things. In particular, it means:
1) Relating the findings to the original research problem and to the
specific research questions and hypotheses. Researchers must
eventually come full circle to their starting point – why they
conducted a research study in the first place and what they hoped to
discover – and relate their results to their initial concerns and
questions
2) Relating the findings to preexisting literature, concepts, theories,
and research studies. To be useful, research findings must in some
way be connected to the larger picture – to what people already
know or believe about the topic in question. Perhaps the new
findings confirm a current theoretical perspective, perhaps they cast
doubt on common “knowledge”, or perhaps they simply raise new
questions that must be addressed before we can truly understand the
phenomenon in question
Slide 31
Interpreting the Data
3) Determining whether the findings have practical
significance as well as statistical significance. Statistical
significance is one thing; practical significance –
whether findings are actually useful – is something else
altogether.
4) Identifying limitations of the study. Finally, interpreting
the data involves outlining the weaknesses of the study
that yielded them. No research study can be perfect, and
its imperfections inevitably cast at least a hint of doubt
on its findings. Good researchers know – and they also
report – the weaknesses along with the strengths of their
research
Slide 32