Data Analysis and a Brief Intro to Stats for Writers
For English 8125
By Dr. Bowie
A Brief Overview of Stats for Writers: Central Tendency
• Principle 1: “The smaller the variance in data, the more reliable the inference”
(Hughes and Hayhoe 62)
• Mean: or average; the most commonly used measure of central tendency
– Average the numbers (add them all up and divide by n)
• Mode: most frequently obtained measure. Example: 3, 4, 4, 4, 5, 5, 6, 6, 7. 4 is
the mode. This is a rougher measure and can be used to misrepresent the data.
• Median: middle score/measure, corresponding to the 50th percentile.
– If odd number of scores: it is the middle score once sorted into ascending/descending
order
– If even number of scores: it is the average of the two middle scores
• Range: The difference between the highest score and the lowest. R = Xmax − Xmin.
• Standard Deviation: how close the set of data is to the mean. A smaller standard
deviation means a tighter, more precise data group, and a larger deviation means
the data is more spread out.
– The formula: [Image from Process Dynamics and Controls Open Textbook]
Luckily Excel can do this for us!
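A spreadsheet is not the only option. As a minimal sketch, the measures on this slide can be computed with Python's built-in statistics module, using the scores from the mode example. The variable names are illustrative assumptions, and because the formula image from the slide is not available here, both the sample form (divide by n − 1) and the population form (divide by n) of the standard deviation are shown.

```python
import statistics

# Scores taken from the mode example on the slide.
scores = [3, 4, 4, 4, 5, 5, 6, 6, 7]

mean = statistics.mean(scores)              # add them all up, divide by n
median = statistics.median(scores)          # middle score once sorted
mode = statistics.mode(scores)              # most frequently obtained score
value_range = max(scores) - min(scores)     # R = Xmax - Xmin

# Two common forms of the standard deviation:
sample_sd = statistics.stdev(scores)        # divides by n - 1
population_sd = statistics.pstdev(scores)   # divides by n

print(f"mean={mean:.2f} median={median} mode={mode} range={value_range}")
print(f"sample SD={sample_sd:.2f} population SD={population_sd:.2f}")
```

For these nine scores the mode is 4, the median is 5, and the range is 4; the mean (about 4.89) sits slightly below the median.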
Standard Deviation
• How close the set of data is to the mean. A smaller standard deviation means a
tighter, more precise data group, and a larger deviation means the data is more
spread out.
• The formula: [Image from Process Dynamics and Controls Open Textbook]
• With a normal distribution (bell curve):
– Within one standard deviation of the mean: about 68% of the population
– Within two standard deviations: about 95% of the population
– Within three standard deviations: about 99.7% of the population
Image: http://en.wikipedia.org/wiki/File:Standard_deviation_diagram.svg
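The 68 / 95 / 99.7 figures can be checked directly. The sketch below assumes a perfectly normal distribution and uses NormalDist from Python's statistics module; the helper name share_within is an illustrative choice, not something from the slides.

```python
from statistics import NormalDist

# A standard normal curve: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0.0, sigma=1.0)

def share_within(k):
    """Fraction of a normal population within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} SD of the mean: {share_within(k):.1%}")
```

Real data are rarely perfectly normal, so treat these percentages as a rule of thumb rather than a guarantee.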
The t-Test
• The t-Test: the comparison of sample populations to
determine if there is a significant difference between their
means. The result is a ‘t’ value used to find the p-value.
– One-tailed: hypothesis states the direction of the difference or
relationship: Students will score higher on assignment 1 than on assignment 2.
After X training, women will run faster marathons than men.
– Two-tailed: hypothesis states there is a difference, but not the
direction of the difference: Students will score higher on one of the
two assignments. After X training, men and women will have different
average marathon times.
– The one-tailed probability is half the value of the two-tailed
probability (when the difference falls in the predicted direction).
– Three types of t-tests in Excel:
• Type 1: One group of participants before and after treatments.
• Type 2: Two groups of participants with equal variances (standard
deviation about the same)
• Type 3: Two groups of participants with unequal variances (standard
deviation not about the same)
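To show what sits behind the three Excel types, here is a minimal sketch that computes the t value and degrees of freedom for each one with Python's standard library. The function names and the scores are hypothetical, made up for illustration. Turning t into a p-value needs the t distribution, which the standard library does not provide, so that step is left to Excel or a statistics package.

```python
import math
import statistics

def paired_t(before, after):
    """Type 1: one group measured before and after a treatment."""
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    t = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))
    return t, n - 1

def pooled_t(x, y):
    """Type 2: two groups, variances assumed equal (pooled variance)."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * statistics.variance(x)
                  + (ny - 1) * statistics.variance(y)) / (nx + ny - 2)
    t = (statistics.mean(x) - statistics.mean(y)) / math.sqrt(
        pooled_var * (1 / nx + 1 / ny))
    return t, nx + ny - 2

def welch_t(x, y):
    """Type 3: two groups, variances not assumed equal (Welch's t-test)."""
    nx, ny = len(x), len(y)
    vx, vy = statistics.variance(x) / nx, statistics.variance(y) / ny
    t = (statistics.mean(x) - statistics.mean(y)) / math.sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (nx - 1) + vy ** 2 / (ny - 1))
    return t, df

# Hypothetical scores, for illustration only.
before = [70, 75, 80, 85, 90]
after = [74, 78, 85, 88, 95]
group_a = [1, 2, 3, 4, 5]
group_b = [2, 4, 6, 8, 10]

print("paired:", paired_t(before, after))
print("pooled:", pooled_t(group_a, group_b))
print("welch: ", welch_t(group_a, group_b))
```

With equal group sizes the pooled and Welch t values match; only the degrees of freedom differ, which in turn changes the p-value.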
Probability and Confidence
• P value (p): the probability of getting results like these by chance alone. The
smaller the p, the more confident you can be that the result is “real.” p is often
considered “acceptable” at 0.05, 0.01, or 0.001, depending on how conservative
you want to be and other factors.
• Confidence Interval: describes reliability. It is a range of plausible values
within which the true value would lie a certain percentage of the time
(normally 95% or 90%).
– Narrow: Implies high precision: a small range of plausible values. More reliable.
– Wide: Poor precision; the range is broad and uninformative.
– Also “provides a way of determining whether the sample is large enough to
make the trial definitive. If the lower boundary of a confidence interval is
above the threshold considered clinically significant, then the trial is positive
and definitive, if the lower boundary is somewhat below the threshold, the
trial is positive, but studies with larger samples are needed.”
(http://www.cmaj.ca/cgi/content/abstract/152/2/169)
– Add and subtract the confidence value (the margin of error) from the mean to
find the confidence interval range
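The add-and-subtract step can be sketched as follows. This example assumes a normal approximation (a z value of about 1.96 for 95%), which is a simplification; for small samples a t-based critical value is more appropriate. The function name and the data are illustrative assumptions.

```python
import math
import statistics
from statistics import NormalDist

def confidence_interval(data, level=0.95):
    """Return (low, high, margin) for the mean, using a normal approximation."""
    n = len(data)
    mean = statistics.mean(data)
    sd = statistics.stdev(data)                 # sample standard deviation
    z = NormalDist().inv_cdf(0.5 + level / 2)   # about 1.96 for a 95% level
    margin = z * sd / math.sqrt(n)              # the value to add and subtract
    return mean - margin, mean + margin, margin

# Hypothetical scores, for illustration only.
scores = [3, 4, 4, 4, 5, 5, 6, 6, 7]
low, high, margin = confidence_interval(scores)
print(f"95% CI: {low:.2f} to {high:.2f} (mean ± {margin:.2f})")
```

A larger sample or a smaller standard deviation shrinks the margin, which is what makes an interval narrow.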
Correlations
• Correlation: a measure of the relation
between two or more variables. Correlation
coefficients range from -1.00 to +1.00, with
-1.00 a perfect negative correlation, +1.00 a
perfect positive correlation, and 0.00 a lack
of correlation.
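The slide does not name a specific coefficient; the sketch below assumes the common Pearson correlation and computes it by hand so that each step is visible. The data sets are hypothetical and chosen to land on the three landmark values.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How the two variables move together, and how much each varies alone.
    co_variation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variation_x = sum((x - mean_x) ** 2 for x in xs)
    variation_y = sum((y - mean_y) ** 2 for y in ys)
    return co_variation / math.sqrt(variation_x * variation_y)

# Hypothetical data, for illustration only.
hours = [1, 2, 3, 4, 5]
rising = [2, 4, 6, 8, 10]     # goes up in step with hours
falling = [10, 8, 6, 4, 2]    # goes down in step with hours
unrelated = [2, 1, 3, 1, 2]   # no linear pattern

print(pearson_r(hours, rising))
print(pearson_r(hours, falling))
print(pearson_r(hours, unrelated))
```

The first two results are +1 and -1, and the third is 0 up to rounding. A coefficient near 0 only rules out a straight-line relationship, not every kind of relationship.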
Analysis of Variance (ANOVA)
• The purpose of ANOVA: to test for significant differences between
means (which means comparing variances, thus the name). If we are only
comparing two means, then ANOVA gives the same results as the t-test.
The ANOVA produces an F statistic, the ratio of the variance among the
means to the variance within the samples.
• One-way ANOVA: for differences among two or more independent
groups, typically 3 or more, as the t-test covers 2. Example: The times of
masculine, feminine, androgynous, and undifferentiated genders in
completing a task.
• Factorial ANOVA: for the effects of two or more treatment variables.
Most common is the 2×2, with two independent variables, each with
two levels or distinct values. Can be multi-level, such as
3×3, or higher order, such as 2×2×2. Example 2×2: Female and Male
scores before and after the treatment.
• Multivariate analysis of variance (MANOVA): for when there is more than
one dependent variable. Example: Student scores in audience analysis and
grammar after using a website or a textbook.
• To do in Excel: You need the Analysis ToolPak add-in
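To make the F statistic concrete, here is a minimal sketch of a one-way ANOVA computed by hand: the variance among the group means divided by the variance within the groups. The function name and the task times are hypothetical. Like the t-test sketch, it stops at the statistic and leaves the p-value to Excel or a statistics package.

```python
def one_way_anova_f(groups):
    """F statistic for a one-way ANOVA over a list of groups (lists of numbers)."""
    all_values = [value for group in groups for value in group]
    grand_mean = sum(all_values) / len(all_values)
    k = len(groups)              # number of groups
    n_total = len(all_values)    # total number of observations

    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(group) * (sum(group) / len(group) - grand_mean) ** 2
        for group in groups
    )
    # Variation of each observation around its own group mean.
    ss_within = sum(
        (value - sum(group) / len(group)) ** 2
        for group in groups
        for value in group
    )
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within

# Hypothetical task times for three groups, for illustration only.
times = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
print(one_way_anova_f(times))
```

With exactly two groups, this F equals the square of the equal-variance t value, which is the sense in which ANOVA and the t-test agree.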
Qualitative Data Analysis
• Top down: start with categories from theory, literature, hypotheses, your topics.
This is considered “more rigorous” by more empirical researchers. It is not “biased”
by the data. Best when you know what you are looking for.
• Bottom up: develop your codes after the research, during the analysis; see what
codes develop. This may result in more natural and reflective coding. Best for
exploratory research.
• Both: Obviously you can use a bit of both
• Coding: Can do manually or with software. Look for:
– Themes, Topics, Ideas, Concepts
– Terms/phrases or Keywords
• Also consider developing codes for
– Setting and context
– Participant perspective
– Process codes
– Activity codes
– Strategy codes
– Relationships and social structure
– Preassigned coding
(all from Creswell 193, drawing on Bogdan & Biklen)
Coding
• Develop a coding system
– Make it flexible and easy to use
– Create a “memo” or resource with code definitions to refer to, add
to this as new codes develop
– Consider “quantifying” where possible
• Figure out what and how you will code
– Whole texts?
– Passages, lines, words?
• Include the coded material with the coding
– Copy and paste it, or link to it, in a cell in your spreadsheet (for audio or
video recordings, note the time in the track)
– Copy the printed text and do old-fashioned cutting and pasting, or
highlighting, and put the material in folders by code
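For those coding with software rather than folders, the same system can be sketched in a few lines of Python: a codebook that acts as the “memo” of definitions, coded segments that keep the material with its codes, and a tally for quantifying. Every code name, definition, source, and passage below is hypothetical and only illustrates the structure.

```python
from collections import Counter

# The "memo": code name -> definition. Add to it as new codes develop.
codebook = {
    "audience": "Writer mentions who the readers are or what they need",
    "process": "Writer describes steps taken while drafting or revising",
}

# Each segment keeps the coded material together with its codes.
# "location" can be a line number or the time in an audio/video track.
segments = [
    {"source": "interview_01", "location": "00:04:12",
     "text": "I always picture a new hire reading this.",
     "codes": ["audience"]},
    {"source": "interview_01", "location": "00:09:30",
     "text": "I outline first, then ask who will read it.",
     "codes": ["process", "audience"]},
    {"source": "interview_02", "location": "00:02:05",
     "text": "I revise after a day away from the draft.",
     "codes": ["process"]},
]

def tally_codes(coded_segments):
    """Count how often each code was applied ("quantifying" the coding)."""
    counts = Counter()
    for segment in coded_segments:
        for code in segment["codes"]:
            if code not in codebook:
                raise KeyError(f"Code {code!r} is missing from the codebook")
            counts[code] += 1
    return counts

def passages_for(code, coded_segments):
    """Pull every passage filed under a code, like opening that code's folder."""
    return [s["text"] for s in coded_segments if code in s["codes"]]

print(tally_codes(segments))
print(passages_for("audience", segments))
```

Rejecting codes that are not in the codebook is one design choice; it forces a definition to be written before a new code is used.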
Some Data Analysis Methods
• Affinity diagram: method for sorting all the ideas/points/items
collected into groups and clusters, often (but not necessarily)
resulting in a hierarchical diagram showing scope
• Work Models: provide a graphical, concrete, systematic view
of work (or other) practice
– Flow Model: Shows how work is broken up across people, keeping track of individuals,
responsibilities, groups, flow, artifacts, communication topics or actions, places, and
breakdowns
– Sequence Model: Maps the sequence of work, including intent, triggers, steps, order,
and breakdowns
– Artifact Model: Shows the interpretation of the conceptual distinctions in the use of
an artifact, including information, parts, structure, annotations, presentation, usage,
and breakdowns
– Cultural Model: Maps out the intangible forces of culture, including influencers and
influences
Coding Examples
This is just the start
Have fun & analyze well!