Transcript Week #1
Action Research
Introduction
INFO 515
Glenn Booker
INFO 515
Lecture #1
1
Course Scope
This class focuses on understanding
common types of analysis techniques
which may be used to support research
projects
We will use the statistics program SPSS
to manipulate data and generate graphs
There will be weekly homework
assignments for much of the term
Who cares…
…about statistics and research methods?
Commonly accepted techniques need to be
used to ensure that valid comparisons and
analyses are being made
Statistics is a common language to
express results
Helps ensure that objective conclusions
are reached
Why use SPSS?
Microsoft Excel is adequate for simple
math (arithmetic, averages, etc.)
But Excel fails some standard tests for
performing more advanced calculations
(regression analysis, etc.)
SPSS was chosen for its widespread
usage and low-cost student version
My Background
Eighteen years of industry experience
DOD (Department of Defense) and FAA
(Federal Aviation Administration) work,
primarily involved in software development,
systems engineering, and project management
Also teach statistical process control for high
process maturity organizations
Have been teaching for Drexel since 1998
For the REAL serious student
Get the ISO Standards Handbook “ISO
Statistical methods for quality control”,
5th ed., 2000
It runs $418 for both 700+ page volumes
No, I don’t expect you to buy this!
If you do find someone to buy it for you,
search for its title at http://global.ihs.com/
IHS is a great, if terribly expensive, source for
military (MIL, DOD), industry (IEEE, ASTM), national
(ANSI, DIN*), and international (ISO) standards
* DIN is the German equivalent of ANSI
Other References
More realistically, see my handout
“Statistics for Software Process
Improvement”
It summarizes statistical terms, hypothesis
testing, SPSS tips, and other stuff we’ll
be using
We’ll use it a lot
Definitions
Data - observations collected in order to
measure or describe a situation or
problem of interest
Data describes a variable
Variables - objects or concepts that
must have a value or a definition assigned
to them so that they can be
measured and analyzed
They take on different values for individuals
and groups
Discrete vs. Continuous Data
Discrete data can take on only distinct,
separate values. It is often
characterized by counting units (integers),
or only specific values, like grades
Continuous data can take on an infinite
number of possible values and is
characterized by some type of
measurement, instrument, or scale
You measure height, weight (Does anyone ever
know exactly how much they weigh?), speed,
etc.
Definitions
Theory is a possible explanation of the
relationships among variables
Research Hypothesis – as a
consequence of our theory, the hypothesis
is the statement we submit to testing
Often states there is a pattern, or difference,
or trend among the variables
Null hypothesis is the opposite of the
research hypothesis
States there is no trend or difference
Research
Research describes what or explains why
It is a method for finding answers to
questions or a strategy for explanation
Research is:
1. Empirical, because it is based on evidence
or data
2. Systematic, because it uses a method
3. Objective, because it is presumably
conducted and interpreted by the researcher
without bias
Basic vs. Applied Research
Basic research usually refers to
laboratory research, such as
experimental psychology
In basic research, the researcher is testing
theory and ideas without necessarily
applying the results to practical problems
Basic vs. Applied Research
Applied research is also called field
research, evaluation research, or action
research
This type of research is often used to
influence policy and decision-making, and is
conducted to solve problems (often
immediate problems), sometimes only within
one organization (hence its results are only
applicable to that organization)
Quantitative vs. Qualitative
Quantitative Research tends to deal with
variables that have numeric values
How far do you commute to work?
How tall are you?
Qualitative Research looks at variables
which are binary (Yes/No), have
non-numeric values, or are free-form text
What is your favorite football team?
How could I improve this slide?
The Nature of Qualitative and Quantitative
Research Strategies:
Difference is the type of data you collect
and the tools you employ
Specifically:
The same data collection strategies can
be qualitative or quantitative
Qualitative data can become quantitative
Pure quantitative data cannot become
qualitative
Often in research, it is good to use
qualitative and quantitative in the same
study
Research Methods
There are many different ways to
conduct research
Exactly how many ways depends on
your field of study and how you wish
to define them
Here we break them into nine different
methods (see narrative lecture notes too)
1. Historical Research
Reconstruct the past to support a
hypothesis or theme, while remaining
objective and true to the actual events
which occurred
Example: study past software projects to
see if it’s true that: “if a project was at
least 10% behind schedule halfway
through, it will finish at least 10% late”
2. Descriptive Research
This is a non-judgmental type of research
Examine a situation or area systematically
and describe it
Example: study how library patrons
navigate when looking for a particular
book
3. Developmental Research
Examine how something grows or changes
over time; is also non-judgmental
Often looking for processes, patterns,
or sequences
Example: study the number of software
requirements which have been described
during a project, and look for that number
stabilizing (not changing much)
4. Case and Field Research
Study a given organization to understand
how it faces its environment
Often used for understanding business
management decisions – in a given
business environment, how did they
choose among product development
options?
5. Correlational Research
Study how one variable is affected by
one or more other variables
Example: how is customer satisfaction
affected by product reliability?
Another example: how is productivity
affected by the level of experience of
the workers?
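One common way to put a number on how closely two variables move together is the Pearson correlation coefficient. The sketch below is in Python rather than SPSS, and the reliability and satisfaction scores are made-up values used only for illustration. A high value shows that the variables move together; by itself it does not show that one causes the other.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from each mean
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(ss_x * ss_y)

# Hypothetical scores for five products (not real data)
reliability = [1, 2, 3, 4, 5]
satisfaction = [2, 4, 5, 4, 5]

r = pearson_r(reliability, satisfaction)
print(f"r = {r:.2f}")  # values near +1 or -1 mean a strong linear relationship
```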
6. Causal Comparative
A.k.a. ex post facto (after the fact)
research
Study some outcome by looking for
possible causes
Example: determine if listening to
classical music leads to criminal activity
Or: determine if being short increases
your chance of having a heart attack
7. True Experimental Research
Examine the effect of some treatment on
an experimental group by comparing it to
a control group which receives no real
treatment (e.g. a placebo)
Example: drug studies are done this way
to prove whether the drug really had a
noticeable effect on the patients
Experimental Study “Blindness”
A single blind study means the testers
know which subjects receive the real
treatment, but the subjects don’t know
A double blind study means neither side
knows who received the real treatment –
the information is coded so that only the
analysts can figure out who received what
Side note: If the subjects know what they are
receiving, the study isn’t blind at all
8. Quasi-Experimental Research
This is like True Experimental Research,
but is done where you can’t control all of
the variables (such as the real world)
Much software development research is
in this category
Much qualitative research is in this
category too
9. Action Research
Develop new ways to solve problems with
direct application to the real world
This tends to focus on your own
organization: study what’s happening,
and see how to improve it
Action Research
A strategy in Educational Research
Enables problem solving in the natural
setting
Participatory action research
Connect theory with practice
Action Research Questions in Library
and Information Science
How much does the library spend?
How much do potential users actually use
the library?
How productive is the library staff?
Is the staff the right size?
How are users served by the library?
Statistics
Statistics describes a likely range for
predicting something, not a fixed point
For example, instead of saying it will take
“a week” to perform a task, describe a
time period in which you are likely to
finish the task, such as 7 days +/- 2 days
Most people don’t like to think this way;
uncertainty makes people uncomfortable
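One simple way to arrive at an estimate like "7 days +/- 2 days" is to take the mean of past task durations as the center and the sample standard deviation as the spread. The sketch below is in Python rather than SPSS; the durations are hypothetical, and using one standard deviation as the "likely range" is an assumption made for illustration, not the only reasonable choice.

```python
from statistics import mean, stdev

# Hypothetical durations (in days) of five similar past tasks
past_durations_days = [5, 5, 7, 9, 9]

center = mean(past_durations_days)   # typical duration
spread = stdev(past_durations_days)  # sample standard deviation

low = center - spread
high = center + spread
print(f"{center:g} days +/- {spread:g} days (about {low:g} to {high:g} days)")
```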
General Function of Statistics
Descriptive Statistics describes the
characteristics of one or more variables
We describe the traits of that variable
Inferential Statistics is used when we
develop a hypothesis, and analyze data to
make decisions or draw conclusions about
that hypothesis
We infer some larger perspective or
understanding, based on our limited data
General Function of Statistics
Descriptive
Numbers that describe situation of interest
Value: efficient summary of data
Interpretive (Inferential)
More power, but certain amount of risk
Hypothesize, then collect data and analyze it
Accept or reject the hypothesis
Definitions
Independent Variable - A variable which
is thought to influence another variable
Often plotted as the ‘X’ axis on a graph
Might have many independent variables
Dependent Variable - A variable which is
influenced by or is the consequence of the
independent variable
Often plotted as the ‘Y’ axis on a graph
Independent vs. Dependent
Generally speaking, we want to be able to
understand and/or predict the dependent
variable in a problem
Often a hypothesis will try to use one or
more independent variable(s) to explain
the behavior of the dependent variable
We want to understand IQ (dep variable); try
to see if income predicts it (indep variable)
To improve customer satisfaction (dep), see if
a new card catalog (indep event) changes it
Cases and Variables
Cases = units of analysis
people, things, records, etc.
A.k.a.: entities, respondents, subjects, items
Become the rows in your data matrix
Variables = things that vary! (not
constant)
Example: Achievement, Intelligence,
Attendance, Income, Aggression
A.k.a.: measures, attributes, features
Become the columns in your data matrix
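The rows-and-columns idea can be sketched outside SPSS as well. In the Python example below, each case is one row (a dictionary) and each variable is one column (a key). The variable names and values are invented for illustration.

```python
# Each case (row) is one respondent; each variable (column) is a key.
# All names and values here are hypothetical.
data_matrix = [
    {"id": 1, "attendance": 12, "income": 41000},
    {"id": 2, "attendance": 9, "income": 52000},
    {"id": 3, "attendance": 15, "income": 38000},
]

n_cases = len(data_matrix)               # number of rows
variables = list(data_matrix[0].keys())  # column names

# Pull out one variable (one column) across all cases
attendance_column = [case["attendance"] for case in data_matrix]

print(n_cases, variables, attendance_column)
```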
Variables
Discrete = Counting Units
Example: Attendance
Continuous = Measurement
Example: Intelligence Tests
Independent Variables
influence other variables
Dependent Variables
influenced by (or consequence of) the
independent variable
Definitions
Population (N) is the total group of
things under study, such as all voters in
an election
Sample (n) is a subset of the population
Basic descriptive statistics include
Maximum is the largest value in a data set
Minimum is the smallest value in a data set
Range is the difference between the Maximum
and the Minimum
Range = Maximum - Minimum
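These three statistics are easy to check by hand or in a few lines of code. A minimal sketch in Python (not SPSS), using an arbitrary list of values:

```python
values = [4, 8, 15, 16, 23, 42]  # arbitrary example values

maximum = max(values)            # largest value in the data set
minimum = min(values)            # smallest value in the data set
value_range = maximum - minimum  # Range = Maximum - Minimum

print(maximum, minimum, value_range)
```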
Sample & Population Variables
Notice that very often, the same variable
will have a different symbol for its value
for a sample, than its value for the entire
population (more examples to follow)
This helps distinguish between what we
have measured directly (usually the
sample value) and what we want to
understand or predict (that variable for
the whole population)
Measures of Central Tendency
There are three measures of “central
tendency”
Mean
Median
Mode
They convey the average, middle, and
most common values in a data set
Definitions
Mean - The average of a set of data;
equal to the sum of their values (Xi),
divided by the number of data points (N).
Mean is X̄ (X bar) for a sample, or μ
(Greek mu) for the entire population
$\text{Mean} = \frac{1}{N} \sum_{i=1}^{N} X_i$
For some set of data with N values;
add them up and divide by N.
To be precise, this is the arithmetic
mean; there are other kinds, e.g.
geometric mean.
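The arithmetic mean can be computed directly from its definition. The Python sketch below (not SPSS) does that for three arbitrary values and, for contrast, also computes the geometric mean with the standard library's `statistics.geometric_mean`.

```python
from statistics import geometric_mean, mean

values = [2, 4, 8]  # arbitrary example values

# Arithmetic mean from the definition: add the values up and divide by N
arithmetic = sum(values) / len(values)

# The standard library gives the same arithmetic mean
library_arithmetic = mean(values)

# Geometric mean: the N-th root of the product of the values
geometric = geometric_mean(values)

print(arithmetic, library_arithmetic, geometric)
```

For these values the arithmetic mean is 14/3 (about 4.67), while the geometric mean is about 4, so the two kinds of mean generally differ.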
Definitions
Median is the middle value of a set of
data which has been sorted in numeric
order (e.g. the median home selling price)
If the set has an even number of data points,
average the middle two values
Mode is the value of data which occurs
the most often (generally for integer
data sets)
There can be one mode or many, resulting
in different mode types
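A short Python sketch (not SPSS) of both definitions, using arbitrary values. `statistics.median` sorts the data itself and averages the middle two values when the count is even; `statistics.multimode` returns every value tied for most frequent.

```python
from statistics import median, multimode

odd_count = [3, 1, 2]      # sorted: 1, 2, 3
even_count = [4, 1, 3, 2]  # sorted: 1, 2, 3, 4

median_odd = median(odd_count)    # middle value of the sorted data
median_even = median(even_count)  # average of the two middle values

# 2 and 3 each occur twice, so both are modes
modes = multimode([1, 2, 2, 3, 3])

print(median_odd, median_even, modes)
```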
Mode Types
Unimodal - there is one mode in a data set
Bimodal – there are two modes in the
data set
Multimodal - there are many (>2) modes
in the data set
If there are no duplicates in the data set
(all values are unique), then all its values
are modes, hence it would be extremely
multimodal!
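The mode types above can be told apart by counting how many values tie for most frequent. The helper `mode_type` below is a hypothetical name for this sketch, written in Python rather than SPSS.

```python
from statistics import multimode

def mode_type(values):
    """Classify a non-empty data set by how many modes it has."""
    if not values:
        raise ValueError("need at least one value")
    n_modes = len(multimode(values))  # values tied for most frequent
    if n_modes == 1:
        return "unimodal"
    if n_modes == 2:
        return "bimodal"
    return "multimodal"

print(mode_type([1, 2, 2, 3]))     # one mode: 2
print(mode_type([1, 1, 2, 2, 3]))  # two modes: 1 and 2
print(mode_type([1, 2, 3, 4]))     # all values unique, so all are modes
```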
Definitions
Standard deviation (s for sample, or
σ (sigma) for population) represents the
average amount by which data differs from
the mean
Standard deviation affects the width or
flatness of the bell shaped curve
Variance (s² or σ²) is the standard
deviation squared
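The sample and population versions differ in their divisor: the population formulas divide by N, while the usual sample formulas divide by n - 1. A minimal Python sketch (not SPSS) using the standard library's `statistics` module and arbitrary values:

```python
from statistics import pstdev, pvariance, stdev, variance

data = [2, 4, 4, 4, 5, 5, 7, 9]  # arbitrary example values; their mean is 5

# Treating the data as the whole population (divide by N)
sigma = pstdev(data)
sigma_squared = pvariance(data)

# Treating the data as a sample (divide by n - 1)
s = stdev(data)
s_squared = variance(data)

print(sigma, sigma_squared)  # population standard deviation and variance
print(s, s_squared)          # sample versions are slightly larger
```

In both cases the variance is the standard deviation squared.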
The Normal Distribution
We’ll look at this more later on…
[Figure: Normal Distribution for mean = 0, and std dev = 1/2, 1 and 2. PDF (vertical axis, 0 to 0.9) plotted against X (horizontal axis, -8 to 8), with one curve each for std dev = 1/2, 1, and 2]
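The effect of the standard deviation on the curve's shape can be checked numerically. The Python sketch below (not SPSS) evaluates the standard normal probability density formula at the mean for the three standard deviations named in the figure title; the function name `normal_pdf` is just a label chosen for this example.

```python
from math import exp, pi, sqrt

def normal_pdf(x, mean=0.0, std_dev=1.0):
    """Normal probability density: exp(-z^2 / 2) / (std_dev * sqrt(2*pi))."""
    z = (x - mean) / std_dev
    return exp(-0.5 * z * z) / (std_dev * sqrt(2 * pi))

# Height of each curve at its peak (x = mean = 0)
peaks = {sd: normal_pdf(0.0, 0.0, sd) for sd in (0.5, 1.0, 2.0)}

for sd, height in peaks.items():
    print(f"std dev = {sd}: peak height = {height:.3f}")
```

A smaller standard deviation gives a taller, narrower curve; a larger one gives a lower, flatter curve.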
SPSS
SPSS is high-end statistical analysis software
You can use your Drexel login to download it free
from https://software.drexel.edu/
Log in with drexel\ in front of your login name, e.g.
"drexel\abc28" and the same password you use for
DrexelOne. Navigate to find SPSS version 16, something
like https://software.drexel.edu/Students/PCSoftware/SPSS/SPSS16/.
Make sure to save the readme.txt file too - it has the
serial number and Authorization Code information.
Download and run the executable file.
Version 16 for Mac (~730 MB file)
Version 16 for PC (~670 MB file)
Anything version 10 or later is acceptable
SPSS Introduction
SPSS is like a spreadsheet or flat
file database
Each variable has its own column (max. of 50)
Each record has its own row (max. of 1500)
These limits apply to the Student Edition only
Key navigational feature:
Use the Data View tab to see the
experimental data
Use the Variable View tab to see the
characteristics of each variable and how
they’re displayed in the Data View
SPSS Data View
SPSS Variable View
SPSS Introduction
Use the Variable View tab to change the
characteristics of each variable, such as
Type of variable (integer, date, text, etc.)
Name of each variable, which was limited to 8
characters, is lower case, and has no spaces
Recent versions finally removed the 8 character limit
Labels for each variable are optional, but they
allow a more useful identifier than the Name
When you select or plot a variable, its Label is
shown (if there is one), not its Name
Width is how many digits or characters the
variable may have
SPSS Introduction
Variables can have a limited set of
allowable Values, such as {0 = Male},
{1 = Female}
Sort data by selecting Data / Sort Cases…
Then select one or more variables to be the
“Sort by:” criteria
If more than one variable is selected, data will
be sorted in that order of precedence
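Sorting by more than one variable in order of precedence works the same way outside SPSS. In the Python sketch below, cases are sorted first by gender (coded 0 = Male, 1 = Female as above) and then by age within each gender. The cases and the `age` variable are hypothetical.

```python
# Hypothetical cases; gender uses the coding 0 = Male, 1 = Female
cases = [
    {"id": 1, "gender": 1, "age": 34},
    {"id": 2, "gender": 0, "age": 51},
    {"id": 3, "gender": 1, "age": 27},
    {"id": 4, "gender": 0, "age": 45},
]

# The tuple key sets the precedence: gender first, then age
sorted_cases = sorted(cases, key=lambda case: (case["gender"], case["age"]))

sorted_ids = [case["id"] for case in sorted_cases]
print(sorted_ids)
```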
SPSS Introduction
Can adjust column widths like Excel
In Data View, move cursor between column
titles (which are the variable Names), and drag
the column width left or right, or
In Variable View, edit the Columns field
SPSS data files have an extension of “sav”
Output is saved separately in files with an
extension of “spo”
Tabular output of ***** means the column is
too narrow; double click to edit, and drag the
right edge of the column to the right
Additional References
From Prof. Val Yonker
Carpenter, R.L., and Vasu, E.S. (1979). Statistical Methods
for Librarians. Chicago: American Library Association.
Cohen, J. and Cohen, P. (1975). Applied Multiple
Regression/Correlation Analysis for the Behavioral Sciences.
Hillsdale, NJ: Lawrence Erlbaum Assoc.
Hernon, P. (1989). A Handbook of Statistics for Library
Decision Making. Norwood, NJ: Ablex Publishing.
Isaac, S. and Michael, W.B. (1977). Handbook in Research
and Evaluation. San Diego: Edits Publishers.
Keppel, G. (1973). Design and Analysis: A Researcher's
Handbook. Englewood Cliffs, NJ: Prentice-Hall.
Kerlinger, F.N. (1979). Behavioral Research: A Conceptual
Approach. New York: Holt, Rinehart, and Winston.
Additional References
Loether, H.J. and McTavish, D.G. (1980). Descriptive and
Inferential Statistics: An Introduction. Boston: Allyn and
Bacon.
Runyon, R.P., and Haber, A. (1984). Fundamentals of
Behavioral Statistics (2nd ed.). Reading, MA: Addison-Wesley.
Selltiz, C.; Wrightsman, L.S.; and Cook, S.W. (1976).
Research Methods in Social Relations (3rd ed.). New York:
Holt, Rinehart and Winston.
Here’s my favorite:
Salkind, N.J. (2007). Statistics For People Who (Think
They) Hate Statistics (3rd ed.). Thousand Oaks, CA: Sage
Publications. ISBN: 9781412951500