Internet data - Library and collections

F2: Data Beyond Numbers:
Using Data Creatively for
Research
Room: Lachine
Session Chair: Mary Luebbe, University of British
Columbia
IASSIST 2007, Montreal
The truth is out there
Karsten Boye Rasmussen
associate professor, [email protected]
Department of Marketing and Management
University of Southern Denmark
Campusvej 55, DK-5230 Odense M, Denmark
+45 6550 2115 fax: +45 6593 1766
organization and information technology
business intelligence
strategic organization design
'it, communication and organization' www.itko.dk
Editor of the IASSIST Quarterly
The data (truth) is out there





Point of departure
Use of data => documentation
Use of Internet for research
Relying on the Stimulus-Response contract
The potential of utilizing non-reactive data:
e-mails
blogs
web-logs (on hits, visits, users, etc.)
web-sites and links
paradata

Mixing methods

Having complete, reliable, and accessible resources
Data, use, metadata and documentality
"if information is data plus context, knowledge is information
plus experience" (Levy & Powell, 2005)
data is description - of the world or objects in the world
description of data - is metadata
DDI 'The Data Documentation Initiative'

The quality measures of validity, reliability, accuracy,
precision, bias, representativity, etc.
only available through the documentation of the data
the metadata

High documentality means the dataset is 'pattern' &
'model'
Errors in survey data

Acquiring primary data
survey is the "ability to estimate with considerable
precision the percentage of a population that has a
particular attribute by obtaining data from only a small
fraction of the total population" (Dillman, 2007)
Sampling error: surveying only some, not all, of the population
Coverage error: not an equal or known chance of being sampled
Measurement error: bad instrument, poor question wording
Non-response error: respondents being different from the non-respondents
Table 5. The four sources of survey error (Dillman, 2007:9-11)
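
The "considerable precision" in the Dillman quote can be illustrated with a small worked example. This is a minimal sketch that assumes simple random sampling and the usual normal approximation for a proportion; the function name `margin_of_error` and its parameters are illustrative and do not come from the slides. It quantifies sampling error only, not coverage, measurement, or non-response error.

```python
import math

def margin_of_error(p, n, z=1.96, population=None):
    """Half-width of a normal-approximation confidence interval for a proportion.

    p: estimated proportion, n: sample size, z: critical value (1.96 ~ 95 %).
    population: optional population size for the finite population correction.
    Assumes simple random sampling (an assumption, not stated in the slides).
    """
    se = math.sqrt(p * (1 - p) / n)
    if population is not None:
        # Finite population correction: sampling a large share of a small
        # population reduces the sampling error.
        se *= math.sqrt((population - n) / (population - 1))
    return z * se

moe = margin_of_error(p=0.5, n=1000)
print(f"n=1000: +/- {moe * 100:.1f} percentage points")
```

With 1,000 respondents the worst-case margin is roughly three percentage points regardless of how large the population is, which is why a small fraction of the population can suffice.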
Internet & research


a shift in the medium for data collection
self administered internet surveys
web surveys
e-mail surveys
e-mail with links
web
e-mail
the link points to a web-questionnaire
a mixed-mode within the Internet media
e-mail with attached questionnaire
the questionnaire in software formats (Word)
e-mail text without attachments or links - answering mail
3-5 questions

PLUS
non-reactive data
Web survey - some problems
respondents have uneven accessibility to the Internet
unevenness in regard to the technical abilities:
bandwidth, computing power, and software (web browsers)
however general web-site competences do exist
and telephone ownership is now too widespread
- another medium is needed
Web survey - the many pros
some reliable e-mail registers do exist
random selection - but not randomly generated ;-)
CAxI (computer-assisted interviewing, e.g. CATI, computer-assisted telephone interviewing)
more complicated structures possible in the answering
software will enforce consistent rule following
experiments using different sequencing of questions
the use of paradata in web (later)
Web survey - the respondent

Internet coverage, sampling,
and the right respondent
sampling is not secured by a large number of
respondents
the problem of self-selection
a systematic bias
have to secure the right respondent - or at least only one respondent per inquiry
the new problem of a 150 per cent answer rate
log-in procedure with a PIN-code is recommended
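
The PIN recommendation can be sketched as follows. This is an illustrative in-memory example, not a description of any particular survey package: the class `PinRegistry` and its methods are hypothetical names. Each sampled respondent receives one PIN, and each PIN accepts one submission, so the answer rate cannot exceed 100 per cent.

```python
import secrets

class PinRegistry:
    """Hypothetical one-PIN-per-respondent gate for a web survey."""

    def __init__(self):
        self._pins = {}     # PIN -> respondent id
        self._used = set()  # PINs that have already submitted

    def issue(self, respondent_id):
        # Draw a random PIN; redraw in the unlikely case of a collision.
        pin = secrets.token_hex(4)
        while pin in self._pins:
            pin = secrets.token_hex(4)
        self._pins[pin] = respondent_id
        return pin

    def submit(self, pin, answers):
        if pin not in self._pins:
            return "rejected: unknown PIN"      # self-selected visitor
        if pin in self._used:
            return "rejected: already answered"  # duplicate submission
        self._used.add(pin)
        return "accepted"

registry = PinRegistry()
pin = registry.issue("respondent-001")
print(registry.submit(pin, {"q1": "yes"}))
print(registry.submit(pin, {"q1": "no"}))
print(registry.submit("not-a-pin", {"q1": "yes"}))
```

The PIN identifies an invited member of the sample; it does not by itself prove that the intended person is the one answering.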
Web survey - success and hazard
quicker turnaround than through the postal or face-to-face questionnaire
raising the data quality by securing timely data
the Internet surveys have a much lower 'marginal cost'

with the Internet and supportive software for
web surveys
many more surveys are taking place
maybe too many
respondents tend to be more reluctant to participate in
surveys
low response rates
as shown in surveys 
Secondary data – a richness of data

The data is already out there - ready to use
data is being made available and retrievable
raising the data quality through a higher
documentation level
... a long list ... of pros for secondary data

for some areas the complete data is available
as the data in the operational system of the company
who bought what, when, and where?
the electronic traces left by human behavior
Online behavior / traces / Non-reactive

Investigating the sources
e.g. e-mails

e-mail fields
sender, date, subject, response - a network

content analysis

e-mail
Sender as node
Receiver as node
Response and initiator on a web-list
Subject as id of a thread
Link Analysis – Graph Theory

nodes & edges
node: has names and properties: phones, doctors, web-pages, emailers
edge: pair of nodes connected by a relationship, often communication

fully connected graph

path: an ordered sequence of nodes connected by edges
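
The graph terms above can be made concrete with e-mail headers, where senders and receivers are nodes and each message is an edge. This is a minimal sketch on made-up data: the names in `messages` and the helper functions are hypothetical, edges are treated as undirected, and "fully connected" is taken to mean that every pair of nodes shares an edge.

```python
from collections import defaultdict, deque

# Hypothetical (sender, receiver) pairs extracted from e-mail headers.
messages = [("anna", "bo"), ("bo", "carl"), ("carl", "dora"), ("anna", "bo")]

def build_graph(pairs):
    """Undirected graph: node -> set of nodes it has exchanged mail with."""
    graph = defaultdict(set)
    for sender, receiver in pairs:
        if sender != receiver:  # ignore mail sent to oneself
            graph[sender].add(receiver)
            graph[receiver].add(sender)
    return dict(graph)

def shortest_path(graph, start, goal):
    """Breadth-first search; returns an ordered list of nodes or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for neighbour in sorted(graph.get(path[-1], ())):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(path + [neighbour])
    return None

def is_fully_connected(graph):
    """True when every node has an edge to every other node."""
    return all(len(nbrs) == len(graph) - 1 for nbrs in graph.values())

graph = build_graph(messages)
print(shortest_path(graph, "anna", "dora"))
print(is_fully_connected(graph))
```

Repeated messages between the same pair collapse into one edge here; counting them instead would give edge weights that reflect the intensity of communication.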
Online behavior / traces / Non-reactive

Investigating the sources
e-mails (just mentioned earlier)
sender, date, subject, response - a network
blogs
the web-sites and the web
all these have ethical as well as legal implications (Allen)

Research into the virtual

Logs of behavior
web-log
paradata
ISP-log (internet service provider)
Web-log analysis
hits, pages, visits, users of a web-site
Fields: Host (IP address), Time, Request, Status, Bytes
133.225.107.171 - - [04/Jan/2007:06:29:24 -0700] "GET /home/ HTTP/1.0" 200 2935
133.225.107.171 - - [04/Jan/2007:06:29:32 -0700] "GET /home/pubs.html HTTP/1.0" 200 1204
133.225.107.171 - - [04/Jan/2007:06:29:37 -0700] "GET /home/iq.html HTTP/1.0" 200 2516
133.225.107.171 - - [04/Jan/2007:06:29:37 -0700] "GET /home/getacro.gif HTTP/1.0" 200 1090
cookies and explicit user log-in
'click-stream analysis' of CLF (Common Log Format) files
pages where the session stops?
patterns of web-movements that explain the stops
going in circles on a web site?
behavior from non-buyers and buyers
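
A web-log analysis of hits, pages, and visits can be sketched on the four log lines shown on the slide. The parsing below assumes Common Log Format; the 30-minute session timeout and the list of file suffixes that are not counted as pages are assumptions chosen for illustration, and the function names are hypothetical.

```python
import re
from datetime import datetime, timedelta

# host ident authuser [time] "request" status bytes
CLF = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\d+|-)'
)

LOG = """
133.225.107.171 - - [04/Jan/2007:06:29:24 -0700] "GET /home/ HTTP/1.0" 200 2935
133.225.107.171 - - [04/Jan/2007:06:29:32 -0700] "GET /home/pubs.html HTTP/1.0" 200 1204
133.225.107.171 - - [04/Jan/2007:06:29:37 -0700] "GET /home/iq.html HTTP/1.0" 200 2516
133.225.107.171 - - [04/Jan/2007:06:29:37 -0700] "GET /home/getacro.gif HTTP/1.0" 200 1090
"""

# Assumption: requests for these file types are hits but not page views.
ASSET_SUFFIXES = (".gif", ".jpg", ".png", ".css", ".js", ".ico")

def parse(line):
    """Turn one log line into a dict, or None if it does not match."""
    match = CLF.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["time"] = datetime.strptime(record["time"], "%d/%b/%Y:%H:%M:%S %z")
    parts = record["request"].split()
    record["path"] = parts[1] if len(parts) > 1 else ""
    return record

def summarize(records, timeout=timedelta(minutes=30)):
    """Count hits, page views, visits (sessions), and distinct hosts."""
    pages = sum(
        1 for r in records if not r["path"].lower().endswith(ASSET_SUFFIXES)
    )
    visits = 0
    last_seen = {}  # host -> time of that host's previous request
    for r in sorted(records, key=lambda r: r["time"]):
        previous = last_seen.get(r["host"])
        if previous is None or r["time"] - previous > timeout:
            visits += 1  # first request, or a new session after a long gap
        last_seen[r["host"]] = r["time"]
    return {"hits": len(records), "pages": pages,
            "visits": visits, "hosts": len(last_seen)}

records = [r for r in map(parse, LOG.strip().splitlines()) if r is not None]
print(summarize(records))
```

A host is only a rough stand-in for a user, since several people can share one IP address; that is the reason for the cookies and explicit log-in mentioned above.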
Paradata in surveys

web-log of the process of answering a web survey
timing of the respondent's progression from one web page
to the next
paradata is data about the process of data collection
(Couper)

collection at the client-side (Heerwegh)
JavaScript can trace - with timing - different types of
answering mechanisms:
drop-down lists
radio-buttons
click-items
give value etc.

and client-side can also track how the respondent has
changed the answers
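
Once client-side paradata has been collected, its analysis can be sketched as follows. The event format here (millisecond timestamp, item, action, value) is a hypothetical one chosen for illustration; it is not the format used by Heerwegh's scripts or by any specific survey tool. The sketch derives the time spent on each item and the number of changed answers.

```python
from collections import defaultdict

# Hypothetical client-side events: (milliseconds, item, action, value).
events = [
    (0, "q1", "focus", None),
    (4200, "q1", "radio", "yes"),
    (6000, "q1", "radio", "no"),      # the respondent changes the answer
    (7500, "q2", "focus", None),
    (12500, "q2", "dropdown", "3"),
    (15000, "end", "submit", None),
]

def summarize_paradata(events):
    """Return (milliseconds per item, answer changes per item)."""
    millis = defaultdict(int)
    changes = defaultdict(int)
    last_value = {}
    for i, (t, item, action, value) in enumerate(events):
        # Time until the next event is attributed to the current item.
        if i + 1 < len(events):
            millis[item] += events[i + 1][0] - t
        if value is not None:
            if item in last_value and last_value[item] != value:
                changes[item] += 1
            last_value[item] = value
    return dict(millis), dict(changes)

millis, changes = summarize_paradata(events)
print(millis)
print(changes)
```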
With a focus on the customer



Customizing
custom-made for the customer, at the customer's request
mass-customization





Personalizing
built around the customer's needs
as read automatically
from the customer's transactions
and other customers' transactions

Discriminizing?
Analyzing virtual communities

Amazon
first among communities of customers
making other customer comments and evaluations available
using their behavior – linking the books bought and customers
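
Linking books bought and customers can be illustrated with a simple co-purchase count. This is only a sketch on made-up data: the `purchases` mapping and the function names are hypothetical, and it is not a description of Amazon's actual method.

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase histories: customer -> set of books bought.
purchases = {
    "c1": {"book_a", "book_b"},
    "c2": {"book_a", "book_b", "book_c"},
    "c3": {"book_b", "book_c"},
}

def co_purchases(purchases):
    """Count how many customers bought each pair of books."""
    pairs = Counter()
    for books in purchases.values():
        for a, b in combinations(sorted(books), 2):
            pairs[(a, b)] += 1
    return pairs

def also_bought(purchases, book):
    """Books bought by customers who bought `book`, most frequent first."""
    counts = Counter()
    for books in purchases.values():
        if book in books:
            counts.update(books - {book})
    return counts.most_common()

print(co_purchases(purchases))
print(also_bought(purchases, "book_a"))
```

The pair counts form a network of books linked through shared customers, so the same link analysis used for e-mail applies.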

many more sites of communities
e-mail lists
blogs
dating sites
potential in personal links as in Linkedin.com
and in the constructed virtual reality of 'Second Life' …
or the links contained in the web itself
Network

Live
Mixed modes and mixed methods
modes of surveys with questionnaires
postal, with interviewer, face-to-face or telephone, or
web-mode
mixed-mode has the ability to reduce non-response
'sequential mixed-mode ... do not pose any problems'
(de Leeuw)
but different modes often produce different results
(Dillman)
the 'unimode design'
later a mode-specific design taking full advantage of the
mode used

'mixed methods' more the combination of qualitative and
quantitative methods - and S-R and non-reactive data
Conclusion
more data is out there
as traces of actual behavior
get it!

Got it?

Thanks


Karsten Boye Rasmussen
SDU
Abstract

The Internet has provided a welcome medium for
traditional research, as seen in Internet surveys. The
price of conducting surveys has gone down, and the
resulting higher frequency of surveys has also
brought into focus some of the central drawbacks of
conducting surveys. We are continuously battling
challenges to the validity of the results obtained
through the survey design because of the bias present
in low response rates. This presentation will exemplify
how the electronic traces of human behavior supply a
new area of valid non-reactive data that adds complete,
reliable, and easily accessible resources for analysis.
Both e-mails and blogs can be the basis for content
analysis as well as for structural or network analysis.
The electronic traces of behavior, exemplified by click-streams
of web behavior, can be used stand-alone or to
enhance web surveys through paradata. Lastly, the
Internet itself presents an area of research.