Transcript Findings

Catherine C. Marshall
Akshay Kulkarni

Explores practices associated with
◦ Collaborative Authoring
◦ Reference Use
◦ Informal Creation of Personal Archives

In this paper, the author discusses findings in
four areas:
◦ (1) Collaboration and writing
◦ (2) File storage and management
◦ (3) Maintaining and extending bibliographic
resources
◦ (4) Personal archiving of scholarly material.

Discipline                    Number of participants
Algorithms and Theory         3
Distributed Systems           10
Security and Privacy          5
Software Tools                3
Web Search and Data Mining    3

◦ Observations over 6 months.
◦ Interviews of 45 minutes to one hour.
◦ Participants in different phases of their careers.
◦ Some professional participants along with academic participants.
◦ All participants published regularly.
◦ Collected relevant data such as Web-based publication lists, curricula vitae, etc.
1. Collaboration and Writing

Roles in writing
◦ Some authors coordinate writing by parceling out sections.
◦ Others write the entire draft themselves and do not cede the text to co-authors until it is fully formed.
◦ Participants in the same place write shoulder-to-shoulder in front of a display.
◦ Participants who are not in the same location collaborate synchronously.
◦ Sections are split according to skill.
◦ The final co-author just reviews the work and results but does not have access to the data sets.
1. Collaboration and Writing

Places and Devices
◦ Noise, display size, firewalls, and the presence of colleagues all affect where participants write.
  Examples: academic office, corporate office, conferences, etc.
◦ Similarly, different computers are used for different activities.
  Examples: number crunchers, email machines, backup hosts, etc.
1. Collaboration and Writing

Tools
1. Editors, figure generating tools, document
preparation software (LaTeX)
2. Analysis software to process data
3. Infrastructure software such as email
4. Custom software used specifically to produce
specific results for the paper.
2. Storing and Managing Materials

Versioning
Participants use sophisticated versioning systems
to:
1. Compare successive revisions
2. Merge two conflicting versions
3. Create personal backups
4. Create meaningful checkpoints in the lifecycle
5. Record development of ideas
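
As an illustration of the first use in the list above (comparing successive revisions), here is a minimal sketch using Python's standard `difflib` module. The draft texts, the labels `draft_v1` and `draft_v2`, and the helper name `compare_revisions` are hypothetical and do not come from the study; participants used full versioning systems rather than a script like this.

```python
import difflib


def compare_revisions(old_text, new_text, old_label="draft_v1", new_label="draft_v2"):
    """Return unified-diff lines describing how new_text differs from old_text."""
    return list(
        difflib.unified_diff(
            old_text.splitlines(),
            new_text.splitlines(),
            fromfile=old_label,
            tofile=new_label,
            lineterm="",  # keep header lines free of trailing newlines
        )
    )


# Hypothetical successive revisions of the same paragraph.
draft_v1 = "Introduction\nWe study scholarly archiving.\n"
draft_v2 = "Introduction\nWe study personal scholarly archiving.\n"

diff_lines = compare_revisions(draft_v1, draft_v2)
print("\n".join(diff_lines))
```

Lines prefixed with `-` appear only in the older revision and lines prefixed with `+` appear only in the newer one, which is the same view a versioning system gives when two checkpoints are compared.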
2. Storing and Managing Materials

Managing Data
1. For 9 out of 14 participants, creating, gathering, analyzing, and presenting data was a fundamental part of research.
2. Small datasets are stored in a variety of ways, such as CSV files, descriptive metadata files, logs and transcripts in emails, and Microsoft OneNote.
3. Large datasets have a lifecycle of their own. They may not be backed up, let alone archived.
3. Bibliographic Resources

Maintaining local resources
◦ Participants often cited BibTeX, LaTeX’s bibliographic capability, as the tool they use to prepare publications.
◦ Participants build their bibliographies over time and use them for intellectual bookkeeping.
◦ Participants extend citations with comments, summaries, notes, tags, or abstracts to help them remember the work.
◦ Participants inherit previous bib files for projects on the same topic.
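
The practices above (extending citations with notes and inheriting an earlier bib file) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: entries are held as plain dictionaries instead of being parsed from a real `.bib` file, the cite key `doe2007example` and all field values are placeholders, and `inherit_bibliography` and `to_bibtex` are hypothetical helper names. The `annote` field is a standard BibTeX field; `keywords` is a common convention that many bibliography styles simply ignore.

```python
def inherit_bibliography(previous, additions):
    """Merge a bibliography inherited from an earlier project with new or updated entries.

    Both arguments map a cite key to a dict of BibTeX fields. Fields in
    `additions` override fields with the same name in `previous`.
    """
    merged = {key: dict(fields) for key, fields in previous.items()}
    for key, fields in additions.items():
        merged.setdefault(key, {}).update(fields)
    return merged


def to_bibtex(key, fields, entry_type="article"):
    """Render one entry in BibTeX syntax."""
    body = ",\n".join(f"  {name} = {{{value}}}" for name, value in fields.items())
    return f"@{entry_type}{{{key},\n{body}\n}}"


# Placeholder entry inherited from a previous project on the same topic.
previous_project = {
    "doe2007example": {"author": "Doe, Jane", "title": "An Example Paper", "year": "2007"},
}

# Personal annotations added while working on the new project.
new_project = {
    "doe2007example": {
        "annote": "Useful background; revisit section 3.",
        "keywords": "archiving, to-read",
    },
}

merged = inherit_bibliography(previous_project, new_project)
print(to_bibtex("doe2007example", merged["doe2007example"]))
```

Keeping the notes inside the entry itself means they travel with the bibliography whenever the file is copied into the next project.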
3. Bibliographic Resources

Non-traditional resources
◦ Participants consider CiteSeer a useful resource for citation following.
◦ Participants use Google to get the exact citation and then go to digital libraries to get the paper.
◦ Online resources are judged on:
  ◦ Coverage
  ◦ Authority
  ◦ Scope
  ◦ Timeliness
4. Personal Archiving

Contents of a personal archive
◦ Personal archiving is a side effect of collaboration and publication.
◦ Efforts to maintain an intellectual legacy include keeping:
  ◦ Paper sources and alternate versions
  ◦ PDFs of published versions
  ◦ Research code
  ◦ Data, logs, and scripts
  ◦ Bibliographies and publications of related work
  ◦ Emails
4. Personal Archiving

How materials are stored
◦ Bundle related files together.
◦ Establish temporal order and intellectual context.
◦ Keep the archive easy to maintain.
◦ Email is cited as a good permanent store because:
  ◦ It is easy to browse chronologically
  ◦ It carries intrinsic metadata that helps reconstruct data
  ◦ It is easily accessible from anywhere
◦ Zipping up files is another established archiving technique.
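
The bundling and zipping practices described above can be approximated with Python's standard `zipfile` module. The sketch below is an assumption-laden illustration, not a tool used by the participants: the function name `bundle_project`, the file names, and the `MANIFEST.json` convention are all hypothetical. The manifest lists files by modification time so that the temporal order survives inside the bundle.

```python
import json
import tempfile
import zipfile
from pathlib import Path


def bundle_project(project_dir, archive_path):
    """Zip every file under project_dir and add a manifest ordered by modification time."""
    files = sorted(
        (p for p in project_dir.rglob("*") if p.is_file()),
        key=lambda p: (p.stat().st_mtime, p.name),
    )
    manifest = [
        {"path": p.relative_to(project_dir).as_posix(), "modified": p.stat().st_mtime}
        for p in files
    ]
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for p in files:
            archive.write(p, arcname=p.relative_to(project_dir).as_posix())
        # Hypothetical convention: store the chronological listing next to the files.
        archive.writestr("MANIFEST.json", json.dumps(manifest, indent=2))
    return manifest


# Demonstration with throwaway files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    project = Path(tmp) / "paper"
    project.mkdir()
    (project / "paper.tex").write_text("\\documentclass{article}\n")
    (project / "results.csv").write_text("run,value\n1,0.5\n")

    archive_file = Path(tmp) / "paper_bundle.zip"
    manifest = bundle_project(project, archive_file)

    with zipfile.ZipFile(archive_file) as archive:
        archived_names = sorted(archive.namelist())

print(archived_names)
```

A single archive of this kind keeps heterogeneous files (sources, data, scripts) together, although very large datasets may be impractical to include.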

Implications for Collaborative Information
Management (CIM):
1. Bundle together heterogeneous files, datasets and
other items.
2. Support reference replica.
3. Support email-like documentation and
chronological organization.
4. Support collections including data sets too large
to be copied.
5. Support synchronization when co-authors
straddle an institutional firewall.

Implications for personal scholarly archives
◦ Archives must be able to be disentangled when the
scholar changes affiliation.
◦ A master bibliography is important.
◦ Determine whether all the resources involved are archival.
  ◦ For example, intermediate data sets are not archived.

Implications for institutional and disciplinary
repositories
◦ Reduce the overhead researchers incur when sending data to publishers as well as to institutional repositories.
◦ Improve repository quality without adding a great deal of unwanted overhead.