080415_EGU_TagWikiPoster_pieces

Harmonization and Integration of Semi-Structured Data Through Wikis and Controlled Tagging
E. M. Robinson, R. B. Husar
Washington University, St. Louis, MO
Abstract:
The contents of cyberspace are increasingly generated and distributed by individuals.
This is manifested by the explosive growth of web-based social software like wikis,
media-sharing services and blogs. This architectural, technological and cultural
transformation of the Internet, commonly referred to as Web 2.0, is good news for the
Earth Science community since it offers new possibilities for sharing and harvesting
community-provided content as well as collaboratively creating new things.
One key feature of all of these new software tools is the end user's ability to add tags,
which adds value by extending the metadata of the tagged object. Ad hoc tagging (folksonomy)
gives a rich description of internet resources, but it has the disadvantage of
providing a fuzzy schema. The semantic uniformity of internet resources can be
improved by controlled tagging, which applies a consistent namespace and tag
combinations to diverse objects.
We have used the above tagging approaches in order to gather internet resources
pertaining to air quality events. Initial event analysis of the southern Georgia fires,
which burned in April and May 2007, began with filtering and harvesting user-contributed
web content. The Google Blog Search of 'Florida smoke' returned several
thousand entries, many of them unrelated to the wildfires. Visually scanning the blog
entries yielded a number of interesting posts, which were given the controlled tags
'070508+Florida+Smoke' in the social bookmarking tool del.icio.us. Additional smoke
photos were found in the photo-sharing service Flickr and given the same set of
controlled tags. Together, these tools yielded a rich but only qualitative description of
the Georgia fires. Because of the common set of controlled tags, these web objects
(i.e., links and photos) were harvested into a wiki environment, which also contained
links to quantitative air quality analyses based on satellite and surface observations.
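The harvesting step can be sketched as a tag-intersection query. An object belongs to the event if it carries every tag in the controlled combination, whichever service it came from. This is a minimal Python sketch under stated assumptions, not the tool chain actually used for the Georgia fires:

- The `WebObject` record, the `harvest` helper and the example URLs are hypothetical.
- Reading '+' as "all of these tags" is an assumption.
- Matching tags case-insensitively is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class WebObject:
    """A user-contributed web object (bookmarked link, photo, ...) and its tags."""
    url: str
    service: str                      # e.g. "del.icio.us" or "Flickr"
    tags: set = field(default_factory=set)


def parse_combination(combo: str) -> set:
    """Split a '+'-joined controlled tag combination into lower-cased tags.

    Lower-casing is an assumption made here so that 'Smoke' and 'smoke' match.
    """
    return {t.strip().lower() for t in combo.split("+") if t.strip()}


def harvest(objects, combo: str) -> list:
    """Keep objects carrying *every* tag of the combination, from any service."""
    wanted = parse_combination(combo)
    return [o for o in objects if wanted <= {t.lower() for t in o.tags}]


# Hypothetical objects; the URLs are placeholders, not real posts.
objects = [
    WebObject("https://example.org/blog/post-1", "del.icio.us", {"070508", "Florida", "Smoke"}),
    WebObject("https://example.org/photos/123", "Flickr", {"070508", "florida", "smoke", "haze"}),
    WebObject("https://example.org/blog/post-2", "del.icio.us", {"Florida", "beach"}),
]

hits = harvest(objects, "070508+Florida+Smoke")
for o in hits:
    print(o.service, o.url)
```

The third object is left out because it lacks the date and smoke tags. The same combination therefore pulls matching links and photos together without either service knowing about the other.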
Goal:
• Cross-leverage the shared resources on the Web, while maintaining autonomy of different services
• Better apply decision-support material in research, regulation and policy
• Amplify and connect minds
Approach:
• Harvest and aggregate Web content
• Use collaborative wiki workspaces
• Create knowledge products through communal and individual analysis
User-Generated Content
• Web 2.0 software allows users to easily add objects to the web:
– Links
– Photos
– Video
– Presentations
– Blogs/Wikis
• Structured metadata is already encoded in these types of data (date, user, type)
• All objects have a URL
Wiki
• Wikis were originally used just for collaborative writing
• Features:
– Editable by web users
– Tags
– Discussion pages
– Versioning
• Now they are dynamic workspaces, able to embed web objects from disparate sources
• Add additional context and facilitate collaborative analysis
• Allow two-way transfer of knowledge
[Diagram: View, Edit, Discuss, Collaborate]
Tags
• Keywords added to web objects by either the provider or the user
Pro:
• Tags can be added by anyone, to any URL
• Allow for multiple types of categorization, not just one hierarchy
• Can tag in any service
Con:
• Uncontrolled number of tags
• Multiple words with same meaning
• Can tag in any service
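The "multiple words with same meaning" drawback, and why a controlled tag helps, can be illustrated with a toy exact-match lookup. The free-form tags and URLs here are invented for illustration. `071022SoCalSmoke` is the controlled tag used in this poster's event analysis.

```python
# Hypothetical free-form (folksonomy) tags that three users attached to
# posts about the same smoke event; the URLs are placeholders.
free_tags = {
    "https://example.org/a": {"smoke", "wildfire"},
    "https://example.org/b": {"haze", "forestfire"},
    "https://example.org/c": {"Smoke", "fire"},
}


def query(tagged: dict, tag: str) -> list:
    """Exact-match lookup: URLs whose tag set contains `tag`."""
    return sorted(url for url, tags in tagged.items() if tag in tags)


# Uncontrolled tags: six distinct tags for one topic, and a query for
# "smoke" finds only one of the three relevant posts.
vocabulary = set().union(*free_tags.values())
print(len(vocabulary), query(free_tags, "smoke"))

# Controlled tagging: each post also receives the same agreed event tag,
# so a single query retrieves all of them.
controlled = {url: tags | {"071022SoCalSmoke"} for url, tags in free_tags.items()}
print(query(controlled, "071022SoCalSmoke"))
```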
Controlled Tag-based Mediation
• Users can mediate web-based content by “wrapping” it with a unique controlled tag (or set of tags) in two ways:
– Use Del.icio.us to homogenize the heterogeneous objects
– Create a wiki page as the web object and add semantic tags to it
• Create a wiki page that harvests tag queries and adds context to create emergent, reusable knowledge
[Diagram: Controlled Tag-based Connectivity]
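The "wrapping" idea can be sketched in two steps. First, map each service's record onto one common structure (URL, type, user, date, tags), adding the controlled tag during the mapping. Second, have the wiki page select objects by that tag. Everything here is a hypothetical illustration:

- The field names of the two input records are invented. They are not the actual del.icio.us or Flickr formats.
- `wiki_list` only imitates MediaWiki external-link markup.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Wrapped:
    """Common record for any web object: URL plus structured metadata."""
    url: str
    kind: str          # "link", "photo", ...
    user: str
    posted: date
    tags: frozenset


def wrap_bookmark(rec: dict, controlled: set) -> Wrapped:
    # Hypothetical bookmark layout: href, user, dt (ISO date), space-separated tags.
    return Wrapped(rec["href"], "link", rec["user"], date.fromisoformat(rec["dt"]),
                   frozenset(rec["tags"].split()) | frozenset(controlled))


def wrap_photo(rec: dict, controlled: set) -> Wrapped:
    # Hypothetical photo layout: page_url, owner, taken (ISO date), list of tags.
    return Wrapped(rec["page_url"], "photo", rec["owner"], date.fromisoformat(rec["taken"]),
                   frozenset(rec["tags"]) | frozenset(controlled))


def wiki_list(objs, tag: str) -> str:
    """Render objects carrying `tag` as a MediaWiki-style bullet list, newest first."""
    hits = sorted((o for o in objs if tag in o.tags), key=lambda o: o.posted, reverse=True)
    return "\n".join(f"* [{o.url} {o.kind}] by {o.user}, {o.posted.isoformat()}" for o in hits)


TAG = {"071022SoCalSmoke"}
objs = [
    wrap_bookmark({"href": "https://example.org/blog/smoke", "user": "alice",
                   "dt": "2007-10-23", "tags": "smoke california"}, TAG),
    wrap_photo({"page_url": "https://example.org/photo/42", "owner": "bob",
                "taken": "2007-10-22", "tags": ["haze", "sunset"]}, TAG),
]
print(wiki_list(objs, "071022SoCalSmoke"))
```

Once the records share one shape, the wiki page no longer needs to know which service an object came from.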
Communal Event Analysis
Southern California Fire Smoke
Given the high density and short response time of
user-generated content about air pollution
events, it is said that the Earth has now
acquired a "skin" for the detection of
changes in the environment.
• Controlled tag: 071022SoCalSmoke
• Quantitative:
– Harvest links and relevant datasets
– Controlled tagging in the wiki (datasets) and in Del.icio.us (links)
– Query/RSS from Del.icio.us and the wiki into the EventSpace wiki page
• Qualitative (Blogs, Flickr, YouTube):
– Use each service's search to perform coarse filtering
– Controlled tagging in del.icio.us
– RSS feed from del.icio.us into the EventSpace
[Diagram: Datasets, Links]
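The Query/RSS step into the EventSpace can be approximated by merging several RSS 2.0 feeds into one time-ordered list and dropping items that arrive through more than one feed. This minimal sketch uses only the Python standard library. The two feeds are inline, invented stand-ins for a del.icio.us tag feed and a wiki query feed, and `event_space` is a hypothetical helper, not an existing EventSpace API.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime


def read_items(rss_text: str, source: str):
    """Yield (published, title, link, source) for each <item> of an RSS 2.0 feed."""
    root = ET.fromstring(rss_text)
    for item in root.iter("item"):
        yield (parsedate_to_datetime(item.findtext("pubDate")),
               item.findtext("title", default=""),
               item.findtext("link", default=""),
               source)


def event_space(feeds: dict) -> list:
    """Merge feeds in the given order, skip links already seen, newest first."""
    seen, merged = set(), []
    for source, text in feeds.items():
        for entry in read_items(text, source):
            if entry[2] not in seen:
                seen.add(entry[2])
                merged.append(entry)
    return sorted(merged, key=lambda e: e[0], reverse=True)


# Invented sample feeds; the URLs are placeholders.
DELICIOUS = """<rss version="2.0"><channel><title>071022SoCalSmoke</title>
<item><title>Smoke over San Diego</title><link>https://example.org/blog/sd-smoke</link>
<pubDate>Tue, 23 Oct 2007 15:00:00 GMT</pubDate></item>
<item><title>Satellite view of the fires</title><link>https://example.org/img/modis</link>
<pubDate>Mon, 22 Oct 2007 20:00:00 GMT</pubDate></item>
</channel></rss>"""

WIKI = """<rss version="2.0"><channel><title>071022SoCalSmoke datasets</title>
<item><title>Surface PM2.5, Oct 2007</title><link>https://example.org/wiki/PM25_Oct2007</link>
<pubDate>Wed, 24 Oct 2007 09:00:00 GMT</pubDate></item>
<item><title>Satellite view of the fires</title><link>https://example.org/img/modis</link>
<pubDate>Mon, 22 Oct 2007 20:00:00 GMT</pubDate></item>
</channel></rss>"""

for published, title, link, source in event_space({"del.icio.us": DELICIOUS, "wiki": WIKI}):
    print(published.date(), source, title)
```

Deduplicating on the link is a simple design choice: an object tagged in both places shows up once, credited to whichever feed is read first.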
Data System Profiles
[Diagram: Data System, Wrap Data System Metadata, Semantic Tags, Multiple Wiki Views]
• A consistent description of multiple, autonomous data systems was needed
• All of the data systems were web-based; however, the metadata about them was distributed.
• Semantic tagging in the ESIP wiki was used to wrap distributed, heterogeneous data system metadata into a homogeneous view for easy comparison of systems.
• Semantic tags are sets of tags with a specific type and attribute
– The type determines the kind of response that can be given (text, enumeration, date, location)
– The attribute is the semantic tag name
• Querying semantic tags returns a filtered list
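Typed semantic tags, and queries over them, can be sketched in a few lines:

- Each attribute is bound to one of the four types listed above.
- Values are validated against that type.
- A query returns the pages whose tags satisfy every condition.

This is a hypothetical in-memory model, not the ESIP wiki's actual machinery. The attribute names, page names and values are invented.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class TagType:
    kind: str             # "text", "enumeration", "date" or "location"
    allowed: tuple = ()   # permitted values when kind == "enumeration"

    def accepts(self, value) -> bool:
        if self.kind == "text":
            return isinstance(value, str)
        if self.kind == "enumeration":
            return value in self.allowed
        if self.kind == "date":
            return isinstance(value, date)
        if self.kind == "location":  # (latitude, longitude) in degrees
            return (isinstance(value, tuple) and len(value) == 2
                    and -90 <= value[0] <= 90 and -180 <= value[1] <= 180)
        raise ValueError(f"unknown tag type {self.kind!r}")


# Hypothetical attributes (semantic tag names) for data system profiles.
SCHEMA = {
    "Provider": TagType("text"),
    "Access": TagType("enumeration", ("open", "registration", "restricted")),
    "Online since": TagType("date"),
    "Host location": TagType("location"),
}


def tag(profiles: dict, page: str, attribute: str, value) -> None:
    """Attach one semantic tag to a page, rejecting values of the wrong type."""
    if not SCHEMA[attribute].accepts(value):
        raise ValueError(f"{value!r} is not a valid {SCHEMA[attribute].kind} for {attribute!r}")
    profiles.setdefault(page, {})[attribute] = value


def query(profiles: dict, conditions: dict) -> list:
    """Pages whose tags satisfy every attribute -> predicate condition."""
    return sorted(page for page, tags in profiles.items()
                  if all(a in tags and test(tags[a]) for a, test in conditions.items()))


profiles = {}
tag(profiles, "Data System A", "Provider", "Agency X")
tag(profiles, "Data System A", "Access", "open")
tag(profiles, "Data System A", "Online since", date(2004, 6, 1))
tag(profiles, "Data System B", "Provider", "University Y")
tag(profiles, "Data System B", "Access", "registration")
tag(profiles, "Data System B", "Online since", date(2007, 1, 15))

print(query(profiles, {"Access": lambda v: v == "open"}))
print(query(profiles, {"Online since": lambda d: d.year >= 2005}))
```

Validating on entry is what keeps the tags comparable across systems. A value such as "free" for the enumerated Access attribute is rejected instead of quietly creating a new category.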
Community Data Sharing ‘DataSpaces’
[Diagram: Multiple Views; Catalog – Find Dataset; Dataset; Wrap metadata with Semantic Tags; DataSpaces; Reuse Meta…]
• Two parts:
– Semantic tags: structured
– User-added content: unstructured
• Semantic tags:
– Define common features of all datasets (bounding box, time range, provider)
– Can be queried within the wiki to show a subset of datasets
– Ready for export/harvesting through RDF feeds for use by registries, catalogs and XSLT transformations
• User-added content:
– Feedback/FAQs from users about datasets
– Tagged, relevant papers about the dataset
– Dataset lineage
– …
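Because the semantic tags record the same features for every dataset, both a query by region and period and an RDF export can be written generically. This is a minimal sketch under these assumptions:

- The `Dataset` record, the dataset names, the `https://example.org/dataspace/` namespace and the predicate names are all hypothetical.
- The output is plain N-Triples, with dates typed as XSD dates.
- The bounding-box test ignores boxes that cross the antimeridian.

```python
from dataclasses import dataclass
from datetime import date

BASE = "https://example.org/dataspace/"          # hypothetical namespace
XSD_DATE = "http://www.w3.org/2001/XMLSchema#date"


@dataclass(frozen=True)
class Dataset:
    name: str            # used as the last IRI segment, so no spaces
    provider: str
    bbox: tuple          # (west, south, east, north) in degrees
    start: date
    end: date


def matches(ds: Dataset, bbox: tuple, start: date, end: date) -> bool:
    """True if both the bounding box and the time range intersect the query."""
    w, s, e, n = ds.bbox
    qw, qs, qe, qn = bbox
    in_space = w <= qe and qw <= e and s <= qn and qs <= n
    in_time = ds.start <= end and start <= ds.end
    return in_space and in_time


def literal(text: str) -> str:
    """Quote a plain N-Triples string literal, escaping backslashes and quotes."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def to_ntriples(datasets) -> str:
    """Serialize each dataset's semantic tags as four N-Triples statements."""
    lines = []
    for ds in datasets:
        subj = f"<{BASE}{ds.name}>"
        lines.append(f"{subj} <{BASE}provider> {literal(ds.provider)} .")
        lines.append(f"{subj} <{BASE}bbox> {literal(' '.join(map(str, ds.bbox)))} .")
        lines.append(f'{subj} <{BASE}start> "{ds.start.isoformat()}"^^<{XSD_DATE}> .')
        lines.append(f'{subj} <{BASE}end> "{ds.end.isoformat()}"^^<{XSD_DATE}> .')
    return "\n".join(lines)


# Invented catalog entries for illustration only.
catalog = [
    Dataset("Dataset_A", "Provider X", (-125.0, 24.0, -66.0, 50.0), date(2007, 1, 1), date(2007, 12, 31)),
    Dataset("Dataset_B", "Provider Y", (-10.0, 35.0, 30.0, 60.0), date(2007, 1, 1), date(2007, 12, 31)),
]
hits = [ds for ds in catalog
        if matches(ds, (-120.0, 32.0, -114.0, 35.0), date(2007, 10, 20), date(2007, 10, 31))]
print(to_ntriples(hits))
```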
Summary
• By adding unique tags, groups can collaboratively curate
lists of resources
• The wiki allows seemingly unrelated information from distributed web objects to be brought together and integrated by harvesting unique tags.
• Tagging within the wiki allows emergent structure to
evolve.
Future Work
• Continue to learn how to add structure with tagging
• Continue to mash structured tagging with the wiki
‘canvas’
• Use tagging as a way to allow feedback from users to providers.
• Facilitate community tagging and collaboration