Social Semantic Web - UMBC ebiquity research group


Adding Semantics to
Social Websites for
Citizen Science
Pranam Kolari
University of Maryland,
Baltimore County
Joint work with Andriy Parafiynyk, Tim Finin,
Cynthia Parr, Joel Sachs, and Lushan Han
http://ebiquity.umbc.edu/paper/html/id/365
UMBC
an Honors University in Maryland
http://creativecommons.org/licenses/by-nc-sa/2.0/
This work was partially supported by DARPA contract F30602-97-1-0215 and NSF grants CCR007080 and IIS9875433
1
This talk
• Motivation
• Swoogle Semantic Web
search engine
• Social Semantic Web
• Conclusions
2
SOCIAL MEDIA
Social media describes the online
technologies and practices that people use
to share opinions, insights, experiences,
and perspectives and engage with each
other.
(Wikipedia, 2007)
3
Social Media for agents
• Today social media supports information sharing
among communities of people and enables citizen
journalism
• An infrastructure based on pings, feeds, content
aggregators, and filters (e.g., Yahoo Pipes) aids scalability
• Social media now accounts for ~1/3 of new Web
content!
• We need to explore how networks of agents can
use the same strategies to share data and
knowledge
4
This talk
• Motivation
• Swoogle Semantic Web
search engine
• Social Semantic Web
• Conclusions
5
Google has made us smarter
6
But what about our agents?
[Figure: software agents; diagram labels "tell" and "register"]
Agents still have a very minimal
understanding of text and images.
7
But what about our agents?
[Figure: the agent diagram again, with many Swoogle nodes added; labels "tell" and "register"]
A Google for knowledge on the Semantic Web
is needed by software agents and programs
8
• http://swoogle.umbc.edu/
• Running since summer 2004
• 2.2M RDF docs, 434M triples, 10K ontologies,
15K namespaces, 1.5M classes, 185K properties,
49M instances, 800 registered users
9
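The statistics above separate documents, triples, classes, properties, and instances. Below is a minimal Python sketch of how such counts could be derived, assuming each triple has already been parsed and tagged with its source document, and identifying classes, properties, and instances only from rdf:type statements (Swoogle's own heuristics are richer). The function `summarize` and the example.org triples are hypothetical.

```python
# Well-known RDF, RDFS and OWL term URIs.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDF_PROPERTY = "http://www.w3.org/1999/02/22-rdf-syntax-ns#Property"
RDFS_CLASS = "http://www.w3.org/2000/01/rdf-schema#Class"
OWL = "http://www.w3.org/2002/07/owl#"

# Simplified meta-classes: typing a term with these marks it as a class/property.
CLASS_TYPES = {RDFS_CLASS, OWL + "Class"}
PROPERTY_TYPES = {RDF_PROPERTY, OWL + "ObjectProperty", OWL + "DatatypeProperty"}


def summarize(quads):
    """Count documents, distinct triples, classes, properties and instances.

    `quads` is an iterable of (doc_url, subject, predicate, object) tuples.
    """
    docs, triples = set(), set()
    classes, properties, instances = set(), set(), set()
    for doc, s, p, o in quads:
        docs.add(doc)
        triples.add((s, p, o))
        if p != RDF_TYPE:
            properties.add(p)  # any predicate in use counts as a property
        elif o in CLASS_TYPES:
            classes.add(s)     # declared class
        elif o in PROPERTY_TYPES:
            properties.add(s)  # declared property
        else:
            classes.add(o)     # the type of an instance is a class
            instances.add(s)
    return {"docs": len(docs), "triples": len(triples), "classes": len(classes),
            "properties": len(properties), "instances": len(instances)}


EX = "http://example.org/ns#"  # hypothetical namespace
quads = [
    ("http://example.org/a.rdf", EX + "Fish", RDF_TYPE, OWL + "Class"),
    ("http://example.org/a.rdf", EX + "eats", RDF_TYPE, OWL + "ObjectProperty"),
    ("http://example.org/b.rdf", EX + "tilapia1", RDF_TYPE, EX + "Fish"),
    ("http://example.org/b.rdf", EX + "tilapia1", EX + "eats", EX + "algae1"),
]
print(summarize(quads))
# {'docs': 2, 'triples': 4, 'classes': 1, 'properties': 1, 'instances': 1}
```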
Swoogle Architecture
[Figure: Swoogle architecture, with arrows showing the information flow. Discovery: SwoogleBot, a bounded Web crawler, and the Google crawler collect candidate URLs from the Web and from pings. Analysis: SWD classifier, ranking, …. Index: IR indexer, SWD indexer, Semantic Web metadata; document cache and archive. Search services: a Web server (html, for humans) and a Web service (rdf/xml, for machines). Inset: Swoogle’s web interface.]
10
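A toy sketch of the path from discovery through the SWD classifier to the indexer, assuming candidate documents have already been fetched into a dict and using a single hypothetical rule (the XML root element is rdf:RDF) as the classifier. Swoogle's real classification and indexing are more involved; `is_swd`, `index_terms`, and `crawl` are illustrative names only.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDF_ROOT = "{%s}RDF" % RDF_NS
RDF_RESOURCE = "{%s}resource" % RDF_NS


def is_swd(text):
    """Hypothetical classifier rule: well-formed XML whose root is rdf:RDF."""
    try:
        return ET.fromstring(text).tag == RDF_ROOT
    except ET.ParseError:
        return False


def index_terms(url, text, index):
    """Record every element URI and rdf:resource value under this document's URL."""
    for elem in ET.fromstring(text).iter():
        if elem.tag.startswith("{"):
            ns, local = elem.tag[1:].split("}", 1)
            index[ns + local].add(url)
        resource = elem.get(RDF_RESOURCE)
        if resource:
            index[resource].add(url)


def crawl(candidates):
    """`candidates` maps URL -> fetched text; only SWDs reach the indexer."""
    index = defaultdict(set)
    for url, text in candidates.items():
        if is_swd(text):
            index_terms(url, text, index)
    return index


docs = {
    "http://example.org/a.rdf": """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:ex="http://example.org/ns#">
  <ex:Fish rdf:about="http://example.org/ns#tilapia1">
    <ex:eats rdf:resource="http://example.org/ns#algae1"/>
  </ex:Fish>
</rdf:RDF>""",
    "http://example.org/page.html": "<html><body>Not RDF</body></html>",
}
index = crawl(docs)
print(sorted(index["http://example.org/ns#Fish"]))  # ['http://example.org/a.rdf']
```

The resulting term-to-documents index is the kind of structure that lets a tool ask which documents use a given class or property.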
Applications and use cases
1. Supporting Semantic Web developers
– Ontology designers: vocabulary discovery, who’s using my
ontologies or data?, usage analysis, errors, statistics, etc.
2. Searching specialized collections
– Spire: aggregating observations and data from biologists
– InferenceWeb: searching over and enhancing proofs
– SemNews: text meaning representations of news stories
3. Supporting SW tools
– Triple Shop: finding data for SPARQL queries
11
An NSF ITR collaborative project with
• University of Maryland, Baltimore County
• University of Maryland, College Park
• University of California, Davis
• Rocky Mountain Biological Laboratory
12
An invasive species scenario
• Nile Tilapia fish have been found in a California lake.
• Can this invasive species thrive in this environment?
• If so, what will be the likely
consequences for the lake’s
ecology?
• So we need to understand
the effects of introducing
this fish into the food web
of a typical California lake
13
Food Webs
• A food web models the trophic (feeding)
relationships between organisms in an ecosystem
– Food web simulators explore the consequences of ecological
changes, e.g., species introduction or removal
– Food webs are constructed from studies of a location’s
species inventory and the known trophic relations.
• Goal: automatically construct a food web for a new
species using existing data and knowledge
• ELVIS: Ecosystem Location Visualization and
Information System
14
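A food web is naturally a directed graph from predators to prey. A minimal sketch, assuming links arrive as (predator, prey) pairs, that finds the species one trophic link away from a given species, i.e., those most directly touched by its introduction or removal. A real simulator models population dynamics; this only shows the data structure, and the toy species and links are hypothetical.

```python
from collections import defaultdict


def build_web(links):
    """Turn (predator, prey) pairs into two lookup tables."""
    prey_of, predators_of = defaultdict(set), defaultdict(set)
    for predator, prey in links:
        prey_of[predator].add(prey)
        predators_of[prey].add(predator)
    return prey_of, predators_of


def directly_affected(species, prey_of, predators_of):
    """Species one trophic link away: the species' prey and its predators."""
    return prey_of.get(species, set()) | predators_of.get(species, set())


# Hypothetical toy lake web.
links = [("heron", "fish"), ("fish", "ostracod"), ("ostracod", "algae")]
prey_of, predators_of = build_web(links)
print(sorted(directly_affected("ostracod", prey_of, predators_of)))  # ['algae', 'fish']
```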
East River Valley Trophic Web
http://www.foodwebs.org/
15
The problem
• We have data on which species are known to occur at
the location, and can further restrict and fill this in with
other ecological models
=> Maybe we can mine social media for species
observation data?
• But we don’t know which of these the Nile Tilapia
eats or which might eat it.
• We can reason from taxonomic data (similar
species) and known natural history data (size,
mass, habitat, etc.) to fill in the gaps.
16
Food Web Constructor
Predict food web links using database and taxonomic reasoning.
In a new estuary, Nile
Tilapia could compete
with ostracods (green)
to eat algae. Predators
(red) and prey (blue) of
ostracods may be
affected.
17
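One simple way to approximate taxonomic reasoning is to let a new species borrow the diet of its closest known relative and then list the residents that share that food. The sketch below makes exactly that assumption; it is not ELVIS's actual algorithm, and the lineages, species names, and links are hypothetical placeholders.

```python
def shared_ranks(a, b):
    """Number of leading taxonomic ranks two lineages have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def predict_prey(new_lineage, lineages, prey_of):
    """Copy the diet of the taxonomically closest species with known prey."""
    consumers = [s for s in lineages if prey_of.get(s)]
    closest = max(consumers, key=lambda s: shared_ranks(new_lineage, lineages[s]),
                  default=None)
    return set(prey_of[closest]) if closest is not None else set()


def competitors(predicted_prey, prey_of):
    """Resident species that eat at least one of the predicted prey."""
    return {s for s, prey in prey_of.items() if prey & predicted_prey}


# Hypothetical placeholder lineages (kingdom, phylum, class, order) and diets.
lineages = {
    "grazer_1": ("K1", "P1", "C1", "O1"),
    "predator_1": ("K1", "P2", "C2", "O2"),
}
prey_of = {"grazer_1": {"algae"}, "predator_1": {"grazer_1"}}
newcomer = ("K1", "P1", "C1", "O3")

food = predict_prey(newcomer, lineages, prey_of)
print(food, competitors(food, prey_of))  # {'algae'} {'grazer_1'}
```

A fuller version would also weigh the natural history data mentioned earlier (size, mass, habitat) instead of relying on taxonomy alone.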
Status
• ELVIS (Ecosystem Location Visualization and
Information System): an integrated set of web
services for constructing food webs for a given
location.
• Background ontologies
– SpireEcoConcepts: concepts and properties to
represent food webs and ELVIS-related tasks, inputs,
and outputs
– ETHAN (Evolutionary Trees and Natural History):
concepts and properties for ‘natural history’
information on species, derived from data in the Animal
Diversity Web and other taxonomic sources; 250K
classes on plants and animals
18
This talk
• Motivation
• Swoogle Semantic Web
search engine
• Social Semantic Web
• Conclusions
19
• Social media sites have become the
biggest source of new content on the Web
• Blogs, wikis, photo sites, forums, etc.
• They account for ~1/3 of new Web content
20
• Social media sites embrace new ways of
letting users add semantic information
• This shows users the potential of semantics
• This graph shows the uptake of tags in blogs
21
Social Media and the Semantic Web
• Many are exploring how Semantic Web technology
can work with social media
• Social media like blogs are typically temporally
organized
– valued for their timely and dynamic information!
• If static pages form the Web’s long-term memory,
then the Blogosphere is its stream of consciousness
• Maybe we can (1) help people publish data in RDF
on their blogs, (2) mine social media sites for
useful information, (3) exploit new infrastructure
ideas for sharing Semantic Web data.
22
A BioBlitz involves going
out to an area and
recording every organism
you see
The OWL icon
links to the data
in RDF
23
Here’s the post’s
RDF data
24
A good Semantic Web opportunity
• We want to make it easy for scientists to enter
and collect information from social media
– Professionals, students, and amateurs!
• Some early examples
– SPOTter: a tool to add Semantic Web data
to blogs
– Splickr: a system to mine Flickr for images
of organisms
– RDF123: an application and Web service to
render spreadsheets as RDF data
25
SPOTter: SPire Observation Tool
• We’ve developed some simple components to help
people add RDF data to blogs and ping Swoogle to get
it indexed.
• SPOTter is an initial prototype that uses the ETHAN
ontology and is being used in some BioBlitz activities
with students.
• We’re working toward a version that uses Twitter so
that people can make blog entries from their cell
phones via SMS
– The SPOTter agent will retrieve the entries (via RSS)
and index the data
26
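A sketch of the kind of RDF an observation tool could attach to a blog post, written as Turtle. It assumes the W3C Basic Geo vocabulary for coordinates and Dublin Core for the date; the `ex:` namespace, `ex:Observation`, `ex:species`, the species URI, and the coordinate and date values are hypothetical stand-ins, since the ETHAN URIs are not given here.

```python
from datetime import date

PREFIXES = (
    "@prefix geo: <http://www.w3.org/2003/01/geo/wgs84_pos#> .\n"
    "@prefix dc: <http://purl.org/dc/elements/1.1/> .\n"
    "@prefix ex: <http://example.org/spotter#> .\n"  # hypothetical vocabulary
)


def observation_turtle(obs_id, species_uri, lat, lon, day):
    """Serialize one sighting as a small Turtle document."""
    return PREFIXES + (
        f"\nex:{obs_id} a ex:Observation ;\n"
        f"    ex:species <{species_uri}> ;\n"
        f'    geo:lat "{lat}" ;\n'
        f'    geo:long "{lon}" ;\n'
        f'    dc:date "{day.isoformat()}" .\n'
    )


print(observation_turtle("obs1", "http://example.org/ethan#Oreochromis_niloticus",
                         39.25, -76.71, date(2007, 5, 12)))
```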
SPOTter
button
Once entered, the data is
embedded into the blog post
and Swoogle is pinged to
index it
27
Prototype SPOTter search engine
• We can draw a bounding box on
the map and find observations
• An RSS feed is provided for each
query
28
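The bounding-box search reduces to a range test on latitude and longitude. A minimal sketch, assuming observations carry decimal-degree coordinates and that boxes never cross the 180° meridian; `Sighting` and `search` are hypothetical names and the sightings are made up.

```python
from typing import NamedTuple


class Sighting(NamedTuple):
    species: str
    lat: float  # decimal degrees, north positive
    lon: float  # decimal degrees, east positive


def search(sightings, south, west, north, east):
    """Return sightings inside the box; assumes west <= east (no 180° crossing)."""
    return [s for s in sightings
            if south <= s.lat <= north and west <= s.lon <= east]


sightings = [Sighting("heron", 39.3, -76.6), Sighting("tilapia", 36.0, -119.0)]
print([s.species for s in search(sightings, 39.0, -77.0, 39.5, -76.0)])  # ['heron']
```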
Flickr
• The Flickr “photo sharing” site has millions of
photographs
– Many are of plants and animals
• Most of them have descriptions, timestamps, tags, and
even geo-tags
– Flickr has even introduced “machine tags” that can
be mapped into RDF
• Any Flickr user (human or bot) can add comments
and annotations
• There’s a good API
• It could be a good source of ecological information
29
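Flickr machine tags follow a namespace:predicate=value pattern, which maps directly onto a triple about the photo. A sketch using a simplified version of that pattern, assuming a hypothetical mapping from tag namespaces to RDF namespace URIs (unknown namespaces fall back to an example.org URI); the photo URI and tags are placeholders.

```python
import re

# Simplified namespace:predicate=value pattern for Flickr machine tags.
MACHINE_TAG = re.compile(r"^([A-Za-z][A-Za-z0-9_]*):([A-Za-z][A-Za-z0-9_]*)=(.+)$")

# Hypothetical mapping from tag namespaces to RDF namespace URIs.
NAMESPACES = {"taxonomy": "http://example.org/taxonomy#"}


def tags_to_triples(photo_uri, tags):
    """Turn a photo's machine tags into (subject, predicate, literal) triples."""
    triples = []
    for tag in tags:
        m = MACHINE_TAG.match(tag)
        if not m:
            continue  # ordinary free-text tag such as "lake"
        ns, pred, value = m.groups()
        base = NAMESPACES.get(ns, f"http://example.org/{ns}#")
        triples.append((photo_uri, base + pred, value.strip('"')))
    return triples


print(tags_to_triples("http://example.org/photo/1",
                      ['taxonomy:binomial="Oreochromis niloticus"', "fish", "lake"]))
```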
30
31
Results for people and machines
32
RDF123
An application and web service to generate RDF
data from spreadsheets
[Figure: RDF123 workflow. Graphically create and edit a spreadsheet-to-RDF map; map + spreadsheet (CSV or Google doc) => RDF data. Some metadata can be embedded in the spreadsheet.]
See http://ebiquity.umbc.edu/project/html/id/82/
33
RDF123
• The BioBlitz project needed a way to
collect and share observational data
from students
• Spreadsheets were selected as a common data format, and
templates were developed
• The RDF123 application and web service were developed to
ease exporting the data as RDF for a Maryland
BioBlitz group
– The web service generates RDF given
URLs for the sheet and map
– It works on CSV files and also Google spreadsheets
34
A map provides a
template for an
RDF subgraph for
each row
35
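The idea of a per-row template can be sketched with Python format placeholders standing in for RDF123's own map language, whose syntax is not reproduced here. Each CSV row fills every {column} slot in the template, yielding one small subgraph per row; the template, prefixes, and data are hypothetical.

```python
import csv
import io

# Hypothetical row template: {column} slots are filled from each CSV row.
TEMPLATE = [
    ("ex:obs{id}", "rdf:type", "ex:Observation"),
    ("ex:obs{id}", "ex:species", '"{species}"'),
    ("ex:obs{id}", "ex:count", '"{count}"'),
]


def rows_to_triples(csv_text, template):
    """Instantiate the template once per data row (header row names the columns)."""
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        for s, p, o in template:
            triples.append((s.format(**row), p.format(**row), o.format(**row)))
    return triples


sheet = "id,species,count\n1,heron,2\n2,tilapia,5\n"
for triple in rows_to_triples(sheet, TEMPLATE):
    print(*triple)
```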
The map is also
represented in RDF
36
Here’s the RDF that’s
produced from the
spreadsheet
37
Metadata, including
the URI of a map,
can be embedded in
the spreadsheet
38
Ping and Feed Design Pattern
• The Web uses a ping and feed design pattern
that is a variant of publish and subscribe
• It underlies the scalable, smooth functioning
of the Blogosphere and related social media
systems
• Pings push and feeds pull
• We can use the same approach to manage
large volumes of Semantic Web data
39
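A toy, in-memory sketch of the pattern: providers push a lightweight ping carrying only a URL, and the indexer then pulls the feed to get the actual content. Class names are hypothetical, a Python dict stands in for HTTP fetches, and this ping server simply forwards each ping rather than batching them as a production server would.

```python
class PingServer:
    """Forwards each ping to subscribed indexers (real servers batch them)."""

    def __init__(self):
        self.subscribers = []

    def ping(self, url):
        for indexer in self.subscribers:
            indexer.notify(url)


class ContentProvider:
    """Appends items to its own feed, then pings (push): the ping carries only a URL."""

    def __init__(self, url, server):
        self.url, self.feed, self.server = url, [], server

    def publish(self, item):
        self.feed.append(item)
        self.server.ping(self.url)


class Indexer:
    """On each ping, pulls the provider's feed and indexes only unseen items."""

    def __init__(self, providers):
        self.providers = providers  # URL -> provider; stands in for an HTTP GET
        self.seen = {}              # URL -> number of feed items already indexed
        self.index = []

    def notify(self, url):
        feed = self.providers[url].feed  # pull
        self.index.extend(feed[self.seen.get(url, 0):])
        self.seen[url] = len(feed)


server = PingServer()
c1 = ContentProvider("http://c1.example.org/feed", server)
indexer = Indexer({c1.url: c1})
server.subscribers.append(indexer)
c1.publish("post 1")
c1.publish("post 2")
print(indexer.index)  # ['post 1', 'post 2']
```

Keeping the ping tiny is what lets one server fan out notifications for many providers, while the bulk data moves only when an indexer chooses to pull it.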
Pings and Feeds in the Blogosphere
• Content providers send pings to ping servers when
they have a new item
• Ping servers aggregate pings and stream them to
aggregators and indexers, like Google
• Indexing sites retrieve new items from the content
provider’s feed
[Figure: content providers C1, C2, and C3 send pings to a ping server, which streams them to a search engine]
40
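Blog ping servers commonly accept the XML-RPC method weblogUpdates.ping, which takes the blog's name and URL. The sketch builds that request body offline with Python's standard xmlrpc.client module; the blog name, blog URL, and `PING_SERVER_URL` are placeholders, and PTSW's own ping interface is not shown.

```python
import xmlrpc.client


def ping_payload(blog_name, blog_url):
    """Serialize a weblogUpdates.ping call to an XML-RPC request body (no network)."""
    return xmlrpc.client.dumps((blog_name, blog_url), methodname="weblogUpdates.ping")


body = ping_payload("BioBlitz Notes", "http://blog.example.org/")
print(body)

# Sending it needs network access and a real ping-server URL (placeholder here):
# xmlrpc.client.ServerProxy(PING_SERVER_URL).weblogUpdates.ping(
#     "BioBlitz Notes", "http://blog.example.org/")
```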
Pings and Feeds in the Semantic Web
• Content providers send pings to Ping the Semantic Web (PTSW) when they have new RDF data
• PTSW aggregates pings and streams them to SW
aggregators and indexers, like Swoogle
• Indexing sites retrieve new RDF data from the content
provider’s feed
[Figure: content providers C1, C2, and C3 send pings to PTSW, which streams them to Swoogle]
41
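On the indexing side, a ping only says where to look; the indexer then pulls the feed and keeps the items it has not seen before. A sketch assuming the provider publishes an RSS 2.0 feed whose item links point at RDF documents; `new_links` and the feed contents are hypothetical.

```python
import xml.etree.ElementTree as ET


def new_links(rss_text, seen):
    """Return item links not in `seen` (in feed order) and add them to `seen`."""
    fresh = []
    for item in ET.fromstring(rss_text).iter("item"):
        link = (item.findtext("link") or "").strip()
        if link and link not in seen:
            seen.add(link)
            fresh.append(link)
    return fresh


FEED = """<rss version="2.0"><channel><title>Observations</title>
<item><title>obs 1</title><link>http://c1.example.org/obs1.rdf</link></item>
<item><title>obs 2</title><link>http://c1.example.org/obs2.rdf</link></item>
</channel></rss>"""

seen = set()
print(new_links(FEED, seen))  # both links on the first poll
print(new_links(FEED, seen))  # [] on the next poll: nothing new
```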
Semantic Web Feeds drive Mashups
• As in the regular web, sites and query engines use
feeds to capture queries
• Accessing a feed runs the query and produces a list
of the first N results (usually 10 ≤ N ≤ 20)
• Such query feeds can drive mashups
• Systems like Yahoo Pipes make it easy to compose
feeds
42
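A query feed is a query's top-N result list serialized as a feed. A sketch that renders (URL, title) results as RSS 2.0 with N defaulting to 10, assuming results arrive already ranked; the channel link and result data are placeholders.

```python
import xml.etree.ElementTree as ET


def query_feed(query, results, n=10):
    """Render the first n already-ranked (url, title) results as RSS 2.0."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = f"Results for {query}"
    ET.SubElement(channel, "link").text = "http://example.org/search"  # placeholder
    ET.SubElement(channel, "description").text = f"Top {n} results for {query}"
    for url, title in results[:n]:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = title
        ET.SubElement(item, "link").text = url
    return ET.tostring(rss, encoding="unicode")


results = [(f"http://example.org/doc{i}.rdf", f"doc {i}") for i in range(25)]
feed = query_feed("Fish", results)
print(feed[:80])
```

Because each result list is an ordinary feed, a pipe-style tool can merge or filter it like any blog feed.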
This talk
• Motivation
• Swoogle Semantic Web
search engine
• Social Semantic Web
• Conclusions
43
Conclusion
• The web will contain the world’s knowledge in
forms accessible to people and computers
– We need better ways to discover, index, search and
reason over SW knowledge
• SW search engines address different tasks than
HTML search engines
– So they require different techniques and APIs
• Swoogle-like systems can help create consensus
ontologies and foster best practices
• Social media provide new challenges and
opportunities for the Semantic Web
44
For more information
http://ebiquity.umbc.edu/
Annotated
in OWL
45