Transcript 317

Information Integration
and the Semantic Web
Finding knowledge, data and answers
Tim Finin1, Anupam Joshi1, Li Ding2
1 University of Maryland, Baltimore County
2 Stanford University, Knowledge Systems Lab
Joint work with Yun Peng, Cynthia Parr, Andriy Parafinyk, Lushan Han,
Pranam Kolari, Pavan Reddivari, Rong Pan, Akshay Java, Joel Sachs and others.
UMBC
an Honors University in Maryland
http://ebiquity.umbc.edu/resource/html/id/327/
http://creativecommons.org/licenses/by-nc-sa/2.0/
This work was partially supported by DARPA contract F30602-97-1-0215, NSF grants CCR007080 and IIS9875433, and grants from IBM, Fujitsu and HP.
Google has made us smarter
But what about our agents?
Agents still have a very minimal
understanding of text and images.
But what about our agents?
Swoogle
A Google for knowledge on the Semantic Web
is needed by software agents and programs
Information Integration
and the Semantic Web
• The Semantic Web enables information integration
with standards supporting shared semantic models,
ontology mapping, common tools, etc.
• A Google-like global index can help people and
programs to
– Find Semantic Web ontologies and data
– Understand how these are being used
– Build trust and provenance models
– Assemble ontology maps
– Create new integration tools
• http://swoogle.umbc.edu/
• Running since summer 2004
• 1.8M RDF docs, 320M triples, 10K ontologies,
15K namespaces, 1.3M classes, 175K properties,
43M instances, 600 registered users
Applications and use cases
1 Supporting Semantic Web developers
– Ontology designers, vocabulary discovery, who uses what
ontologies & data, use analysis, errors, statistics, etc.
2 Helping scientists publish and find data
– Spire: aggregating observations and data from biologists
– InferenceWeb: searching over and enhancing proofs
– SemNews: Text Meaning of news stories
3 Supporting SW tools
– Triple Shop: finding data for SPARQL queries
1 Supporting Semantic Web developers
80 ontologies were found that had these three terms.
By default, ontologies are ordered by their ‘popularity’, but they can also be ordered by recency or size.
Let’s look at this one
All of this is available in RDF form for the agents among us.
Here’s what the agent sees. Note the swoogle and wob (web of belief) ontologies.
2 Helping scientists publish and find data
An NSF ITR collaborative project with
• University of Maryland, Baltimore County
• University of Maryland, College Park
• University of California, Davis
• Rocky Mountain Biological Laboratory
Invasive Species
• Invasive species cost the U.S.
economy over $138 billion per year
• By various estimates, these species
contribute to the decline of 35% - 46% of U.S.
endangered and threatened species
• The invasive species problem is growing, as the
number of pathways of invasion increases.
Pimentel et al. 2000. Environmental and economic costs associated with non-indigenous species in the United States. BioScience 50:53-65.
Charles Groat, Director U.S. Geological Survey, http://www.usgs.gov/invasive_species/plw/usgsdirector01.html
East River Valley Trophic Web
http://www.foodwebs.org/
Biologists gathering data
• Increase utility
• Maximize productivity
• Foster discovery
• Broaden participation
Representing and sharing data
Journal articles
Flat files
Spreadsheets
Local databases
On the Web in HTML or XML
ELVIS: Ecosystem Localization,
Visualization, and Integration System
Figure: species list, food web constructor, food web (?)
Example taxa: Oreochromis niloticus (Nile tilapia), Bacteria, Microprotozoa, Amphithoe longimana, Caprella penantis, Cymadusa compta, Lembos rectangularis, Batea catharinensis, Ostracoda, Melanitta, Tadorna tadorna . . .
ELVIS Food Web Constructor
predicts basic network structure
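\[ \mathrm{Weight}_{AB} = \frac{1}{1 + (\mathrm{Distance}_{XA} \times \mathrm{Penalty}_{XA}) + (\mathrm{Distance}_{YB} \times \mathrm{Penalty}_{YB})} \]

\[ \mathrm{CertaintyIdx}_{XY} = \sum_{i=1}^{N} \frac{\mathrm{weight}_i \, (\mathrm{LinkValue}_i)}{\mathrm{discount}} \]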
Prelude to systems models
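A minimal Python sketch of the link weight on this slide, assuming the terms combine as 1 / (1 + Distance_XA × Penalty_XA + Distance_YB × Penalty_YB). It reads A and B as taxa with a known link and X and Y as the taxa whose link is being predicted, which is also an assumption. The function name, argument names and sample values are hypothetical and are not taken from ELVIS.

```python
def link_weight(distance_xa: float, penalty_xa: float,
                distance_yb: float, penalty_yb: float) -> float:
    """Weight of a known link A->B as evidence for a candidate link X->Y.

    Assumed form: 1 / (1 + d_XA * p_XA + d_YB * p_YB), so the weight is 1.0
    when both distances are zero and shrinks as either distance grows.
    """
    return 1.0 / (1.0 + distance_xa * penalty_xa + distance_yb * penalty_yb)


if __name__ == "__main__":
    # Hypothetical inputs: evidence about the very same taxa (distance 0) ...
    print(link_weight(0, 1.0, 0, 1.0))
    # ... versus evidence about more distant relatives.
    print(link_weight(1, 1.0, 2, 0.5))
```

Under this reading, evidence from closer relatives always counts for more, and the penalties control how quickly the weight falls off with taxonomic distance.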
Examine evidence for predicted links.
The Evidence Provider lets users explore evidence (data, papers, reasoning) for food web links
Data from ~300 food webs
Supporting ontologies and their use
• SpireEcoConcepts, for
– confirmed and potential food web links
– bibliographic information of food web studies
– ecosystem terms
– taxonomic ranks
• California Wildlife Habitat Relationships Ontology
– life history
– geographic range
– management information
• ETHAN (Evolutionary Trees and Natural History)
– Natural history information on species derived from
data in the Animal Diversity Web and other
taxonomic sources
3 Supporting SW tools
UMBC Triple Shop
• http://sparql.cs.umbc.edu/
• Online query processing for SPARQL (the RDF query language) with several interesting features
• Automatically finds data for queries using Swoogle
• Datasets, queries and results can be saved, tagged,
annotated, shared, searched for, etc.
• RDF datasets as first class objects
– Can be stored on our server or downloaded
– Can be materialized in a database or
(soon) as a Jena model
Triple Shop
What are body masses
of fishes that eat fishes?
. . . leaving out the FROM clause
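As an illustration only, the question above could be written as a SPARQL query with no FROM clause, held here in a Python string. The `ex:` namespace and the terms `ex:Fish`, `ex:eats` and `ex:bodyMass` are hypothetical placeholders, not the vocabulary used in the talk, and the helper that checks for a dataset clause is a deliberately simple line-based test.

```python
# Hypothetical vocabulary: the ex: namespace and its terms are placeholders,
# not the ontology actually used with Triple Shop.
QUERY = """\
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX ex:  <http://example.org/foodweb#>

SELECT DISTINCT ?predator ?mass
WHERE {
  ?predator rdf:type ex:Fish ;
            ex:eats ?prey ;
            ex:bodyMass ?mass .
  ?prey     rdf:type ex:Fish .
}
"""


def has_dataset_clause(query: str) -> bool:
    """Return True if any line of the query starts with FROM (or FROM NAMED)."""
    return any(line.strip().upper().startswith("FROM")
               for line in query.splitlines())


if __name__ == "__main__":
    print(QUERY)
    print("dataset clause present:", has_dataset_clause(QUERY))
```

Because the query names no dataset, something else has to supply one, which is the gap the slides fill by finding candidate RDF documents.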
specify dataset
11 RDF documents were found that might have useful data
We’ll select them all and add them to the current dataset.
We’ll run the query against this dataset to see if the results are as expected.
The results can be produced in any of several formats
Results
http://sparql.cs.umbc.edu/tripleshop2/
• Looks like a useful dataset!
• Let’s annotate, tag and save it, and also materialize it in the TS triple store.
• Queries can also be annotated, tagged and shared.
Themes revisited
• The Web contains the world’s knowledge in
forms accessible to people and computers
• The Semantic Web enables information
integration with standards supporting shared
semantic models, ontology mapping, common
tools, etc.
• We need better ways to discover, index, search
and reason over knowledge on the Semantic Web
• Swoogle-like systems help create consensus
ontologies, foster best practices, find data and
support tools.
For more information
http://ebiquity.umbc.edu/
Annotated in OWL