Finding knowledge, data and answers on the Semantic Web
Tim Finin
University of Maryland, Baltimore County
http://ebiquity.umbc.edu/resource/html/id/202/
Joint work with Li Ding, Anupam Joshi, Yun Peng, Cynthia Parr, Pranam Kolari, Pavan
Reddivari, Sandor Dornbush, Rong Pan, Akshay Java, Joel Sachs, Scott Cost and Vishal Doshi
http://creativecommons.org/licenses/by-nc-sa/2.0/
This work was partially supported by DARPA contract F3060297-1-0215, NSF grants CCR007080 and IIS9875433 and grants from IBM, Fujitsu and HP.
UMBC
an Honors University in Maryland
1
This talk
• Motivation
• Swoogle Semantic Web
search engine
• Use cases and applications
• Observations
• Conclusions
Google has made us smarter
But what about our agents?
[Figure labels: tell, register]
Agents still have a very minimal
understanding of text and images.
But what about our agents?
[Figure labels: tell, register, Swoogle (repeated)]
A Google for knowledge on the Semantic Web
is needed by software agents and programs
This talk
• Motivation
• Swoogle Semantic Web
search engine
• Use cases and applications
• Observations
• Conclusions
• http://swoogle.umbc.edu/
• Running since summer 2004
• 1.8M RDF docs, 320M triples, 10K ontologies,
15K namespaces, 1.3M classes, 175K properties,
43M instances, 600 registered users
Swoogle Architecture
[Diagram; extracted labels, grouped by component]
• Discovery: SwoogleBot, Bounded Web Crawler, Google Crawler, Candidate URLs, document cache
• Analysis: SWD classifier, Ranking, …
• Index: IR Indexer, SWD Indexer, Semantic Web metadata
• Search Services: Web Server (html, human), Web Service (rdf/xml, machine)
• Sources: the Web, Semantic Web
• Legend: Information flow, Swoogle's web interface
This talk
• Motivation
• Swoogle Semantic Web
search engine
• Use cases and applications
• Observations
• Conclusions
Applications and use cases
1 Supporting Semantic Web developers
– Ontology designers, vocabulary discovery, who’s using my
ontologies or data?, use analysis, errors, statistics, etc.
2 Searching specialized collections
– Spire: aggregating observations and data from biologists
– InferenceWeb: searching over and enhancing proofs
– SemNews: Text Meaning of news stories
3 Supporting SW tools
– Triple shop: finding data for SPARQL queries
1 Supporting Semantic Web developers
80 ontologies were found that
had these three terms
By default, ontologies are ordered
by their ‘popularity’, but they can
also be ordered by recency or size.
Let’s look at this one
Basic Metadata
hasDateDiscovered: 2005-01-17
hasDatePing: 2006-03-21
hasPingState: PingModified
type: SemanticWebDocument
isEmbedded: false
hasGrammar: RDFXML
hasParseState: ParseSuccess
hasDateLastmodified: 2005-04-29
hasDateCache: 2006-03-21
hasEncoding: ISO-8859-1
hasLength: 18K
hasCntTriple: 311.00
hasOntoRatio: 0.98
hasCntSwt: 94.00
hasCntSwtDef: 72.00
hasCntInstance: 8.00
rdfs:range was used 41 times to assert a value.
owl:ObjectProperty was instantiated 28 times.
time:Cal… defined once and used 24 times (e.g., as range).
These are the namespaces this
ontology uses. Clicking on one
shows all of the documents using
the namespace.
All of this is available
in RDF form for the
agents among us.
Here’s what the agent sees.
Note the swoogle and wob
(web of belief) ontologies.
We can also search for
terms (classes, properties)
like terms for “person”.
10K terms associated with
“person”! Ordered by use.
Let’s look at foaf:Person’s metadata
87K documents used foaf:gender with a
foaf:Person instance as the subject
3K documents used dc:creator with a
foaf:Person instance as the object
Swoogle’s archive saves every
version of a SWD it’s seen.
2 Searching specialized collections: Spire
An NSF ITR collaborative project with
• University of Maryland, Baltimore County
• University of Maryland, College Park
• University of California, Davis
• Rocky Mountain Biological Laboratory
An invasive species scenario
• Nile Tilapia fish have been found in a California lake.
• Can this invasive species thrive in this environment?
• If so, what will be the likely
consequences for the
ecology?
• So…we need to understand
the effects of introducing
this fish into the food web
of a typical California lake
Food Webs
• A food web models the trophic (feeding)
relationships between organisms in an ecology
– Food web simulators are used to explore the
consequences of changes in the ecology, such as the
introduction or removal of a species
– A location's food web is usually constructed from studies
of the frequencies of the species found there and the
known trophic relations among them.
• Goal: automatically construct a food web for a new
location using existing data and knowledge
• ELVIS: Ecosystem Location Visualization and
Information System
East River Valley Trophic Web
http://www.foodwebs.org/
Species List Constructor
Click a county, get a species list
The problem
• We have data on what species are known to be in
the location and can further restrict and fill in with
other ecological models
• But we don’t know which of these the Nile Tilapia
eats or what might eat it.
• We can reason from taxonomic data (similar
species) and known natural history data (size,
mass, habitat, etc.) to fill in the gaps.
Food Web Constructor
Predict food web links using database and taxonomic reasoning.
In a new estuary, Nile Tilapia could compete with ostracods (green)
to eat algae. Predators (red) and prey (blue) of ostracods may be
affected.
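The taxonomic reasoning behind link prediction can be illustrated with a minimal sketch. It is not the ELVIS implementation: the taxonomy, the known links and the function names below are all hypothetical, and the rule used (a new species inherits the predators and prey of close taxonomic relatives) is only one simple way such reasoning might work.

```python
# Hypothetical toy data: each species maps to its lineage, most general rank first.
TAXONOMY = {
    "nile_tilapia": ["Animalia", "Chordata", "Cichlidae", "Oreochromis"],
    "blue_tilapia": ["Animalia", "Chordata", "Cichlidae", "Oreochromis"],
    "largemouth_bass": ["Animalia", "Chordata", "Centrarchidae", "Micropterus"],
    "ostracod": ["Animalia", "Arthropoda", "Ostracoda", "Cypris"],
}

# Known (predator, prey) links, also hypothetical.
KNOWN_LINKS = {
    ("blue_tilapia", "algae"),
    ("ostracod", "algae"),
    ("largemouth_bass", "blue_tilapia"),
    ("largemouth_bass", "ostracod"),
}


def shared_ranks(a, b):
    """Count the leading taxonomic ranks that two species have in common."""
    count = 0
    for rank_a, rank_b in zip(TAXONOMY[a], TAXONOMY[b]):
        if rank_a != rank_b:
            break
        count += 1
    return count


def predict_links(new_species, min_shared=3):
    """Predict (predator, prey) links for new_species from close relatives."""
    relatives = {
        s for s in TAXONOMY
        if s != new_species and shared_ranks(new_species, s) >= min_shared
    }
    predicted = set()
    for predator, prey in KNOWN_LINKS:
        if predator in relatives:   # relative eats prey, so new species might too
            predicted.add((new_species, prey))
        if prey in relatives:       # predator eats relative, so might eat new species
            predicted.add((predator, new_species))
    return predicted


links = predict_links("nile_tilapia")
```

With this toy data the only close relative is the hypothetical blue tilapia, so the sketch predicts that Nile Tilapia eats algae (putting it in competition with ostracods) and is eaten by the bass.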
Evidence Provider
Examine evidence for predicted links.
Status
• Goal is ELVIS (Ecosystem Location Visualization and
Information System) as an integrated set of web services
for constructing food webs for a given location.
• Background ontologies
– SpireEcoConcepts: concepts and properties to represent food
webs, and ELVIS related tasks, inputs and outputs
– ETHAN (Evolutionary Trees and Natural History): concepts and
properties for ‘natural history’ information on species derived
from data in the Animal Diversity Web and other taxonomic
sources
• Under development
– Connect to visualization software
– Connect to triple shop to discover more data
3 Supporting SW tools: UMBC Triple Shop
• http://sparql.cs.umbc.edu/
• Online SPARQL RDF query processing with several
interesting features
• Automatically finds SWDs for given queries using
Swoogle backend database
• Datasets, queries and results can be saved, tagged,
annotated, shared, searched for, etc.
• RDF datasets as first class objects
– Can be stored on our server or downloaded
– Can be materialized in a database or
(soon) as a Jena model
Web-scale semantic web data access
[Diagram: an agent exchanging messages with a data access service, which indexes RDF data from the Web]
• Agent searches vocabulary: ask (“person”)
• Service searches URIrefs in SW vocabulary: inform (“foaf:Person”)
• Agent composes query: ask (“?x rdf:type foaf:Person”)
• Service searches URLs in SWD index: inform (doc URLs)
• Agent fetches docs and populates RDF database
• Agent queries local RDF database
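The exchange between agent and data access service can be sketched in a few lines. This is only an illustration under stated assumptions: the two dictionaries stand in for Swoogle's vocabulary and SWD indexes, `fetch` stands in for an HTTP download plus RDF parsing, and every name, URL and triple is hypothetical rather than part of any real Swoogle API.

```python
# Hypothetical stand-ins for the service's indexes and for documents on the Web.
VOCABULARY = {"person": ["foaf:Person"]}
SWD_INDEX = {
    "foaf:Person": ["http://example.org/a.rdf", "http://example.org/b.rdf"],
}
DOCUMENTS = {
    "http://example.org/a.rdf": [
        ("ex:alice", "rdf:type", "foaf:Person"),
        ("ex:alice", "foaf:name", "Alice"),
    ],
    "http://example.org/b.rdf": [
        ("ex:bob", "rdf:type", "foaf:Person"),
        ("ex:doc", "rdf:type", "foaf:Document"),
    ],
}


class DataAccessService:
    """Toy data access service answering the two kinds of 'ask' messages."""

    def search_vocabulary(self, keyword):
        # ask("person") -> inform("foaf:Person")
        return VOCABULARY.get(keyword.lower(), [])

    def search_documents(self, term):
        # ask("?x rdf:type foaf:Person") -> inform(doc URLs)
        return SWD_INDEX.get(term, [])


def fetch(url):
    """Stand-in for downloading and parsing an RDF document."""
    return DOCUMENTS[url]


def find_instances(service, keyword):
    """Run the agent side: search, fetch, populate, then query locally."""
    terms = service.search_vocabulary(keyword)
    local_db = set()
    for term in terms:
        for url in service.search_documents(term):
            local_db.update(fetch(url))      # populate the local RDF database
    # Query the local database for "?x rdf:type <term>".
    return sorted(s for (s, p, o) in local_db if p == "rdf:type" and o in terms)


people = find_instances(DataAccessService(), "person")
```

The point of the design is that the service only returns terms and document URLs; the agent fetches the documents and evaluates the query against its own local store.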
Who knows Anupam Joshi?
Show me their names, email addresses
and pictures
The UMBC ebiquity
site publishes lots of
RDF data, including
FOAF profiles
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT DISTINCT ?p2name ?p2mbox ?p2pix
FROM ???
WHERE { ?p1 foaf:surname "Joshi" .
        ?p1 foaf:firstName "Anupam" .
        ?p1 foaf:mbox ?p1mbox .
        ?p2 foaf:knows ?p3 .
        ?p3 foaf:mbox ?p1mbox .
        ?p2 foaf:name ?p2name .
        ?p2 foaf:mbox ?p2mbox .
        OPTIONAL { ?p2 foaf:depiction ?p2pix } .
      }
ORDER BY ?p2name
No FROM clause!
log in
specify dataset
Enter query w/o FROM clause!
302 RDF documents
were found that
might have useful
data.
We’ll select them all
and add them to
the current dataset.
We’ll run the query
against this dataset
to see if the results
are as expected.
The results can be
produced in any of
several formats
Looks like a useful
dataset. Let’s save it
and also materialize
it in the TS triple store.
We can also
annotate, save and
share queries.
Work in Progress
• There are a host of performance issues
• We plan on supporting some special datasets, e.g.,
– FOAF data collected from Swoogle
– Definitions of RDF and OWL classes and properties from all
ontologies that Swoogle has discovered
• Expanding constraints to select candidate SWDs to include
arbitrary metadata and embedded queries
– FROM “documents trusted by a member of the SPIRE
project”
• We will explore two models for making this useful
– As a downloadable application for client machines
– As an (open source?) downloadable service for servers
supporting a community of users.
This talk
• Motivation
• Swoogle Semantic Web
search engine
• Use cases and applications
• Observations
• Conclusions
Will Swoogle Scale? How?
Here’s a rough estimate of the data in RDF documents on the
semantic web based on Swoogle’s crawling
System/date   Terms      Documents  Individuals  Triples    Bytes
Swoogle2      1.5x10^5   3.5x10^5   7x10^6       5x10^7     7x10^9
Swoogle3      2x10^5     7x10^5     1.5x10^7     7.5x10^7   1x10^10
2006          1x10^6     5x10^7     5x10^7       5x10^9     5x10^11
2008          5x10^6     5x10^9     5x10^9       5x10^11    5x10^13
We think Swoogle’s centralized approach can be made to work
for the next few years if not longer.
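To get a feel for what these estimates imply, a few ratios can be derived from them. The sketch below only restates the figures from the estimate above and does arithmetic on them; the helper names are hypothetical.

```python
# Estimates copied from the table: terms, documents, individuals, triples, bytes.
ESTIMATES = {
    "Swoogle2": {"terms": 1.5e5, "documents": 3.5e5, "individuals": 7e6,
                 "triples": 5e7, "bytes": 7e9},
    "Swoogle3": {"terms": 2e5, "documents": 7e5, "individuals": 1.5e7,
                 "triples": 7.5e7, "bytes": 1e10},
    "2006": {"terms": 1e6, "documents": 5e7, "individuals": 5e7,
             "triples": 5e9, "bytes": 5e11},
    "2008": {"terms": 5e6, "documents": 5e9, "individuals": 5e9,
             "triples": 5e11, "bytes": 5e13},
}


def bytes_per_triple(system):
    """Average storage per triple implied by a row of the table."""
    row = ESTIMATES[system]
    return row["bytes"] / row["triples"]


def triples_per_document(system):
    """Average number of triples per RDF document implied by a row."""
    row = ESTIMATES[system]
    return row["triples"] / row["documents"]


def growth(metric, start, end):
    """Factor by which a metric grows between two rows."""
    return ESTIMATES[end][metric] / ESTIMATES[start][metric]


for name in ESTIMATES:
    print(name, round(bytes_per_triple(name)), round(triples_per_document(name)))
print("triples, Swoogle3 -> 2008:", round(growth("triples", "Swoogle3", "2008")))
```

The projections assume roughly 100 bytes per triple and 100 triples per document, and the 2008 row implies several thousand times more triples than Swoogle3 held.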
How much reasoning should Swoogle do?
• SwoogleN (N<=3) does limited reasoning
– It’s expensive
– It’s not clear how much should be done
• More reasoning would benefit many use cases
– e.g., type hierarchy
• Recognizing specialized metadata
– E.g., that ontology A maps some terms from B to C
An RDF Dictionary
• We hope to develop an RDF dictionary.
• Given an RDF term, returns a graph of its
definition
– Term definition from “official” ontology
– Term+URL definition from SWD at URL
– Term+* union definition
– Optional argument recursively adds definitions of terms
in definition excluding RDFS and OWL terms
– Optional argument identifies more namespaces to
exclude
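A minimal sketch of such a lookup, assuming a document's graph is given as a list of (subject, predicate, object) tuples with prefixed names. The function name, its parameters and the sample triples are hypothetical; a union definition (Term+*) would simply pass in the concatenated triples of several documents.

```python
# Terms in these namespaces are never expanded, per the slide's RDFS/OWL exclusion.
DEFAULT_EXCLUDED = ("rdf:", "rdfs:", "owl:")


def define(term, triples, recursive=False, exclude=()):
    """Return the set of triples that define `term` in the given graph.

    With recursive=True, also add definitions of the terms used in the
    definition, skipping excluded namespaces (given as prefix strings).
    """
    excluded = DEFAULT_EXCLUDED + tuple(exclude)
    result, pending, seen = set(), [term], set()
    while pending:
        current = pending.pop()
        if current in seen:
            continue
        seen.add(current)
        for s, p, o in triples:
            if s != current:
                continue
            result.add((s, p, o))
            if recursive:
                for used in (p, o):
                    if not used.startswith(excluded) and used not in seen:
                        pending.append(used)
    return result


# Hypothetical sample graph.
ONTOLOGY = [
    ("foaf:Person", "rdf:type", "owl:Class"),
    ("foaf:Person", "rdfs:subClassOf", "foaf:Agent"),
    ("foaf:Agent", "rdf:type", "owl:Class"),
    ("foaf:knows", "rdfs:domain", "foaf:Person"),
]

plain = define("foaf:Person", ONTOLOGY)
expanded = define("foaf:Person", ONTOLOGY, recursive=True)
restricted = define("foaf:Person", ONTOLOGY, recursive=True, exclude=("foaf:",))
```

Here the plain definition has two triples, the recursive one also pulls in the definition of foaf:Agent, and excluding the foaf: namespace stops that expansion.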
This talk
• Motivation
• Swoogle Semantic Web
search engine
• Use cases and applications
• Observations
• Conclusions
Conclusion
• The web will contain the world’s knowledge in
forms accessible to people and computers
– We need better ways to discover, index, search and
reason over SW knowledge
• SW search engines address different tasks than
html search engines
– So they require different techniques and APIs
• Swoogle-like systems can help create consensus
ontologies and foster best practices
– Swoogle is for Semantic Web 1.0
– Semantic Web 2.0 will make different demands
For more information
http://ebiquity.umbc.edu/
Annotated
in OWL