
Searching for
Knowledge and Data
on the Semantic Web
Tim Finin
University of Maryland, Baltimore County
http://ebiquity.umbc.edu/resource/html/id/179/
Joint work with Li Ding, Anupam Joshi, Yun Peng, Cynthia Parr, Pranam Kolari, Pavan
Reddivari, Sandor Dornbush, Rong Pan, Akshay Java, Joel Sachs, Scott Cost and Vishal Doshi
http://creativecommons.org/licenses/by-nc-sa/2.0/
This work was partially supported by DARPA contract F3060297-1-0215, NSF grants CCR007080 and IIS9875433, and grants from IBM, Fujitsu and HP.
UMBC
an Honors University in Maryland
Google has made us smarter
But what about our agents?
Agents still have a very minimal
understanding of text and images.
XML helps
“XML is Lisp's bastard nephew, with uglier syntax
and no semantics. Yet XML is poised to enable the
creation of a Web of data that dwarfs anything
since the Library at Alexandria.”
-- Philip Wadler, Et tu XML? The fall of
the relational empire, VLDB, Rome,
September 2001.
Semantic Web adds semantics
“The Semantic Web will globalize
KR*, just as the WWW globalized
hypertext”
-- Tim Berners-Lee
* Knowledge Representation
But what about our agents?
A Google for knowledge on the Semantic Web
is needed by software agents and programs
• http://swoogle.umbc.edu/
• Running since summer 2004
• 1.5M RDF documents, 300M RDF triples, 10K ontologies
Swoogle Architecture
[Figure: Swoogle architecture diagram. Labeled components: Discovery (SwoogleBot, bounded Web crawler, Google crawler, candidate URLs, document cache); Analysis (SWD classifier, ranking); Index (IR indexer, SWD indexer); Semantic Web metadata; Search Services (Web server delivering HTML to humans, Web service delivering RDF/XML to machines). Sources: the Web and the Semantic Web. Legend: information flow; Swoogle's web interface.]
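As a rough illustration of the classification step in the architecture, the sketch below checks whether a fetched document parses as RDF/XML. It uses only Python's standard library. The function name `looks_like_rdfxml` and both sample documents are hypothetical, and the slides do not describe how Swoogle's SWD classifier actually works.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

def looks_like_rdfxml(text: str) -> bool:
    """Return True if text is well-formed XML whose root element is rdf:RDF.

    Hypothetical heuristic for illustration only: embedded RDF and
    non-XML serializations are not handled.
    """
    try:
        root = ET.fromstring(text)
    except ET.ParseError:
        return False
    return root.tag == "{%s}RDF" % RDF_NS

# Made-up sample documents.
sample_rdf = """<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <foaf:Person rdf:about="http://example.org/people#alice">
    <foaf:name>Alice</foaf:name>
  </foaf:Person>
</rdf:RDF>"""

sample_html = "<html><body><p>Just a web page.</p></body></html>"

print(looks_like_rdfxml(sample_rdf))
print(looks_like_rdfxml(sample_html))
```

A crawler could use a check like this to decide which candidate URLs are worth passing on to indexing.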
Applications and use cases
• Supporting Semantic Web developers
– Ontology designers, vocabulary discovery, who’s using
my ontologies or data?, use analysis, errors, statistics, etc.
• Searching specialized collections
– Spire: aggregating observations and data from biologists
– InferenceWeb: searching over and enhancing proofs
– SemNews: Text Meaning of news stories
• Supporting SW tools
– Triple shop: finding data for SPARQL queries
80 ontologies were found that
had these three terms
By default, ontologies are ordered
by their ‘popularity’, but they can
also be ordered by recency or size.
Let’s look at this one
Basic Metadata
hasDateDiscovered: 2005-01-17
hasDatePing: 2006-03-21
hasPingState: PingModified
type: SemanticWebDocument
isEmbedded: false
hasGrammar: RDFXML
hasParseState: ParseSuccess
hasDateLastmodified: 2005-04-29
hasDateCache: 2006-03-21
hasEncoding: ISO-8859-1
hasLength: 18K
hasCntTriple: 311.00
hasOntoRatio: 0.98
hasCntSwt: 94.00
hasCntSwtDef: 72.00
hasCntInstance: 8.00
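An agent consuming these fields could turn the listing into typed values with a small parser. This is a minimal sketch: the field names come from the slide, while the function `parse_metadata` and its type-guessing rules are assumptions, not part of Swoogle.

```python
def parse_metadata(lines):
    """Parse 'key: value' lines into a dict, guessing simple types."""
    record = {}
    for line in lines:
        key, _, raw = line.partition(":")
        raw = raw.strip()
        if raw in ("true", "false"):
            value = raw == "true"
        else:
            try:
                value = float(raw)
            except ValueError:
                # Dates, sizes such as "18K", and names stay as strings.
                value = raw
        record[key.strip()] = value
    return record

# A few of the fields shown on the slide.
slide_lines = [
    "hasDateDiscovered: 2005-01-17",
    "isEmbedded: false",
    "hasLength: 18K",
    "hasCntTriple: 311.00",
    "hasOntoRatio: 0.98",
    "hasCntSwtDef: 72.00",
]
meta = parse_metadata(slide_lines)
print(meta)
```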
These are the namespaces this
ontology uses. Clicking on one
shows all of the documents using
the namespace.
All of this is available
in RDF form for the
agents among us.
Here’s what the agent sees.
Note the swoogle and wob
(web of belief) ontologies.
We can also search for
terms (classes, properties),
such as terms for “person”.
10K terms associated with
“person”! Ordered by use.
Let’s look at foaf:Person’s metadata
UMBC Triple Shop
• http://sparql.cs.umbc.edu/
• Online SPARQL RDF query processing based
on HP’s Jena and Joseki with several interesting
features
• Selectable level of inference over model
• Automatically finds Semantic Web documents (SWDs) for given
queries using the Swoogle backend database
– Provides a dataset creation wizard
– Datasets can be stored on our server or downloaded
– Tag, share and search over saved datasets
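One plausible way to find documents for a query is to collect the vocabulary terms the query mentions and look them up in a term-to-document index. The slides do not say how Triple Shop selects documents, so the sketch below is only an assumption: `extract_term_uris`, `find_candidate_documents`, the sample query, and the `term_index` contents are all hypothetical.

```python
import re

def extract_term_uris(query: str) -> set:
    """Collect full URIs for the prefixed names used in a SPARQL query."""
    # Map each declared prefix to its namespace URI.
    prefixes = dict(
        re.findall(r"PREFIX\s+(\w+):\s*<([^>]+)>", query, flags=re.IGNORECASE)
    )
    terms = set()
    # Find prefix:local pairs and expand those with a declared prefix.
    for prefix, local in re.findall(r"\b(\w+):([A-Za-z_]\w*)", query):
        if prefix in prefixes:
            terms.add(prefixes[prefix] + local)
    return terms

def find_candidate_documents(query: str, term_index: dict) -> set:
    """Return every document that uses at least one term from the query."""
    docs = set()
    for uri in extract_term_uris(query):
        docs |= term_index.get(uri, set())
    return docs

query = """
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name WHERE {
  ?p1 foaf:knows ?p2 .
  ?p2 foaf:name ?name .
}
"""

# Made-up index from term URI to the documents that use the term.
term_index = {
    "http://xmlns.com/foaf/0.1/knows": {"http://example.org/a.rdf"},
    "http://xmlns.com/foaf/0.1/name": {
        "http://example.org/a.rdf",
        "http://example.org/b.rdf",
    },
    "http://xmlns.com/foaf/0.1/mbox": {"http://example.org/c.rdf"},
}

candidates = find_candidate_documents(query, term_index)
print(sorted(candidates))
```

The regular expressions are a simplification: a real implementation would parse the query properly rather than scan it as text.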
Who knows Anupam Joshi?
Show me their names, email addresses
and pictures
The UMBC ebiquity
site publishes lots of
RDF data, including
FOAF profiles
No FROM clause!
Constraints on where
the data comes from
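PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT DISTINCT ?p2name ?p2mbox ?p2pix
WHERE {
  # ?p1 is Anupam Joshi, identified by his mailbox
  ?p1 foaf:name "Anupam Joshi" .
  ?p1 foaf:mbox ?p1mbox .
  # ?p2 knows someone (?p3) who has that same mailbox
  ?p2 foaf:knows ?p3 .
  ?p3 foaf:mbox ?p1mbox .
  # report each such person's name, mailbox and, if present, picture
  ?p2 foaf:name ?p2name .
  ?p2 foaf:mbox ?p2mbox .
  OPTIONAL { ?p2 foaf:depiction ?p2pix } .
}
ORDER BY ?p2name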
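The query joins on foaf:mbox rather than on node identity, so it still works when different documents describe the same person under different node names. The sketch below shows that join with a naive in-memory pattern matcher in plain Python. It is not how Jena or Joseki evaluate queries. The `match` function, the blank-node labels, "Alice Example" and the example.org mailboxes are made up for illustration, and the OPTIONAL picture pattern is omitted.

```python
FOAF = "http://xmlns.com/foaf/0.1/"

def match(patterns, triples, binding=None):
    """Yield variable bindings that satisfy every triple pattern (naive join)."""
    binding = binding or {}
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        new = dict(binding)
        ok = True
        for term, value in zip(first, triple):
            if term.startswith("?"):
                # Bind the variable, or check it against its existing value.
                if new.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                ok = False
                break
        if ok:
            yield from match(rest, triples, new)

# Made-up data merged from two documents; _:a1 and _:b2 share a mailbox.
triples = [
    ("_:a1", FOAF + "name", "Anupam Joshi"),
    ("_:a1", FOAF + "mbox", "mailto:joshi@example.org"),
    ("_:b1", FOAF + "name", "Alice Example"),
    ("_:b1", FOAF + "mbox", "mailto:alice@example.org"),
    ("_:b1", FOAF + "knows", "_:b2"),
    ("_:b2", FOAF + "mbox", "mailto:joshi@example.org"),
]

# The required patterns of the query, in the same order.
patterns = [
    ("?p1", FOAF + "name", "Anupam Joshi"),
    ("?p1", FOAF + "mbox", "?p1mbox"),
    ("?p2", FOAF + "knows", "?p3"),
    ("?p3", FOAF + "mbox", "?p1mbox"),
    ("?p2", FOAF + "name", "?p2name"),
    ("?p2", FOAF + "mbox", "?p2mbox"),
]

# DISTINCT and ORDER BY ?p2name, expressed with a set and sorted().
results = sorted({(b["?p2name"], b["?p2mbox"]) for b in match(patterns, triples)})
print(results)
```

Here `_:b2` is matched to Anupam Joshi only through the shared mailbox, which is why the query needs both `?p1` and `?p3`.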
Swoogle found 292
RDF data files that
appear relevant to
answering our query
Let’s save the dataset
before we use it
And tag it so we and
others can find it more
easily.
Here we are using it to
get an answer to “Who
knows Anupam Joshi”
He has many friends!
Conclusion
• The web will contain the world’s knowledge in
forms accessible to people and computers
– We need better ways to discover, index, search and
reason over SW knowledge
• SW search engines address different tasks than
HTML search engines
– So they require different techniques and APIs
• Swoogle-like systems can help create consensus
ontologies and foster best practices
– Swoogle is for Semantic Web 1.0
– Semantic Web 2.0 will make different demands
For more information
http://ebiquity.umbc.edu/
Annotated
in OWL