Finding and Ranking Knowledge on the Semantic Web
Li Ding, Rong Pan, Tim Finin, Anupam Joshi, Yun Peng and Pranam Kolari
University of Maryland, Baltimore County
UMBC
an Honors University in Maryland
 http://creativecommons.org/licenses/by-nc-sa/2.0/
This work was partially supported by DARPA contract F30602-97-1-0215, NSF
grants CCR007080 and IIS9875433 and grants from IBM, Fujitsu and HP.
This talk
• Motivation
• Swoogle overview
• Bots navigate the Semantic Web
• Ranking Semantic Web content
• Use cases and applications
• Conclusions
Google has made us smarter
But what about our agents?
[Figure labels: tell, register]
A Google for knowledge on the Semantic Web is needed by people and software agents
Swoogle Architecture
[Architecture diagram. Layers: data analysis, metadata creation, SWD discovery, interface. Components: IR analyzer, SWD analyzer, Web Server, SWD Cache, SWD Metadata, Web Service, Agent Service, SWD Reader, Candidate URLs, Web Crawler, The Web]
Swoogle 2: 340K SWDs, 48M triples, 5K SWOs, 97K classes, 55K properties, 7M individuals (4/05)
Swoogle 3: 700K SWDs, 135M triples, 7.7K SWOs (11/05)
Demo 1: Find “Time” Ontology
We can use a set of keywords to search for an ontology. For example, “time”, “before” and “after” are basic concepts for a “Time” ontology.
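The keyword search in this demo can be illustrated with a toy scorer. This is a minimal sketch, not Swoogle's actual retrieval code: the index contents, the example.org URIs and the function name rank_ontologies are all hypothetical, and the score is simply the fraction of query keywords that match a term name in each ontology.

```python
def rank_ontologies(index, keywords):
    """Score each ontology by the fraction of keywords matching its term names."""
    wanted = {k.lower() for k in keywords}
    scores = {}
    for uri, terms in index.items():
        names = {t.lower() for t in terms}
        hits = len(wanted & names)
        if hits:
            scores[uri] = hits / len(wanted)
    # Best score first; ties broken by URI so the order is deterministic.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical index: ontology URI -> local names of the terms it defines.
index = {
    "http://example.org/time": ["Time", "before", "after", "TimeZone"],
    "http://example.org/people": ["Person", "name"],
    "http://example.org/events": ["Event", "before"],
}

results = rank_ontologies(index, ["time", "before", "after"])
for uri, score in results:
    print(f"{score:.2f}  {uri}")
```

Matching here is case-insensitive because the query is a bag of keywords; exact URIref lookup is a different operation.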
Demo 2(a): Digest “Time” Ontology (document view)
Demo 2(b): Digest “Time” Ontology (term view)
Terms shown: TimeZone, before, …, intAfter
Demo 3: Find Term “Person”
Not capitalized! URIref is case sensitive!
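Why capitalization matters can be shown with an exact-match lookup. This is a minimal sketch: the term_index dictionary and find_term function are hypothetical stand-ins for a term index, and only the FOAF namespace URI is real.

```python
# Hypothetical term index keyed by full URIref.
FOAF = "http://xmlns.com/foaf/0.1/"
term_index = {FOAF + "Person": "class"}


def find_term(uriref):
    """Exact, case-sensitive lookup of a URIref."""
    return term_index.get(uriref)


found = find_term(FOAF + "Person")    # the capitalized local name is indexed
missing = find_term(FOAF + "person")  # a different URIref, so no match
print(found, missing)
```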
Demo 4: Digest Term “Person”
167 different properties
562 different properties
Demo 5(a): Swoogle Today
Demo 5(b): Swoogle Statistics
[Chart labels: FOAF, Trustix, W3C, Stanford]
Swoogle’s Triple Store lets you shop and check out your triples into any of several reasoners
Summary
2004: Swoogle (Mar, 2004), Swoogle2 (Sep, 2004)
2005: Swoogle3 (July 2005)
• Automated SWD discovery
• SWD metadata creation and search
• Ontology rank (rational surfer model)
• Swoogle watch
• Web Interface
• Ontology dictionary
• Swoogle statistics
• Web service interface (WSDL)
• Bag of URIref IR search
• Triple shopping cart
• Better (re-)crawling strategies
• Better navigation models
• Index instance data
• More metadata (ontology mapping and OWL-S services)
• Better web service interfaces
• IR component for string literals
The Semantic Web Onion
Layers in the diagram, with their annotations:
• Universal RDF Graph: the “Semantic Web” (about 10M documents)
• RDF Document: physically hosting knowledge (about 100 triples per SWD on average)
• Class-instance: triples modifying the same subject
• Molecule: finest lossless set of triples
• Triple: atomic knowledge block
• Resource, Literal
Swoogle maintains metadata about objects in different layers of the Semantic Web Onion.
Semantic Web Navigation Model
[Diagram: navigation paths numbered 1 to 7 connecting the Web, RDF graph resources and literals, SWTs, SWDs and SWOs, entered through Term Search or Document Search. Link labels: sameNamespace, sameLocalname, extends class-property bond; uses, populates, defines, officialOnto; isUsedBy, isPopulatedBy, isDefinedBy; rdfs:subClassOf; rdfs:seeAlso, rdfs:isDefinedBy; owl:imports; …]
Navigating the HTML web is simple; there’s just one kind of link.
The SW has more kinds of links and hence more navigation paths.
The relations in groups 1 and 3, and parts of group 4, require a global view to discover.
Rank has its privilege
• Google introduced a new approach to ranking query
results using a simple “popularity” metric.
– It was a big improvement!
• Swoogle also ranks its query results
– When searching for an ontology, class or property,
wouldn’t one want to see the most used ones first?
• Ranking SW content requires different algorithms for
different kinds of SW objects
– For SWDs, SWTs, individuals, “assertions”,
molecules, etc…
Google’s PageRank
• A page’s rank is a function of
how many links point to it and the
rank of the pages hosting those links.
• The “random surfer” model provides
the intuition:
(1) Jump to a random page
(2) Select and follow a random link on the
page and repeat until ‘bored’
(3) If bored, go to (1)
• Pages are ranked by the relative frequency with which they are visited.
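The random surfer loop can be approximated numerically by power iteration. This is a minimal sketch, assuming a damping factor of 0.85 (the chance that the surfer is not yet bored) and a toy four-page graph; neither value comes from the slides.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Power-iteration PageRank over a dict mapping page -> list of outlinks."""
    pages = sorted(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # A bored surfer jumps to a page chosen uniformly at random.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outlinks in links.items():
            # A page without outlinks behaves like a jump to any page.
            targets = outlinks if outlinks else pages
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


# Toy graph (an assumption): every link target is also a key.
links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(links)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 4))
```

Page "c" collects links from three pages and ends up on top, while "d", which nobody links to, keeps only the random-jump share.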
Ranking Semantic Web Documents
• Target: a pure SW dataset
– Nodes: a collection of online SWDs (330K SWDs, 1.5% are labeled as ontologies)
– Links: in addition to hyperlinks, term-level relations are generalized into TM, EX, IM.
• Rational surfer model (extension of weighted PageRank)
– Semantic content (term-level relations) is encoded into links
– The rank of a node spreads iteratively via links
– The weight/capacity of a link varies according to link semantics
– Weight is propagated to imported ontologies
• Evaluation
– Method: compare how well OntoRank and PageRank promote ontologies, using the same pure SW dataset
An Example
• http://www.w3.org/2000/01/rdf-schema: wPR = 300, OntoRank = 403
• http://xmlns.com/wordnet/1.6/: wPR = 3, OntoRank = 103
• http://xmlns.com/foaf/1.0/: wPR = 100, OntoRank = 100
• http://www.cs.umbc.edu/~finin/foaf.rdf: wPR = 0.2, OntoRank = 0.2
[Link labels in the diagram: TM, TM, EX, TM]
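One way to read the numbers in this example is that each document's OntoRank is its own wPR plus the wPR of every document that reaches it through TM or EX links. The sketch below works under that assumption only. The edge list is a guess at the flattened diagram, the short names stand in for the four URLs, and the slides do not say whether the 0.2 from the instance document is counted (it is small enough that either reading rounds to the slide's figures).

```python
# wPR values as shown on the slide; the short names are illustrative.
wpr = {"rdfs": 300.0, "wordnet": 3.0, "foaf": 100.0, "finin-foaf": 0.2}

# ASSUMPTION: edges guessed from the diagram residue (source, target, label).
links = [
    ("finin-foaf", "foaf", "TM"),
    ("foaf", "wordnet", "EX"),
    ("foaf", "rdfs", "TM"),
    ("wordnet", "rdfs", "TM"),
]


def onto_rank(wpr, links):
    """ASSUMED rule: own wPR plus wPR of all documents that transitively link in."""
    incoming = {}
    for src, dst, _label in links:
        incoming.setdefault(dst, set()).add(src)

    def linking_closure(doc):
        seen, stack = set(), [doc]
        while stack:
            for src in incoming.get(stack.pop(), ()):
                if src not in seen:
                    seen.add(src)
                    stack.append(src)
        seen.discard(doc)  # never count a document twice if links form a cycle
        return seen

    return {d: wpr[d] + sum(wpr[x] for x in linking_closure(d)) for d in wpr}


ranks = onto_rank(wpr, links)
for doc, value in ranks.items():
    print(doc, round(value, 1))
```

Under these assumptions, the ontology values round to the slide's 403, 103 and 100, and the instance document keeps its 0.2.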
Ontology Dictionary
• Motivation
– One ontology does not always provide all the needed vocabulary
– Many scenarios require assembling terms from multiple ontologies
• DIY ontology engineering
1. Search for an appropriate class C
2. Search for popular properties used to modify instances of C
3. Go back to step 1 if more classes are needed
Ranking Semantic Web Terms
• Pr(Term|Doc) can be measured by the normalized value of the product of the term’s
– Popularity: how many SWDs use the term
– Frequency: how many times the term is used in the SWD
• SWDs are accessed non-uniformly, according to their OntoRank
• TermRank estimates a term’s importance as
TermRank(Term) = ∑ Pr(Term|Doc) * OntoRank(Doc), summed over all SWDs Doc
• Evaluation
– Compare TermRank with term popularity for the 10 highest-rated terms and evaluate the differences analytically
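The TermRank sum can be worked through on toy data. This is a minimal sketch: the document names, usage counts and OntoRank values are made up, and normalizing Pr(Term|Doc) over the terms of each document is an assumption about what "normalized" means here.

```python
from collections import defaultdict


def term_rank(term_counts, onto_rank):
    """TermRank(t) = sum over docs of Pr(t|doc) * OntoRank(doc)."""
    # Popularity: number of documents that use each term.
    popularity = defaultdict(int)
    for counts in term_counts.values():
        for term in counts:
            popularity[term] += 1

    ranks = defaultdict(float)
    for doc, counts in term_counts.items():
        # Unnormalized weight: popularity times in-document frequency.
        weights = {t: popularity[t] * freq for t, freq in counts.items()}
        total = sum(weights.values())
        if total == 0:
            continue  # a document with no term usage contributes nothing
        for term, weight in weights.items():
            ranks[term] += (weight / total) * onto_rank[doc]
    return dict(ranks)


# Hypothetical inputs: term usage counts per document, and OntoRank per document.
term_counts = {
    "doc1": {"foaf:Person": 2, "foaf:name": 2},
    "doc2": {"foaf:Person": 1, "dc:title": 1},
}
onto_rank = {"doc1": 0.6, "doc2": 0.4}

ranks = term_rank(term_counts, onto_rank)
for term in sorted(ranks, key=ranks.get, reverse=True):
    print(term, round(ranks[term], 4))
```

Because each document's Pr(Term|Doc) values sum to 1, the TermRank values add up to the total OntoRank that was distributed.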
Class-Property Bonds
Class-Property Bond (introduced by ontology)
• foaf:mbox
• foaf:name
Class-Property Bond (introduced by instances)
• foaf:name
• dc:title
Class Definition
• rdfs:subClassOf -- foaf:Agent
• rdfs:label -- “Person”
[Diagram: documents SWD1, SWD2 and SWD3 contribute triples about foaf:Person. Labels: rdfs:domain on foaf:mbox and foaf:name; rdf:type owl:Class; rdfs:subClassOf foaf:Agent; rdfs:comment “a human being”; an instance with rdf:type foaf:Person, foaf:name “Tim Finin” and dc:title “Tim’s FOAF File”]
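Both kinds of bond can be derived from plain triples. This is a minimal sketch using tuples rather than an RDF library; the instance identifier ex:tim and the two function names are hypothetical, and the triples are a reading of the diagram labels.

```python
# Triples as (subject, predicate, object); "ex:tim" is a made-up instance id.
triples = [
    ("foaf:mbox", "rdfs:domain", "foaf:Person"),
    ("foaf:name", "rdfs:domain", "foaf:Person"),
    ("foaf:Person", "rdf:type", "owl:Class"),
    ("foaf:Person", "rdfs:subClassOf", "foaf:Agent"),
    ("ex:tim", "rdf:type", "foaf:Person"),
    ("ex:tim", "foaf:name", "Tim Finin"),
    ("ex:tim", "dc:title", "Tim's FOAF File"),
]


def ontology_bonds(triples, cls):
    """Properties bonded to cls by an ontology, via rdfs:domain."""
    return {s for s, p, o in triples if p == "rdfs:domain" and o == cls}


def instance_bonds(triples, cls):
    """Properties bonded to cls by usage, i.e. applied to its instances."""
    instances = {s for s, p, o in triples if p == "rdf:type" and o == cls}
    return {p for s, p, o in triples if s in instances and p != "rdf:type"}


by_ontology = ontology_bonds(triples, "foaf:Person")
by_instances = instance_bonds(triples, "foaf:Person")
print(sorted(by_ontology))
print(sorted(by_instances))
```

Note that dc:title appears only in the instance-derived set: no ontology here declares it for foaf:Person, but instance data uses it anyway.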
Applications and use cases
• Supporting Semantic Web developers, e.g.,
– Ontology designers
– Vocabulary discovery
– Who’s using my ontologies or data?
– Etc.
• Searching specialized collections, e.g.,
– Proofs in Inference Web
– Text Meaning Representations of news stories in
SemNews
• Supporting SW tools, e.g.,
– Discovering mappings between ontologies
Will it Scale? How?
Here’s a rough estimate of the data in RDF documents on the
semantic web based on Swoogle’s crawling
System/date   Terms      Documents  Individuals  Triples    Bytes
Swoogle2      1.5x10^5   3.5x10^5   7x10^6       5x10^7     7x10^9
Swoogle3      2x10^5     7x10^5     1.5x10^7     7.5x10^7   1x10^10
2005          2.5x10^5   5x10^6     5x10^7       5x10^8     5x10^10
2008          5x10^5     5x10^7     5x10^8       5x10^9     5x10^11
We think Swoogle’s centralized approach can be made to work
for the next few years if not longer.
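To get a feel for what the estimates imply, the growth factors and storage per triple can be computed directly. This is a minimal sketch that only restates the table's figures; the dictionary layout and the growth function are illustrative.

```python
# Figures copied from the table; row labels are kept as given there.
estimates = {
    "Swoogle2": {"terms": 1.5e5, "documents": 3.5e5, "individuals": 7e6,
                 "triples": 5e7, "bytes": 7e9},
    "Swoogle3": {"terms": 2e5, "documents": 7e5, "individuals": 1.5e7,
                 "triples": 7.5e7, "bytes": 1e10},
    "2005": {"terms": 2.5e5, "documents": 5e6, "individuals": 5e7,
             "triples": 5e8, "bytes": 5e10},
    "2008": {"terms": 5e5, "documents": 5e7, "individuals": 5e8,
             "triples": 5e9, "bytes": 5e11},
}


def growth(start, end):
    """Multiplicative growth of every column between two rows."""
    return {col: estimates[end][col] / estimates[start][col]
            for col in estimates[start]}


to_2008 = growth("Swoogle3", "2008")
bytes_per_triple = {row: cols["bytes"] / cols["triples"]
                    for row, cols in estimates.items()}

for col, factor in to_2008.items():
    print(f"{col}: x{factor:.1f}")
for row, size in bytes_per_triple.items():
    print(f"{row}: {size:.0f} bytes per triple")
```

On these figures, the 2008 projection means roughly 70 times more documents and triples than Swoogle3 holds, while the number of distinct terms grows by only 2.5 times.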
How much reasoning?
• SwoogleN (N<=3) does limited reasoning
– It’s expensive
– It’s not clear how much should be done
• More reasoning would benefit many use cases
– e.g., type hierarchy
• Recognizing specialized metadata
– e.g., that ontology A maps some terms from B to C
Conclusion
• The web will contain the world’s knowledge in
forms accessible to people and computers
– We need better ways to discover, index, search and
reason over SW knowledge
• SW search engines address different tasks than HTML search engines
– So they require different techniques and APIs
• Swoogle-like systems can help create consensus ontologies and foster best practices
For more information
http://ebiquity.umbc.edu/
Annotated
in OWL