CS652 Spring 2004 Summary
Course Objectives
Learn how to extract, structure, and integrate Web information
Learn what the Semantic Web is
Learn how to build ontologies for the Semantic Web
Investigate class-related research topics
Be introduced to Semantic Web services
Generally Applicable Ideas
Semantic Understanding
Data: attribute-value pairs
Information: data in a conceptual model
Knowledge: information with agreement
Meaning: useful knowledge
Measuring Success
Recall: $R = \mathrm{NrCorrect} / \mathrm{TotalCorrect}$
Precision: $P = \mathrm{NrCorrect} / (\mathrm{NrCorrect} + \mathrm{NrIncorrect})$
F-measure: $F_\beta = \frac{(\beta^2+1)PR}{\beta^2 P + R}$
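The three measures can be computed directly from the counts. The following is a minimal Python sketch; the counts in the usage lines are hypothetical, and returning 0.0 when a denominator is zero is an assumption of this sketch rather than part of the definitions.

```python
def recall(nr_correct: int, total_correct: int) -> float:
    """R = NrCorrect / TotalCorrect (fraction of the correct items that were found)."""
    return nr_correct / total_correct if total_correct else 0.0


def precision(nr_correct: int, nr_incorrect: int) -> float:
    """P = NrCorrect / (NrCorrect + NrIncorrect) (fraction of extracted items that are correct)."""
    extracted = nr_correct + nr_incorrect
    return nr_correct / extracted if extracted else 0.0


def f_measure(p: float, r: float, beta: float = 1.0) -> float:
    """F = (beta^2 + 1) P R / (beta^2 P + R); beta > 1 weights recall more heavily."""
    b2 = beta * beta
    denom = b2 * p + r
    return (b2 + 1) * p * r / denom if denom else 0.0


# Hypothetical run: 16 correct values exist; the extractor returns 8 correct and 2 incorrect.
r = recall(8, 16)      # 0.5
p = precision(8, 2)    # 0.8
print(round(f_measure(p, r), 3))            # 0.615
print(round(f_measure(p, r, beta=2.0), 3))  # 0.541
```

With beta = 2 the score moves toward the lower recall value, which is what "weights recall more heavily" means in practice.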
Information Extraction
Get relevant information
Distinct from:
Information retrieval: get relevant pages
Web mining: discover unknown associations
Wrapper: maps data to a suitable format
Generation techniques
Machine learning (e.g. RAPIER)
Natural language processing (e.g. RAPIER)
Hidden Markov Models
By-example generation tools (e.g. Lixto)
By-pattern generation (e.g. RoadRunner)
Wrapper Maintenance
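Of the generation techniques listed, Hidden Markov Models are the easiest to show in a few lines: tokens are observations, field labels are hidden states, and the Viterbi algorithm picks the most probable label sequence. The sketch below is a toy that assumes hand-set probabilities and a three-symbol token-shape vocabulary (`shape`, `START`, `TRANS`, and `EMIT` are all hypothetical); a real HMM extractor would estimate these parameters from labeled training data.

```python
import math

# Hypothetical toy HMM: hidden states are field labels, observations are token shapes.
STATES = ("other", "year", "price")
START = {"other": 0.6, "year": 0.3, "price": 0.1}
TRANS = {
    "other": {"other": 0.6, "year": 0.2, "price": 0.2},
    "year": {"other": 0.8, "year": 0.1, "price": 0.1},
    "price": {"other": 0.8, "year": 0.1, "price": 0.1},
}
EMIT = {
    "other": {"word": 0.9, "yyyy": 0.05, "money": 0.05},
    "year": {"word": 0.05, "yyyy": 0.9, "money": 0.05},
    "price": {"word": 0.05, "yyyy": 0.05, "money": 0.9},
}


def shape(token: str) -> str:
    """Map a token to a coarse observation symbol."""
    if token.startswith("$"):
        return "money"
    if len(token) == 4 and token.isdigit():
        return "yyyy"
    return "word"


def viterbi(tokens: list) -> list:
    """Return the most probable label sequence (computed with log probabilities)."""
    if not tokens:
        return []
    # Each column maps state -> (best log score ending in that state, predecessor state).
    first = shape(tokens[0])
    columns = [{s: (math.log(START[s]) + math.log(EMIT[s][first]), None) for s in STATES}]
    for token in tokens[1:]:
        prev, obs, col = columns[-1], shape(token), {}
        for s in STATES:
            best_prev = max(STATES, key=lambda r: prev[r][0] + math.log(TRANS[r][s]))
            score = prev[best_prev][0] + math.log(TRANS[best_prev][s]) + math.log(EMIT[s][obs])
            col[s] = (score, best_prev)
        columns.append(col)
    # Backtrack from the best final state.
    state = max(STATES, key=lambda s: columns[-1][s][0])
    path = [state]
    for col in reversed(columns[1:]):
        state = col[state][1]
        path.append(state)
    return path[::-1]


print(viterbi("1999 Honda Civic $4,500 obo".split()))
# ['year', 'other', 'other', 'price', 'other']
```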
Information Extraction – BYU Ontos
Ontology-based
Data frames
Strengths
Resilient to page changes
Robust across sites within the same domain
Works well with all types of data-rich text
Weaknesses
Hand-crafted ontologies and data frames
Requires record-boundary recognition
Does not learn
Applications
Extraction
High-precision classification
Schema mapping
Semantic Web annotation
Agent communication
Ontology generation
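BYU Ontos data frames pair value recognizers with context keywords, so a value is identified by what it looks like and what surrounds it rather than by its position in a page template. The following minimal Python sketch illustrates that idea only and is not the Ontos data-frame language: the `DataFrame` class, the regular expressions, the 12-character keyword window, and the sample ad are all hypothetical. It also shows the "hand-crafted" weakness, since every pattern is written by hand.

```python
import re
from dataclasses import dataclass, field


@dataclass
class DataFrame:
    """Hypothetical, simplified data frame: a value recognizer plus optional keywords."""
    name: str
    value_pattern: str
    keywords: list = field(default_factory=list)
    window: int = 12  # characters after a match that are searched for a keyword

    def extract(self, text: str) -> list:
        values = []
        for m in re.finditer(self.value_pattern, text):
            if self.keywords:
                context = text[m.end(): m.end() + self.window].lower()
                if not any(k in context for k in self.keywords):
                    continue  # the value shape matched, but no keyword follows it
            values.append(m.group())
        return values


# Hypothetical hand-written frames for a car-ad extraction ontology.
CAR_AD_FRAMES = [
    DataFrame("Year", r"\b(?:19|20)\d{2}\b"),
    DataFrame("Price", r"\$\s?\d{1,3}(?:,\d{3})*"),
    DataFrame("Mileage", r"\b\d{1,3}(?:,\d{3})+\b", keywords=["miles", "mileage"]),
]


def extract_record(text: str) -> dict:
    """Apply every data frame to one record (record boundaries are assumed known)."""
    return {frame.name: frame.extract(text) for frame in CAR_AD_FRAMES}


ad = "1999 Honda Civic, 85,000 miles, $4,500 obo. Call 555-1234."
print(extract_record(ad))
# {'Year': ['1999'], 'Price': ['$4,500'], 'Mileage': ['85,000']}
```

The Mileage frame shows why keywords matter: both "85,000" and "4,500" fit its value pattern, and only the keyword check separates them.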
Semantic Web
Tim Berners-Lee
“information [has a] well-defined meaning”
“[enables] computers and people to work in cooperation”
Adds context and structure via metadata
Agent computing paradigm
Knowledge markup; semantic annotation
Ontologies
“a formal, explicit specification of a shared conceptualization” [Gruber93]
Formal: machine readable; first-order logic (FOL)
Explicit: concepts and constraints explicitly defined
Shared: community accepted
Conceptualization: abstract model (OSM)
“shared vocabulary”
Ontology Formalism
Ontology O = <V, A> where
V = vocabulary = predicate symbols (each with some arity)
A = axioms = formulas (constraints and rules)
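[Figure: OSM diagram with object sets Owner, Vehicle, Car, and Truck; relationship set "Owner owns Vehicle" with participation constraints 1:* and 1:2; Car and Truck specialize Vehicle]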
Predicates:
Owner(x), Vehicle(x), Car(x), Truck(x), Owner(x) owns Vehicle(y)
Formulas:
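$\forall x\,(\mathrm{Car}(x) \lor \mathrm{Truck}(x) \Rightarrow \mathrm{Vehicle}(x))$
$\forall x\,(\mathrm{Owner}(x) \Rightarrow \exists^{\geq 1} y\,(\mathrm{Owner}(x)\ \mathrm{owns}\ \mathrm{Vehicle}(y)))$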
Inference Rules:
TruckOwner(x) :- Owner(x), Owner(x) owns Vehicle(y), Truck(y)
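One way to read O = <V, A> concretely is to give each predicate in V a finite extension and test the axioms in A against it. The Python sketch below does that for the Owner/Vehicle example, assuming a closed, finite population (the `Population` class and the individuals `ann`, `bob`, `c1`, `t1` are hypothetical). It checks the two formulas and evaluates the TruckOwner rule, but it is a model check over one population, not a first-order theorem prover.

```python
from dataclasses import dataclass


@dataclass
class Population:
    """Hypothetical finite interpretation of the vocabulary V (one set per predicate)."""
    owner: set
    vehicle: set
    car: set
    truck: set
    owns: set  # pairs (x, y) for "Owner(x) owns Vehicle(y)"


def satisfies_axioms(p: Population) -> bool:
    # ∀x (Car(x) ∨ Truck(x) ⇒ Vehicle(x))
    specialization = (p.car | p.truck) <= p.vehicle
    # ∀x (Owner(x) ⇒ ∃≥1 y (Owner(x) owns Vehicle(y)))
    participation = all(
        any(x == o and y in p.vehicle for o, y in p.owns) for x in p.owner
    )
    return specialization and participation


def truck_owners(p: Population) -> set:
    # TruckOwner(x) :- Owner(x), Owner(x) owns Vehicle(y), Truck(y)
    return {x for x, y in p.owns if x in p.owner and y in p.vehicle and y in p.truck}


pop = Population(
    owner={"ann", "bob"},
    vehicle={"c1", "t1"},
    car={"c1"},
    truck={"t1"},
    owns={("ann", "c1"), ("bob", "t1")},
)
print(satisfies_axioms(pop))  # True
print(truck_owners(pop))      # {'bob'}
```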
Semantic Web Ontologies
RDF
DAML+OIL
OWL
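RDF represents statements as (subject, predicate, object) triples, and RDFS and OWL add vocabulary such as rdfs:subClassOf on top of that model. The in-memory sketch below uses plain Python tuples rather than an RDF library; the `example.org` namespace, the ad resources, and the helper names `match` and `instances_of` are hypothetical, while the rdf:type and rdfs:subClassOf URIs are the standard W3C ones. `instances_of` applies only the subclass rule, a small fraction of what an OWL reasoner does.

```python
# Hypothetical namespace; the rdf:type and rdfs:subClassOf URIs are the standard ones.
EX = "http://example.org/carads#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_SUBCLASS = "http://www.w3.org/2000/01/rdf-schema#subClassOf"

TRIPLES = {
    (EX + "Car", RDFS_SUBCLASS, EX + "Vehicle"),
    (EX + "Truck", RDFS_SUBCLASS, EX + "Vehicle"),
    (EX + "ad1", RDF_TYPE, EX + "Car"),
    (EX + "ad1", EX + "year", "1999"),
    (EX + "ad2", RDF_TYPE, EX + "Truck"),
    (EX + "ad2", EX + "price", "12000"),
}


def match(s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return sorted(
        t for t in TRIPLES
        if (s is None or t[0] == s) and (p is None or t[1] == p) and (o is None or t[2] == o)
    )


def instances_of(cls: str) -> list:
    """Subjects typed with cls or with any (transitive) subclass of cls."""
    classes = {cls}
    while True:
        subclasses = {s for s, _, o in match(p=RDFS_SUBCLASS) if o in classes}
        if subclasses <= classes:
            break
        classes |= subclasses
    return sorted(s for s, _, o in match(p=RDF_TYPE) if o in classes)


print(match(s=EX + "ad1", p=EX + "year"))  # the single triple giving ad1's year
print(instances_of(EX + "Vehicle"))        # both ads, inferred via rdfs:subClassOf
```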
Semantic Web Annotation with BYU Ontos
BYU Ontos Extraction Ontology
OWL Ontology
osm.cs.byu.edu/CS652s04/ontologies/OWL/carads.owl
Annotated Semantic Web Page
osm.cs.byu.edu/CS652s04/ontologies/annotatedPages/carSrch1_semweb.html
Ontology Generation for the Semantic Web
Necessary for the Semantic Web
Ontology engineering
Tools
Methodology
Languages (e.g. SHOE, OWL)
Semiautomatic generation
NLP + machine learning (e.g. OntoText)
Create from dictionary or lexicon (e.g. Doddle)
Generation from tables (e.g. TANGO)
Ontology maintenance
Ontology Libraries for the Semantic Web
Locating ontologies
Indexing and organization
Search mechanisms
Reusing ontologies
Find one and modify
Find several, merge and modify
Ontology Mapping, Merging, and Integration for the Semantic Web
Ontology reuse
Heterogeneous agent communication
Agent commitment to a new ontology
On the fly: map, merge, integrate (nontrivial to automate)
Can we do well enough?
Can we synergistically involve a user?
Information extraction with respect to a target
Table extraction (BYU Ontos)
Semiautomatic wrapper/mediator construction by automatically providing mappings
Schema Mapping
Schema-level matchers
Name matchers (dictionaries, e.g. WordNet)
Structural context matchers
Instance-level matchers
Value characteristics
Data-frame matchers
Mapping cardinality
1:1 (direct)
1:n, n:1, n:m (indirect, complex)
Multi-faceted mapping techniques
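A multi-faceted matcher can be approximated by scoring each source/target attribute pair with both a schema-level name matcher and an instance-level value matcher, then taking a weighted combination. In the Python sketch below, the `SYNONYMS` set is a hypothetical stand-in for a dictionary such as WordNet, and the value profile (fraction numeric, mean length) and the 0.5 weights are arbitrary assumptions. It finds only 1:1 (direct) mappings and does not prevent two source attributes from choosing the same target.

```python
from difflib import SequenceMatcher

# Hypothetical synonym pairs standing in for a dictionary such as WordNet.
SYNONYMS = {frozenset({"cost", "price"}), frozenset({"make", "manufacturer"})}


def name_score(a: str, b: str) -> float:
    """Schema-level name matcher: synonym lookup, else string similarity in [0, 1]."""
    a, b = a.lower(), b.lower()
    if a == b or frozenset({a, b}) in SYNONYMS:
        return 1.0
    return SequenceMatcher(None, a, b).ratio()


def profile(values: list) -> tuple:
    """Value characteristics: fraction of numeric values and mean string length."""
    numeric = sum(v.replace("$", "").replace(",", "").isdigit() for v in values)
    return numeric / len(values), sum(len(v) for v in values) / len(values)


def instance_score(xs: list, ys: list) -> float:
    """Instance-level matcher: 1.0 when the two value profiles coincide."""
    (n1, l1), (n2, l2) = profile(xs), profile(ys)
    return 1.0 - 0.5 * abs(n1 - n2) - 0.5 * abs(l1 - l2) / max(l1, l2)


def match_schemas(source: dict, target: dict, w_name: float = 0.5) -> dict:
    """Combine both matchers and pick the best direct target for each source attribute."""
    result = {}
    for s, s_vals in source.items():
        result[s] = max(
            target,
            key=lambda t: w_name * name_score(s, t)
            + (1 - w_name) * instance_score(s_vals, target[t]),
        )
    return result


source = {"Cost": ["$4,500", "$12,000"], "Make": ["Honda", "Ford"]}
target = {"Price": ["$3,900", "$7,250"], "Manufacturer": ["Toyota", "Chevrolet"]}
print(match_schemas(source, target))  # {'Cost': 'Price', 'Make': 'Manufacturer'}
```

Mapping 1:n, n:1, or n:m correspondences (for example, splitting a Name attribute into First and Last) would need a search over attribute combinations, which this sketch does not attempt.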
Schema Integration
FCA-Merge: merging with formal concept analysis (concept lattices)
Global as View (GAV)
Global mediator relations are views over source relations
Dynamic mediator schema: must change to accommodate new sources (hard to add new sources)
Query answering only requires view unfolding
Good for static, centralized systems
Example: TSIMMIS
Local as View (LAV)
Local source relations are views over mediator relations
Fixed mediator schema: each new source identifies the mediator components it covers (easy to add new sources)
Complex query rewriting
Good for dynamic, distributed systems
Example: Information Manifold
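Under GAV, a mediator query is answered by view unfolding: each mediator atom in the query is replaced by the source-level bodies that define it. The Python sketch below shows the unfolding step symbolically; the relations `CarForSale`, `DealerCars`, `Classifieds`, and `Reviews` are hypothetical, and it assumes views whose bodies use only head variables, so no variable renaming is needed. LAV query rewriting, which must answer queries using views, is considerably harder and is not shown.

```python
from itertools import product

# An atom is (predicate, args); args are variable names. Relation names are hypothetical.
# GAV: each mediator relation is defined by one or more views over source relations.
GAV_VIEWS = {
    "CarForSale": [
        (("Make", "Model", "Price"), [("DealerCars", ("Make", "Model", "Price"))]),
        (("Make", "Model", "Price"), [("Classifieds", ("Make", "Model", "Price"))]),
    ],
}


def unfold(query_body: list) -> list:
    """Replace every mediator atom by each of its view bodies (one rule per combination).

    Assumes view bodies use only head variables, so no variable renaming is needed.
    """
    choices = []
    for pred, args in query_body:
        if pred not in GAV_VIEWS:  # already a source relation
            choices.append([[(pred, args)]])
            continue
        alternatives = []
        for head, body in GAV_VIEWS[pred]:
            subst = dict(zip(head, args))  # view-head variable -> query argument
            alternatives.append([(p, tuple(subst[v] for v in vs)) for p, vs in body])
        choices.append(alternatives)
    return [[atom for part in combo for atom in part] for combo in product(*choices)]


# Mediator query: q(M, S) :- CarForSale(M, Mo, P), Reviews(Mo, S)
query = [("CarForSale", ("M", "Mo", "P")), ("Reviews", ("Mo", "S"))]
for rule_body in unfold(query):
    print(rule_body)
```

The result is a union of two source-level rule bodies, one per view defining CarForSale, which can then be evaluated directly against the sources.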
What is your dream for the Semantic Web?
Intelligent personal agents that can:
Gather (just) the information we want and
deliver it to us when we want it
Help us with scheduling
Help us buy the goods we want
Negotiate and conduct business for us
…
Intelligent business agents
Intelligent discovery agents
…
What can you do to make your dreams come true?