CS652 Spring 2004 Summary

Course Objectives
 Learn how to extract, structure, and integrate Web information
 Learn what the Semantic Web is
 Learn how to build ontologies for the Semantic Web
 Investigate class-related research topics
 Be introduced to Semantic Web services
Generally Applicable Ideas
 Semantic Understanding
 Data: attribute-value pairs
 Information: data in a conceptual model
 Knowledge: information with agreement
 Meaning: useful knowledge
 Measuring Success
 Recall: NrCorrect/TotalCorrect
 Precision: NrCorrect/(NrCorrect+NrIncorrect)
 F-measure: (β²+1)PR/(β²P+R), where P = precision and R = recall
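As a rough illustration of the three measures above, the sketch below computes them from raw counts. The function name `extraction_scores` and the sample counts are assumptions made for this example; they do not come from the course material.

```python
def extraction_scores(nr_correct, nr_incorrect, total_correct, beta=1.0):
    """Return (recall, precision, f_measure) for one extraction run.

    nr_correct    -- extracted values that are right
    nr_incorrect  -- extracted values that are wrong
    total_correct -- values that should have been extracted
    """
    recall = nr_correct / total_correct if total_correct else 0.0
    extracted = nr_correct + nr_incorrect
    precision = nr_correct / extracted if extracted else 0.0
    denom = beta**2 * precision + recall
    f_measure = (beta**2 + 1) * precision * recall / denom if denom else 0.0
    return recall, precision, f_measure


# Hypothetical counts: 8 correct and 2 incorrect extractions, 10 true values.
recall, precision, f_measure = extraction_scores(8, 2, 10)
print(recall, precision, f_measure)
```

With β = 1 the F-measure weights precision and recall equally; β > 1 favors recall and β < 1 favors precision.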
Information Extraction
 Get relevant information
 Not:
 Information retrieval: get relevant pages
 Web mining: discover unknown associations
 Wrapper: maps data to a suitable format
 Generation techniques
 Machine learning (e.g. RAPIER)
 Natural language processing (e.g. RAPIER)
 Hidden Markov Models
 By-example generation tools (e.g. Lixto)
 By-pattern generation (e.g. RoadRunner)
 Wrapper Maintenance
Information Extraction – BYU Ontos
 Ontology-based
 Data frames
 Strengths
 Resilient to page changes
 Robust across sites within the same domain
 Works well with all types of data-rich text
 Weaknesses
 Hand-crafted ontologies and data frames
 Requires record-boundary recognition
 Does not learn
 Applications
 Extraction
 High-precision classification
 Schema mapping
 Semantic Web annotation
 Agent communication
 Ontology generation
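To make the data-frame idea more concrete, here is a minimal sketch that assumes a data frame can be approximated by one regular expression per concept. The `DATA_FRAMES` table, the `extract` function, the patterns, and the sample ad are all hypothetical; real BYU Ontos data frames are richer (for example, they also carry context keywords), and this is not the Ontos implementation.

```python
import re

# Hypothetical data frames: one value-recognizing pattern per ontology concept.
DATA_FRAMES = {
    "Year": re.compile(r"\b(19[5-9]\d|20[0-2]\d)\b"),
    "Price": re.compile(r"\$\s?(\d{1,3}(?:,\d{3})*|\d+)"),
    "Mileage": re.compile(r"\b(\d{1,3}(?:,\d{3})*|\d+)\s?(?:miles|mi)\b", re.I),
}


def extract(record):
    """Return the first value recognized for each concept in one record."""
    found = {}
    for concept, pattern in DATA_FRAMES.items():
        match = pattern.search(record)
        if match:
            found[concept] = match.group(1)
    return found


ad = "1998 Honda Civic, 85,000 miles, runs great, $4,500 obo"
print(extract(ad))
# {'Year': '1998', 'Price': '4,500', 'Mileage': '85,000'}
```

Because the patterns describe the values rather than the page layout, the same table applies to any page in the domain, which is the intuition behind the resilience and cross-site robustness listed under Strengths. The sketch also assumes the record boundaries are already known, matching the weakness noted above.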
Semantic Web
 Tim Berners-Lee
 “information [has a] well-defined meaning”
 “[enables] computers and people to work in cooperation”
 Adds context and structure via metadata
 Agent computing paradigm
 Knowledge markup; semantic annotation
Ontologies
 “a formal, explicit specification of a shared conceptualization” [Gruber93]
 Formal: machine readable; first-order logic (FOL)
 Explicit: concepts and constraints explicitly defined
 Shared: community accepted
 Conceptualization: abstract model (OSM)
 “shared vocabulary”
Ontology Formalism
Ontology O = <V, A> where
V = vocabulary = predicate symbols (each with some arity)
A = axioms = formulas (constraints and rules)
[Figure: OSM diagram. Owner (1:*) owns Vehicle (1:2); Car and Truck are specializations of Vehicle.]
Predicates:
Owner(x), Vehicle(x), Car(x), Truck(x), Owner(x) owns Vehicle(y)
Formulas:
∀x(Car(x) ∨ Truck(x) ⇒ Vehicle(x))
∀x(Owner(x) ⇒ ∃≥1y(Owner(x) owns Vehicle(y)))
Inference Rules:
TruckOwner(x) :- Owner(x), Owner(x) owns Vehicle(y), Truck(y)
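The predicates, formulas, and inference rule above can be sketched over a small set of facts. Everything in the block below (the individuals, the set-based encoding, and the function names `owners_without_vehicle` and `truck_owners`) is an assumption chosen for illustration, not a prescribed representation.

```python
# Hypothetical facts for the predicates in the vocabulary V.
owners = {"alice", "bob"}
cars = {"c1"}
trucks = {"t1"}
owns = {("alice", "c1"), ("bob", "t1")}  # Owner(x) owns Vehicle(y)

# First formula: every Car or Truck is a Vehicle.
vehicles = cars | trucks


def owners_without_vehicle(owners, owns, vehicles):
    """Owners violating the second formula (each owner owns >= 1 vehicle)."""
    return {o for o in owners
            if not any(x == o and y in vehicles for x, y in owns)}


def truck_owners(owners, owns, vehicles, trucks):
    """TruckOwner(x) :- Owner(x), Owner(x) owns Vehicle(y), Truck(y)"""
    return {x for x, y in owns
            if x in owners and y in vehicles and y in trucks}


print(owners_without_vehicle(owners, owns, vehicles))  # set()
print(truck_owners(owners, owns, vehicles, trucks))    # {'bob'}
```

The formulas act as constraints to be checked against the facts, while the inference rule derives a new predicate from existing ones.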
Semantic Web Ontologies
 RDF
 DAML+OIL
 OWL
Semantic Web Annotation
with BYU Ontos
BYU Ontos Extraction Ontology
OWL Ontology
osm.cs.byu.edu/CS652s04/ontologies/OWL/carads.owl
Annotated Semantic Web Page
osm.cs.byu.edu/CS652s04/ontologies/annotatedPages/carSrch1_semweb.html
Ontology Generation for the Semantic Web
 Necessary for the Semantic Web
 Ontology engineering
 Tools
 Methodology
 Languages (e.g. SHOE, OWL)
 Semiautomatic generation
 NLP + machine learning (e.g. OntoText)
 Create from dictionary or lexicon (e.g. Doddle)
 Generation from tables (e.g. TANGO)
 Ontology maintenance
Ontology Libraries for the Semantic Web
 Locating ontologies
 Indexing and organization
 Search mechanisms
 Reusing ontologies
 Find one and modify
 Find several, merge and modify
Ontology Mapping, Merging, and Integration for the Semantic Web
 Ontology reuse
 Heterogeneous agent communication
 Agent commitment to a new ontology
 On the fly: map, merge, integrate (nontrivial to automate)
 Can we do well enough?
 Can we synergistically involve a user?
 Information extraction with respect to a target
 Table extraction (BYU Ontos)
 Semiautomatic wrapper/mediator construction by automatically providing mappings
Schema Mapping
 Schema-level matchers
 Name matchers (dictionaries – WordNet)
 Structural context matchers
 Instance-level matchers
 Value characteristics
 Data-frame matchers
 Mapping cardinality
 1:1 (direct)
 1:n, n:1, n:m (indirect, complex)
 Multi-faceted mapping techniques
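A multi-faceted matcher of the kind outlined above might combine a schema-level name score with an instance-level value score. The sketch below is one possible simplification: the `SYNONYMS` table is a hypothetical stand-in for a dictionary such as WordNet, the value facet only compares how numeric the values look, and only 1:1 (direct) mappings are produced.

```python
from difflib import SequenceMatcher

# Hypothetical synonym table standing in for a dictionary such as WordNet.
SYNONYMS = {frozenset({"price", "cost"}), frozenset({"make", "brand"})}


def name_score(a, b):
    """Schema-level facet: synonym lookup, else string similarity."""
    a, b = a.lower(), b.lower()
    if a == b or frozenset({a, b}) in SYNONYMS:
        return 1.0
    return SequenceMatcher(None, a, b).ratio()


def value_score(values_a, values_b):
    """Instance-level facet: compare the fraction of numeric values.

    Assumes both value lists are non-empty.
    """
    def numeric_fraction(values):
        return sum(v.replace(",", "").isdigit() for v in values) / len(values)
    return 1.0 - abs(numeric_fraction(values_a) - numeric_fraction(values_b))


def best_matches(source, target, threshold=0.7):
    """Return 1:1 matches whose averaged facet score passes the threshold."""
    result = {}
    for s_name, s_vals in source.items():
        scored = [((name_score(s_name, t_name) + value_score(s_vals, t_vals)) / 2,
                   t_name)
                  for t_name, t_vals in target.items()]
        score, t_name = max(scored)
        if score >= threshold:
            result[s_name] = t_name
    return result


source = {"Cost": ["4,500", "12,000"], "Brand": ["Honda", "Ford"]}
target = {"Price": ["3,200", "9,999"], "Make": ["Toyota", "Honda"]}
print(best_matches(source, target))
# {'Cost': 'Price', 'Brand': 'Make'}
```

Averaging the facets is only one way to combine evidence; indirect mappings (1:n, n:1, n:m) would need additional machinery that this sketch does not attempt.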
Schema Integration
 FCA merge using lattices
 Global as View (GAV)
 Global mediator relations are views over source relations
 Dynamic mediator schema – changes to accommodate new sources (hard to add new sources)
 Query only requires view unfolding
 Good for static, centralized systems
 Example: TSIMMIS
 Local as View (LAV)
 Local source relations are views over mediator relations
 Fixed mediator schema – new sources identify components covered (easy to add new sources)
 Complex query rewriting
 Good for dynamic, distributed systems
 Example: Information Manifold
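The GAV idea that a query "only requires view unfolding" can be sketched as follows. The source relations `dealer_a` and `dealer_b`, the global relation `car_ad`, and the query `cheaper_than` are hypothetical names and data invented for this example.

```python
# Hypothetical source relations, each with its own local schema.
dealer_a = [{"make": "Honda", "price": 4500}]
dealer_b = [{"brand": "Ford", "cost": 12000}]


def car_ad():
    """GAV: global relation CarAd(make, price) defined as a view
    (a union of renamed projections) over the source relations."""
    for row in dealer_a:
        yield {"make": row["make"], "price": row["price"]}
    for row in dealer_b:
        yield {"make": row["brand"], "price": row["cost"]}


def cheaper_than(limit):
    """A query over the global schema, answered by unfolding the view,
    i.e. by evaluating the view definition over the sources."""
    return [ad for ad in car_ad() if ad["price"] < limit]


print(cheaper_than(5000))
# [{'make': 'Honda', 'price': 4500}]
```

Adding a third source here means editing `car_ad`, which is the sense in which new sources are hard to add under GAV. Under LAV each source would instead be described as a view over the fixed mediator schema, and answering a query would require rewriting it in terms of those views, which this sketch does not show.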
What is your dream for the Semantic Web?
 Intelligent personal agents that can:
 Gather (just) the information we want and deliver it to us when we want it
 Help us with scheduling
 Help us buy the goods we want
 Negotiate and conduct business for us
 …
 Intelligent business agents
 Intelligent discovery agents
 …
What can you do to make your dreams come true?