Incremental Extraction of Keyterms for Classifying
Applications of Semantic Web
Lin, Shih-Jui and Chien, Lee-Feng
Institute of Information Science
Academia Sinica
Semantic Web and Related Fields
[Figure: the Semantic Web and its related fields: AI, Machine Learning, Language Technology, NLP/IE, Data Mining, Information Retrieval, Knowledge Management, Agent, Web Service]
Building Semantic Web
• Ontology
- Building
repositories of terms and their relationships (LT)
ontology generation (ML)
- Mapping and merging
knowledge of language, terms (LT)
mapping and merging (ML)
• Knowledge base
- Adding instances into KB
structure/content mining (DM)
text analysis and extract values of attributes (NLP, IE, ML)
• Document
- Semantic annotation
association between words and annotations (DM, ML)
Using Semantic Web
• Language technology
- Text corpora with semantics
• Data mining
- Content/structure mining from semantic web pages
- Usage mining from users' activities on the semantic web
Using Semantic Web
• Information retrieval
- Metadata search
- Topic-based search
• Knowledge management
- Acquire, maintain, access knowledge
• Agent technology / web services
- DAML-S
- RETSINA calendar agent
Application I
Information Retrieval
Information Retrieval
• Current search
• Search on Semantic Web
- Metadata search
Project: HOWLIR
- Topic-based search
Project: TAP
Current Search
• Keyterm-based search (e.g., Google)
- Full-text indexing
- Page authority (link analysis)
- Page popularity (user clicks)
• Problems
- Not specific
Data in pages have no semantic annotations
e.g., "Yo-Yo Ma's most recent CD"
- No topic disambiguation
Documents on different topics are mixed together
e.g., Yo-Yo Ma's CDs, concerts, biography, gossip, …
Information Extraction
• Wrapper
Specific web sites
Structured documents
Heuristic extraction
• Information extraction
Unstructured documents
Natural language analysis
Values for specific attributes
• Problems
- Not flexible
Current web provides little metadata
- No topic disambiguation
XML
• Metadata
- <Person>
<Name> Yo-yo Ma </Name>
<CD>Inspired by Bach</CD>
</Person>
XML (Extensible Markup Language)
Adapted from Dieter Fensel
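A minimal sketch of how a program could read the metadata fragment above, using Python's standard xml.etree.ElementTree module. The element names come from the slide; the variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

# The XML fragment from the slide, as a string.
doc = """
<Person>
  <Name> Yo-yo Ma </Name>
  <CD>Inspired by Bach</CD>
</Person>
"""

person = ET.fromstring(doc)

# Element text keeps its surrounding whitespace, so strip it.
name = person.findtext("Name").strip()
cds = [cd.text.strip() for cd in person.findall("CD")]

print(name, cds)
```

The tags tell a program which string is a name and which is a CD title, but XML alone does not say what "Person" or "CD" mean. That gap is what the RDF/RDFS and ontology layers address.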
RDF/RDFS
• Pre-defined modeling primitives
• The base of metadata search
[Figure: layer stack, top to bottom: RDFS (metadata search), RDF (Resource Description Framework), XML (Extensible Markup Language). Adapted from Dieter Fensel.]
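RDF describes resources as subject-predicate-object statements. The sketch below models such statements as plain Python tuples and queries them with a wildcard match. It is a toy illustration, not an RDF library; the `ex:` identifiers and the `match` helper are hypothetical.

```python
from typing import Optional

Triple = tuple[str, str, str]

# Hypothetical identifiers; a real RDF graph would use full URIs.
triples: list[Triple] = [
    ("ex:YoYoMa", "rdf:type", "ex:Musician"),
    ("ex:YoYoMa", "ex:name", "Yo-Yo Ma"),
    ("ex:YoYoMa", "ex:recorded", "ex:InspiredByBach"),
    ("ex:InspiredByBach", "rdf:type", "ex:CD"),
]

def match(s: Optional[str] = None,
          p: Optional[str] = None,
          o: Optional[str] = None) -> list[Triple]:
    """Return every triple that agrees with the given fields; None is a wildcard."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

# "Which resources are musicians?" and "What did Yo-Yo Ma record?"
musicians = [s for s, _, _ in match(p="rdf:type", o="ex:Musician")]
recordings = [o for _, _, o in match(s="ex:YoYoMa", p="ex:recorded")]
```

Metadata search builds on queries of this kind: the constraint is on typed statements rather than on words in the page text.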
Ontology
• Sharable specifications of interesting topics
• The base of topic-based search
[Figure: layer stack, top to bottom: Ontology (topic-based search), RDFS (metadata search), RDF (Resource Description Framework), XML (Extensible Markup Language). Example ontology concepts: musician, CD, time, concert, price. Adapted from Dieter Fensel.]
Search on Semantic Web
• Metadata search
- To increase precision and flexibility
• Topic-based search
- To help contextualize queries and overlay results in terms of a knowledge base
Metadata Search
• To annotate documents with metadata (XML/RDF/RDFS)
• To index both full text and metadata
• To retrieve documents according to both text and metadata (hybrid IR)
• E.g., the HOWLIR IR system (UMBC, Johns Hopkins)
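The hybrid IR idea above can be sketched as one inverted index that holds both full-text terms and metadata pairs. This is a simplified illustration under assumed data, not HOWLIR's implementation; the documents, field names, and functions are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Doc:
    doc_id: str
    text: str
    metadata: dict[str, str] = field(default_factory=dict)

# Hypothetical annotated documents.
docs = [
    Doc("d1", "Yo-Yo Ma releases a new CD of Bach cello suites",
        {"type": "CD", "artist": "Yo-Yo Ma"}),
    Doc("d2", "Yo-Yo Ma concert dates announced for Boston",
        {"type": "Concert", "artist": "Yo-Yo Ma"}),
    Doc("d3", "A review of a Bach CD by another cellist",
        {"type": "CD", "artist": "Unknown"}),
]

def build_index(docs):
    """Index full-text terms and metadata pairs in the same inverted index."""
    index: dict[str, set[str]] = {}
    for d in docs:
        for term in d.text.lower().split():
            index.setdefault("text:" + term, set()).add(d.doc_id)
        for key, value in d.metadata.items():
            index.setdefault(f"meta:{key}={value.lower()}", set()).add(d.doc_id)
    return index

def search(index, terms, metadata):
    """Return ids of documents matching all text terms AND all metadata constraints."""
    keys = ["text:" + t.lower() for t in terms]
    keys += [f"meta:{k}={v.lower()}" for k, v in metadata.items()]
    postings = [index.get(k, set()) for k in keys]
    return sorted(set.intersection(*postings)) if postings else []

index = build_index(docs)
text_only = search(index, ["yo-yo", "ma"], {})            # CDs and concerts mixed
hybrid = search(index, ["yo-yo", "ma"], {"type": "CD"})   # narrowed by metadata
```

The text-only query returns both the CD page and the concert page, while the metadata constraint narrows the result to the CD page, which is the precision gain the slide refers to.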
HOWLIR
- To extract terms from documents via AutoText™
- To learn metadata from the statistical associations between metadata and text in annotated documents
- To generate annotations in RDF/DAML
- To retrieve documents according to text and metadata
[Figure: HOWLIR pipeline. Text is annotated manually or automatically (NLP/IE/DM/ML); text and metadata are indexed; a query against the index returns the result.]
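One simple way to picture "learning metadata by statistical association" is to count how often each term co-occurs with each metadata label in annotated documents, then suggest the best-scoring label for new text. The sketch below uses raw co-occurrence counts as a stand-in; the slides do not say which statistic HOWLIR uses, and the training data and function name are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Optional

# Hypothetical annotated documents: (text, metadata label).
annotated = [
    ("cello suite recording released on cd", "CD"),
    ("new album recording available on cd", "CD"),
    ("tickets on sale for the cello concert tonight", "Concert"),
    ("concert hall tickets and dates", "Concert"),
]

# Count how often each term co-occurs with each metadata label.
cooccur: dict[str, Counter] = defaultdict(Counter)
for text, label in annotated:
    for term in set(text.split()):
        cooccur[term][label] += 1

def suggest_label(text: str) -> Optional[str]:
    """Pick the label with the highest summed co-occurrence count over the text's terms."""
    scores: Counter = Counter()
    for term in set(text.split()):
        scores.update(cooccur.get(term, {}))
    return scores.most_common(1)[0][0] if scores else None

print(suggest_label("concert tickets for friday"))
print(suggest_label("a new recording on cd"))
```

A production system would normalize for term and label frequency, but the counting step shows where the association between text and metadata comes from.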
Topic-based Search
• To help contextualize queries and overlay results in terms of a knowledge base
• E.g., TAP (IBM, Stanford)
TAP
[Figure: TAP architecture. The search front end receives the query "Yo Yo Ma". The KB resolves it to a musician whose genre is ClassicalMusic, and UDDI++ is asked who has concert dates, a discography, auctions, or a bio for such a musician. Requests (auctions for …, bio for …, concert dates for …, discography for …) pass through a caching and buffering layer to sources such as EBay, CDNow, AllMusic, and TicketMaster.]
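The flow in the TAP figure can be sketched as two lookups: resolve the query string to a typed entity in the knowledge base, then use the type to decide which result sections and sources apply. This is an illustration only; the KB entries, the pairing of sections with sources, and the `contextualize` function are assumptions, not TAP's actual interface.

```python
# Toy knowledge base: normalized name -> (entity id, type). Entries are illustrative.
KB = {
    "yo yo ma": ("Musician/YoYoMa", "Musician"),
}

# Which result sections apply to each type, with an assumed source for each.
SECTIONS_BY_TYPE = {
    "Musician": {
        "concert dates": "TicketMaster",
        "discography": "CDNow",
        "auctions": "EBay",
        "bio": "AllMusic",
    },
}

def contextualize(query: str) -> dict:
    """Resolve the query against the KB and plan which sources to ask."""
    key = " ".join(query.lower().replace("-", " ").split())
    if key not in KB:
        # Fall back to plain keyterm search when the KB has no match.
        return {"entity": None, "type": None, "plan": {}}
    entity, entity_type = KB[key]
    return {"entity": entity, "type": entity_type,
            "plan": SECTIONS_BY_TYPE.get(entity_type, {})}

result = contextualize("Yo-Yo Ma")
for section, source in result["plan"].items():
    print(f"{section}: ask {source}")
```

Because the plan is driven by the entity's type, results can be grouped by topic (concerts, discography, auctions, bio) instead of arriving as one mixed list.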
TAP KB
• Ontology and instances in specific domains (music, sport, etc.)
- Manual editing
- Mining free data sources on the Web
- Reading news articles and automatically identifying new musicians, athletes, etc.
• Currently covers about 20% of queries
• In RDF, DAML+OIL format
• Browse the KB at TAP site
Summary of IR
• Metadata search
- HOWLIR
• Topic-based search
- TAP
Application II
Knowledge Management
Knowledge Management
• What is KM?
• KM in a company
• KM on Semantic Web
• Project: Ontoknowledge
What is KM?
• Acquiring knowledge
- Gather
- Organize
• Maintaining knowledge
- Represent
- Update
• Accessing knowledge
- Search
- Visualize/browse
- Share
KM in a Company
• To organize, maintain, and access knowledge and experience effectively (organizational memory)
• To share documents among different departments
• To reduce training overhead
• To reduce the cost of customer service
• To reduce the labor force
KM on Semantic Web
• Semantic web provides infrastructure for KM
- Acquiring knowledge:
Ontology building
KB building
- Maintaining knowledge:
Represented in RDF/DAML/OIL
- Accessing knowledge:
Intelligent search
Ontology-based visualization
Ontology-based sharing
Ontoknowledge
• A project developed by
- Academic groups
Free University Amsterdam
University of Karlsruhe
- Companies
British Telecom (call center)
Swiss Life (insurance company)
Enersearch (virtual enterprises)
CognIT, Aidministrator, Ontotext Lab
Architecture of Ontoknowledge
[Figure: Ontoknowledge architecture. Components shown: OntoShare, RDF Ferret, and Spectacle (user); OntoEdit (knowledge engineer); RQL, OIL-Core, OMM, LINRO; Sesame with the OIL-Core ontology repository and the annotated data repository (RDF); OntoWrapper and OntoExtract, which acquire data from the external data repository.]
Manual Ontology Building and Instantiation
• OntoEdit
- A tool for building an ontology and instances manually
Architecture of Ontoknowledge
[Figure: Ontoknowledge architecture with its three layers labelled: access, maintain, and acquire. Components shown: OntoShare, RDF Ferret, Spectacle, RQL, OntoEdit, OIL-Core, OMM, LINRO, Sesame (OIL-Core ontology repository and annotated data repository), OntoWrapper, OntoExtract, and the external data repository.]
Visualization
• Spectacle: ontology-based knowledge presentation
Case Studies
• Swiss Life
• British Telecom
Swiss Life
• IAS (International Accounting Standard)
- Searching a large document on the Intranet
OntoExtract
- Learning ontology from documents
- Assisting in reformulating user’s query
Swiss Life
• Management of skills of employees
Annotation of employees’ homepages
- Skills, education, job functions
Ontology of skills
Comparing, querying employees’ skills
- Find the most experienced employee in fire insurance for chemical factories
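The skills query above can be sketched as a search over annotated employee profiles that also accepts sub-skills from the ontology. Everything here is hypothetical: the skill hierarchy, the employees, the years of experience, and the function names stand in for data that would come from annotated homepages.

```python
# Hypothetical skill ontology: child skill -> parent skill.
PARENT = {
    "fire insurance for chemical factories": "fire insurance",
    "fire insurance": "property insurance",
    "property insurance": "insurance",
}

# Hypothetical annotations: employee -> {skill: years of experience}.
EMPLOYEES = {
    "alice": {"fire insurance for chemical factories": 7},
    "bob": {"fire insurance": 12},
    "carol": {"fire insurance for chemical factories": 3, "property insurance": 9},
}

def is_a(skill, target):
    """True if skill equals target or is a descendant of it in the ontology."""
    while skill is not None:
        if skill == target:
            return True
        skill = PARENT.get(skill)
    return False

def most_experienced(target):
    """Return (employee, years) with the most experience in target or a sub-skill."""
    best = None
    for name, skills in EMPLOYEES.items():
        years = max((y for s, y in skills.items() if is_a(s, target)), default=0)
        if years and (best is None or years > best[1]):
            best = (name, years)
    return best

print(most_experienced("fire insurance for chemical factories"))
print(most_experienced("fire insurance"))
```

Asking for the broader skill "fire insurance" also counts experience recorded under the narrower skill, which is what an ontology of skills adds over plain keyword matching.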
British Telecom
• CRM (customer relationship management)
- Cost increases 20% every year
OntoShare
- Disseminating customer-handling rules and best practices
- Identifying customers' problems by searching/browsing the ontology
- Keeping track of customers' needs, interests, and preferences
Summary of KM
• Ontology-based KM
- Acquiring knowledge:
Ontology building
KB building
- Maintaining knowledge:
Represented in RDF/DAML/OIL
- Accessing knowledge:
Intelligent search
Ontology-based visualization
Ontology-based sharing
• Ontoknowledge and case studies
Application III
Web Services
Web Services
• Current web services
• Semantic Web services
• DAML-S
• Project: RETSINA calendar agent
Toward Int'l Semantic Web Conference
To attend ISWC 2003 in Florida…
Current Web Services
• A user has to
- Find the services (e.g., via Google)
Find the websites of hotels and airlines
- Compose the services to achieve the goal
Book tickets and hotels
- Invoke the services
Fill out the forms on each site
- Monitor the execution of the services
Is the transaction done?
- Consider constraints and preferences
Cheaper hotels but a better airline
Current Web
Semantic Web Services
• Agent-based technology
• To automate
- Service discovery
- Service invocation
- Service selection and composition
- Service execution monitoring
- User constraints and preferences
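A minimal sketch of three of the automated steps listed above (discovery, selection under user preferences, and composition), assuming services advertise a machine-readable capability. The registry, prices, ratings, and function names are hypothetical; real descriptions would be written in a language such as DAML-S.

```python
from dataclasses import dataclass

@dataclass
class Service:
    name: str
    capability: str   # what the service does, e.g. "book_hotel"
    price: float
    rating: float     # quality score, higher is better

# Hypothetical advertised service descriptions.
REGISTRY = [
    Service("HotelA", "book_hotel", 120.0, 4.5),
    Service("HotelB", "book_hotel", 80.0, 3.8),
    Service("AirX", "book_flight", 400.0, 3.5),
    Service("AirY", "book_flight", 520.0, 4.7),
]

def discover(capability):
    """Service discovery: find every service advertising the capability."""
    return [s for s in REGISTRY if s.capability == capability]

def select(candidates, prefer):
    """Service selection under a user preference: 'cheap' or 'quality'."""
    if prefer == "cheap":
        return min(candidates, key=lambda s: s.price)
    return max(candidates, key=lambda s: s.rating)

# The trip example's preferences: cheaper hotels but a better airline.
preferences = {"book_hotel": "cheap", "book_flight": "quality"}

# Composition: one selected service per step of the trip.
plan = [select(discover(cap), pref) for cap, pref in preferences.items()]
plan_names = [s.name for s in plan]
print(plan_names)
```

Invocation and execution monitoring are left out; they would call each selected service and check that the transaction completed.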
A Framework
[Figure: framework for semantic web services; labels: Semantic Markup, Service Markup, User Markup, DAML-S. Adapted from IEEE Intelligent Systems.]
DAML-S
• DARPA Agent Markup Language for Services
• A DAML+OIL ontology/language for describing properties and capabilities of web services
• DAML-S Coalition
- CMU, Stanford, Yale, BBN, Nokia, SRI
DAML-S in the Cake
Agent-based technology
DAML-S (Services)
DAML+OIL (Ontology)
RDFS (RDF Schema)
RDF (Resource Description Framework)
XML (Extensible Markup Language)
Adapted from AAAI
Upper Ontology of Services
Adapted from AAAI
DAML-S / WSDL Grounding
• Web Services Description Language
- Authored by IBM, Ariba, Microsoft
- Focus of W3C Web Services Description WG
- Commercial momentum
- Specifies message syntax accepted/generated by communication ports
- Bindings to popular message/transport standards (SOAP, HTTP, MIME)
- Abstract "types"; extensibility elements
• Complementary with DAML-S
Adapted from AAAI
(Some) Related Work
Related Industrial Initiatives
• UDDI
• ebXML
• WSDL
• .Net
• XLANG
• Biztalk, e-speak, etc
These XML-based initiatives are largely complementary to DAML-S.
DAML-S aims to build on top of these efforts, adding the expressiveness, semantics, and inference needed for automation.
Related Academic Efforts
• Process Algebras (e.g., Pi Calculus)
• Process Specification Language (Hoare Logic, PSL)
• Planning Domain Definition Language (PDDL)
• Business Process Modeling (e.g., BPML)
• OntoWeb Process Modeling Effort
Adapted from AAAI
Tools and Applications
DAML-S is just another DAML+OIL ontology
All the tools & technologies for DAML+OIL are relevant
Some DAML-S Specific Tools and Technologies:
Discovery, Matchmaking, Agent Brokering: CMU, SRI (OAA), Stanford KSL
Automated Web Service Composition: Stanford KSL, BBN/Yale/Kestrel, CMU, MIT, Nokia, SRI
DAML-S Editor: Stanford KSL, SRI, CMU (profiles), Manchester
Process Modeling Tools & Reasoning: SRI, Stanford KSL
Service Enactment/Simulation: SRI, Stanford KSL
Formal Specification of DAML-S Operational/Execution Semantics: CMU, Stanford KSL, SRI
Adapted from AAAI
RETSINA
• Multi-agent system
• Developed by Katia Sycara et al. (CMU)
• http://www.daml.ri.cmu.edu/site/projects/RDFCalendar/
RETSINA Calendar Agents
• Meeting scheduling agents
- Meetings have several properties including:
Time/Duration
Attendee Information
Location
Description
• Functions:
- Allow user to browse schedule and events
- Support meeting scheduling
Agents negotiate possible meeting times based on users' schedules and preferences
- Import schedules into MS Outlook
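The meeting negotiation described above can be pictured as intersecting the attendees' free hours and then applying preferences. The sketch below is a deliberately simple stand-in, not RETSINA's negotiation protocol; the schedules, working hours, and function names are assumptions.

```python
def free_slots(busy, day_start=9, day_end=17):
    """Hours in the working day not covered by any busy (start, end) interval."""
    return {h for h in range(day_start, day_end)
            if not any(start <= h < end for start, end in busy)}

def negotiate(schedules, preferred_hours):
    """Return the earliest hour free for everyone, trying preferred hours first."""
    common = set.intersection(*(free_slots(b) for b in schedules.values()))
    preferred = sorted(common & set(preferred_hours))
    if preferred:
        return preferred[0]
    return min(common) if common else None

# Hypothetical busy intervals per attendee, in hours of the day.
schedules = {
    "alice": [(9, 11), (13, 14)],
    "bob": [(10, 12), (15, 17)],
}

meeting_hour = negotiate(schedules, preferred_hours=[14, 15])
print(meeting_hour)
```

In a multi-agent setting each agent would hold only its own user's schedule and exchange proposals, rather than pooling all schedules in one place as this sketch does.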
RETSINA Semantic Web Calendar Agents
• Use RDF to represent schedules and events
- Event concepts can refer to existing concepts on the Semantic Web
• Support additional actions based on available information
- Email or visit web page
• Support agent discovery (DAML-S) to locate other agents
Services Beyond RETSINA
• Cooperation with other agents on the Semantic Web
- Reminding users of upcoming registration or submission deadlines
- Booking a flight to a conference
Summary of Web Services
• The Semantic Web makes it possible to automate web services with agent-based technology
Agent-based technology (e.g., RETSINA)
DAML-S (Services)
DAML+OIL (Ontology)
RDFS (RDF Schema)
RDF (Resource Description Framework)
XML (Extensible Markup Language)
Adapted from AAAI
Summary
[Figure: the Semantic Web and its related fields: AI, Machine Learning, Language Technology, NLP/IE, Data Mining, Information Retrieval, Knowledge Management, Agent, Web Service]
Metadata search
Topic-based search
Ontology-based KM
Ontoknowledge
DAML-S
RETSINA
Q&A
Thank you!
References
• Introduction to Semantic Web
- http://www.cs.vu.nl/~dieter/ftp/slides/kcap.pdf
• Official sites:
- http://www.w3.org/2001/sw/
- http://www.semanticweb.org/
• DAML-S
- http://www.daml.org/services/
• Projects:
- Ontoknowledge: http://www.semanticweb.org/
- TAP: http://tap.stanford.edu
- RETSINA: http://www.daml.ri.cmu.edu/site/projects/RDFCalendar/
Conferences
• Semantic web
- ISWC (International Semantic Web Conference)
- WWW Conference
• LT
- COLING
• AI
- Ontologies and Semantic Web Workshop (AAAI)
- Language Resources Meets Semantic Web Workshop (AAAI)
• DM
- Semantic Web Mining Workshop (ECML/PKDD)
• KM
- Knowledge Technologies Conference