Mining the Semantic Web: Requirements for Machine Learning Fabio Ciravegna, Sam Chapman

Mining the Semantic Web:
Requirements for Machine Learning
Fabio Ciravegna, Sam Chapman
Presented by
Steve Hookway
10/20/05
What is the Semantic Web?


A way to automate reasoning with web data
RDF



A uniform way to describe resources as (subject, predicate, object) triples
Ontology



Hierarchical structure of data
Property restrictions
Implicit typing
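The RDF and ontology points above can be made concrete with a small sketch. It stores facts as plain (subject, predicate, object) tuples and derives implicit types by following rdfs:subClassOf links until nothing new is added. The "ex:" names and the helpers match and infer_types are hypothetical; a real system would use an RDF library and a reasoner rather than this hand-rolled loop.

```python
# Sketch: RDF-style triples plus implicit typing through a class hierarchy.
# All "ex:" identifiers are hypothetical example data.
Triple = tuple[str, str, str]

TRIPLES: set[Triple] = {
    ("ex:Professor", "rdfs:subClassOf", "ex:Staff"),
    ("ex:Staff", "rdfs:subClassOf", "ex:Person"),
    ("ex:alice", "rdf:type", "ex:Professor"),
    ("ex:alice", "ex:worksAt", "ex:ExampleUniversity"),
}


def match(triples, s=None, p=None, o=None) -> list[Triple]:
    """Return the triples matching a pattern; None acts as a wildcard."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]


def infer_types(triples) -> set[Triple]:
    """Add implicit rdf:type triples by following rdfs:subClassOf to a fixpoint."""
    closed = set(triples)
    changed = True
    while changed:
        changed = False
        for x, _, cls in match(closed, p="rdf:type"):
            for _, _, sup in match(closed, s=cls, p="rdfs:subClassOf"):
                if (x, "rdf:type", sup) not in closed:
                    closed.add((x, "rdf:type", sup))
                    changed = True
    return closed


types = {o for _, _, o in match(infer_types(TRIPLES), s="ex:alice", p="rdf:type")}
print(sorted(types))  # ['ex:Person', 'ex:Professor', 'ex:Staff']
```

Only ex:Professor is stated explicitly for ex:alice; ex:Staff and ex:Person are the implicit types the hierarchy provides.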
Adding Meta-Data


A prerequisite for the Semantic Web (SW) is structured knowledge
Manual Approach




Too much data
Trust issues
Noise
This process needs to be automated
Armadillo


Automatically annotates web pages
Validity is established by combining a number of weak techniques:




Redundant Information
Rating of Sources
Context around a capture
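One way to picture how these weak techniques reinforce each other is a noisy-OR combination: each independent source that supports a capture lowers the chance that the capture is wrong. This is a hypothetical sketch, not Armadillo's actual scoring; the Evidence fields, the noisy-OR rule, the context penalty and the 0.8 threshold are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Evidence:
    source: str          # page or site where the capture was found
    reliability: float   # assumed rating of that source, in [0, 1]
    context_ok: bool     # does the text around the capture support it?


def combined_confidence(evidence, context_penalty=0.5):
    """Noisy-OR over independent weak evidence (an assumed combination rule).

    Redundancy: each distinct source adds evidence, but repeats from the
    same source count only once. Source rating: reliability weights each
    source. Context: unsupportive context scales reliability down.
    """
    best = {}
    for e in evidence:
        r = e.reliability if e.context_ok else e.reliability * context_penalty
        best[e.source] = max(r, best.get(e.source, 0.0))
    p_all_wrong = 1.0
    for r in best.values():
        p_all_wrong *= 1.0 - r
    return 1.0 - p_all_wrong


def accept(evidence, threshold=0.8):
    """Accept a capture once the combined confidence reaches the threshold."""
    return combined_confidence(evidence) >= threshold
```

For example, two independent sources rated 0.6 and 0.5 combine to 1 - 0.4 * 0.5 = 0.8, while the same source repeated twice stays at 0.6: redundancy only helps when it comes from different places.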
(LP)² - Extraction of knowledge

Makes use of Natural Language Processing (NLP)
(LP)²

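Induce tagging rules
Generalize using NLP information and keep the best rules
Remove covered instances from the pool
Best rules: high precision, low recall
Contextual Tagging
Recovers discarded rules and constrains their application to the context of tags already inserted
Rules insert <tag> and </tag> markers into the text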
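The induction loop above follows a covering pattern, which can be sketched in a heavily simplified form. Assumptions, all hypothetical: a rule tests a fixed window of two tokens on each side of an insertion point; generalization replaces a word by its capitalization shape or a wildcard instead of real NLP features such as POS tags; only one tag type is learned; and contextual rules are omitted. The names induce, apply_rules and WINDOW are illustrative, not the authors' implementation.

```python
from itertools import product

WINDOW = 2  # context tokens on each side of a tag position (assumption)


def shape(token):
    return "CAP" if token[:1].isupper() else "LOW"


def context(tokens, i):
    """Tokens around insertion point i (the tag goes before tokens[i])."""
    return [tokens[j] if 0 <= j < len(tokens) else "<PAD>"
            for j in range(i - WINDOW, i + WINDOW)]


def generalisations(token):
    # (LP)^2 generalises with NLP output; word shape and a wildcard stand in here.
    return (("word", token.lower()), ("shape", shape(token)), ("any", None))


def cond_holds(cond, token):
    kind, value = cond
    if kind == "word":
        return token.lower() == value
    if kind == "shape":
        return shape(token) == value
    return True  # wildcard


def rule_matches(rule, tokens, i):
    return all(cond_holds(c, t) for c, t in zip(rule, context(tokens, i)))


def apply_rules(pool, tokens):
    """Positions where at least one rule in the pool would insert the tag."""
    return [i for i in range(len(tokens) + 1)
            if any(rule_matches(r, tokens, i) for r in pool)]


def evaluate(rule, corpus):
    """Precision of a rule on the corpus and the gold positions it covers."""
    hits, covered = 0, set()
    for d, (tokens, gold) in enumerate(corpus):
        for i in apply_rules([rule], tokens):
            hits += 1
            if i in gold:
                covered.add((d, i))
    return (len(covered) / hits if hits else 0.0), covered


def induce(corpus, k=1, min_precision=0.9):
    """Covering loop: seed rules from an uncovered example, generalise them,
    keep the k best generalisations, then drop the instances they cover."""
    uncovered = {(d, i) for d, (_, gold) in enumerate(corpus) for i in gold}
    best_pool = []
    for d, i in sorted(uncovered):
        if (d, i) not in uncovered:
            continue  # already covered by a rule kept earlier
        scored = []
        for rule in set(product(*map(generalisations, context(corpus[d][0], i)))):
            precision, covered = evaluate(rule, corpus)
            if precision >= min_precision:
                scored.append((len(covered), precision, rule, covered))
        scored.sort(key=lambda s: (s[0], s[1]), reverse=True)
        for _, _, rule, covered in scored[:k]:
            best_pool.append(rule)
            uncovered -= covered
    return best_pool


corpus = [
    ("seminar by Dr Smith today".split(), {2}),  # <speaker> goes before "Dr"
    ("talk by Dr Jones at noon".split(), {2}),
    ("held by the department".split(), set()),
]
pool = induce(corpus)
print(apply_rules(pool, "lecture by Dr Brown".split()))  # [2]
```

Keeping only high-precision generalizations mirrors the "high precision, low recall" character of the best rules: the learned rule fires on the unseen sentence but never on the negative training sentence.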
Correction and Validation


Correction rules shift misplaced tags to the correct position (within d positions)
Validation
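The correction and validation steps can be sketched as follows, under stated assumptions. A correction moves a tag to the nearest acceptable insertion point no more than d positions away. The acceptability test ends_name is a hypothetical stand-in for a learned correction rule, and the validation shown (greedily pairing opening and closing tags and dropping unpaired ones) is one plausible reading of "Validation", not a confirmed description of the system.

```python
def shift_tag(tokens, pos, acceptable, d):
    """Move a tag at insertion point `pos` to the nearest acceptable point
    at most d positions away; ties prefer the left. Unchanged if none found."""
    if acceptable(tokens, pos):
        return pos
    for offset in range(1, d + 1):
        for cand in (pos - offset, pos + offset):
            if 0 <= cand <= len(tokens) and acceptable(tokens, cand):
                return cand
    return pos


def ends_name(tokens, pos):
    """Hypothetical correction test: a closing tag follows a capitalised
    token and precedes a non-capitalised one (or the end of the text)."""
    prev_cap = pos > 0 and tokens[pos - 1][:1].isupper()
    next_cap = pos < len(tokens) and tokens[pos][:1].isupper()
    return prev_cap and not next_cap


def validate(opens, closes):
    """Greedily pair each opening tag with the next unused closing tag;
    tags left without a partner are dropped (assumed validation rule)."""
    closes, pairs, j = sorted(closes), [], 0
    for o in sorted(opens):
        while j < len(closes) and closes[j] <= o:
            j += 1
        if j < len(closes):
            pairs.append((o, closes[j]))
            j += 1
    return pairs


tokens = "talk by Dr Jones at noon".split()
end = shift_tag(tokens, 3, ends_name, d=1)   # </speaker> wrongly before "Jones"
print(end, validate([2], [end]))             # 4 [(2, 4)]
```

With d=1 the closing tag moves one position right, after "Jones"; with d=0 no correction is allowed and the tag stays where it was.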
Heterogeneity

Armadillo



Uses weak NLP
Uses intra-document relation recognition
Requirements


Must adapt to different document types
Relation Extraction
Bootstrapped Learning
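Intra-document relation recognition can be illustrated with a simple proximity heuristic: once mentions on a page have been typed, a relation is proposed between mentions of the right types that occur close together, and the result is emitted as a triple. This is a hypothetical sketch; the Mention type, the ex:hasEmail predicate and the ten-token distance limit are assumptions, not Armadillo's actual mechanism.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mention:
    kind: str   # entity type produced by the extractor, e.g. "person"
    text: str   # surface string of the capture
    pos: int    # token offset within the document


def link_in_document(mentions, head="person", tail="email",
                     predicate="ex:hasEmail", max_dist=10):
    """Relate each head mention to the nearest tail mention in the same
    document, emitting (subject, predicate, object) triples."""
    tails = [m for m in mentions if m.kind == tail]
    triples = []
    for h in (m for m in mentions if m.kind == head):
        near = [t for t in tails if abs(t.pos - h.pos) <= max_dist]
        if near:
            best = min(near, key=lambda t: (abs(t.pos - h.pos), t.pos))
            triples.append((h.text, predicate, best.text))
    return triples


doc = [
    Mention("person", "Alice Smith", 0),
    Mention("email", "alice@example.org", 3),
    Mention("person", "Bob Jones", 20),
    Mention("email", "bob@example.org", 22),
]
for triple in link_in_document(doc):
    print(triple)
```

Proximity is a weak signal on its own, which is why it would be combined with other evidence rather than trusted directly.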

Armadillo



Unsupervised approach: the user only validates results
The user cannot drive the system towards interesting documents and facts
Requirements


Identify triples
Goal: Bootstrap learning on a large scale

The user needs a role in guiding learning
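A minimal sketch of bootstrapped learning with the user in the loop: starting from seed facts, each round learns trivial patterns (here, just the word preceding a known fact), uses them to propose new candidates, and lets a user callback accept or reject each one, so the user steers what the next round learns from. The function bootstrap, its single-word patterns and the callback interface are hypothetical simplifications.

```python
def bootstrap(docs, seeds, user_accepts, rounds=3):
    """Seed-driven bootstrapping with a human validation step.

    Each round learns trivial patterns (the word just before a known fact),
    applies them to propose new candidates, and asks the user to accept or
    reject each one. Rejected items are never proposed again.
    """
    known, rejected = set(seeds), set()
    for _ in range(rounds):
        patterns = {toks[i - 1] for toks in docs
                    for i, w in enumerate(toks) if i > 0 and w in known}
        seen = known | rejected
        candidates = {toks[i + 1] for toks in docs
                      for i in range(len(toks) - 1)
                      if toks[i] in patterns and toks[i + 1] not in seen}
        if not candidates:
            break
        for c in sorted(candidates):
            (known if user_accepts(c) else rejected).add(c)
    return known, rejected


docs = [s.split() for s in
        ["seminar by Smith", "talk by Jones", "lunch by noon", "speaker Brown"]]
# Stand-in for the user: accept capitalised candidates only.
known, rejected = bootstrap(docs, {"Smith"}, lambda c: c[:1].isupper())
print(sorted(known), sorted(rejected))  # ['Jones', 'Smith'] ['noon']
```

Without the validation step, "noon" would be accepted and could seed further wrong patterns; the user's rejection keeps it out of later rounds.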
Content Cleaning and Normalization

Armadillo



Unsupervised (LP)² training introduces noise
Multiple sources of weak evidence help avoid poor seeds
Requirements

Handle noisy training data
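Cleaning and normalization, combined with redundancy, can be sketched as a filter applied before captures are used as training seeds: strings from different pages are reduced to a canonical form, and only captures confirmed by several distinct sources survive. The normalization scheme (NFKC, punctuation stripping, case folding) and the two-source threshold are assumptions chosen for illustration.

```python
import re
import unicodedata
from collections import defaultdict


def normalise(text):
    """Canonical form used to compare captures from different pages."""
    text = unicodedata.normalize("NFKC", text)   # unify compatibility characters
    text = re.sub(r"[^\w\s]", " ", text)          # drop punctuation such as "." or ","
    return " ".join(text.split()).casefold()     # collapse whitespace, ignore case


def filter_seeds(captures, min_sources=2):
    """Keep normalised captures confirmed by at least `min_sources`
    distinct sources; single-source captures are treated as noise."""
    sources = defaultdict(set)
    for source, raw in captures:
        key = normalise(raw)
        if key:
            sources[key].add(source)
    return {key for key, srcs in sources.items() if len(srcs) >= min_sources}


captures = [("page1", "Dr. Smith"), ("page2", "dr  smith"),
            ("page1", "Jones"), ("page1", "Jones"), ("page3", "...")]
print(filter_seeds(captures))  # {'dr smith'}
```

"Jones" appears twice but only on one page, so it does not count as independent confirmation; the punctuation-only capture is discarded during cleaning.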
Conclusion

Semantic Web


Armadillo: a tool for information extraction (IE)



Meta-Data
Evidence Building and Validation
Extraction of knowledge with (LP)²
A survey of requirements for mining web content to produce SW meta-data