Mining the Semantic Web: Requirements for Machine Learning Fabio Ciravegna, Sam Chapman
Mining the Semantic Web:
Requirements for Machine Learning
Fabio Ciravegna, Sam Chapman
Presented by
Steve Hookway
10/20/05
What is the Semantic Web
A way to automate reasoning with web data
RDF
A uniform way to describe resources
(subject,predicate,object)
Ontology
Hierarchical structure of data
Property restrictions
Implicit typing
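As a rough illustration of the points above, the sketch below stores triples as plain Python tuples and derives an implicit type from a property restriction plus the class hierarchy. The resource names (`alice`, `teaches`, `Professor`, and so on) and the helper functions are invented for the example, and the predicates only loosely mirror RDF Schema's `subClassOf` and `domain`. It is not a real RDF library.

```python
# Triples as (subject, predicate, object) tuples. All names are hypothetical.
TRIPLES = {
    ("Professor", "subClassOf", "Person"),   # hierarchy
    ("Person", "subClassOf", "Agent"),
    ("teaches", "domain", "Professor"),      # property restriction
    ("alice", "teaches", "cs101"),           # plain fact, no explicit type
}

def superclasses(cls, triples):
    """Return cls plus every class reachable through subClassOf."""
    found = {cls}
    frontier = [cls]
    while frontier:
        current = frontier.pop()
        for s, p, o in triples:
            if s == current and p == "subClassOf" and o not in found:
                found.add(o)
                frontier.append(o)
    return found

def inferred_types(resource, triples):
    """Implicit typing: the subject of property P belongs to P's domain class."""
    domains = {s: o for s, p, o in triples if p == "domain"}
    types = set()
    for s, p, o in triples:
        if s == resource and p in domains:
            types |= superclasses(domains[p], triples)
    return types

print(sorted(inferred_types("alice", TRIPLES)))
# ['Agent', 'Person', 'Professor']
```

Nothing states that `alice` is a `Professor`; the type follows from her use of `teaches`.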
Adding Meta-Data
A prerequisite for the Semantic Web (SW) is structured knowledge
Manual Approach
Too much data
Trust Issues
Noise
This process needs to be automated
Armadillo
Automatically annotate web pages
Validity based on a number of weak techniques
Redundant Information
Rating of Sources
Context around a capture
(LP)² - Extraction of knowledge
Makes use of Natural Language Processing (NLP)
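One way to picture how redundancy and source ratings could add up to a validity score is a noisy-OR combination: a candidate is rejected only if every source that reports it is wrong. The slides do not say how Armadillo actually combines its evidence, so the formula, the ratings, the threshold, and the facts below are all assumptions made for illustration.

```python
def combined_confidence(source_ratings):
    """Noisy-OR: the candidate is wrong only if every source is wrong."""
    p_all_wrong = 1.0
    for rating in source_ratings:
        p_all_wrong *= (1.0 - rating)
    return 1.0 - p_all_wrong

# Hypothetical candidate facts, each with the ratings (0..1) of the
# sources in which it was found.
candidates = {
    ("person_a", "worksAt", "dept_x"): [0.6, 0.5, 0.4],  # three weak sources
    ("person_a", "worksAt", "dept_y"): [0.3],            # one weak source
}

THRESHOLD = 0.8  # assumed acceptance threshold
accepted = [
    fact
    for fact, ratings in candidates.items()
    if combined_confidence(ratings) >= THRESHOLD
]
print(accepted)
# [('person_a', 'worksAt', 'dept_x')]
```

No single source here is rated above 0.6, yet three of them agreeing push the first candidate to about 0.88.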
(LP)²
Induce tagging rules
Generalize rules using NLP knowledge and keep the best rules
Remove covered instances from pool
High Precision, Low Recall
Contextual Tagging
Recovers rules and constrains their application
Correction and Validation
Shifts tags to correct position (within d spaces)
Validation
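The "keep best rules, remove covered instances" step follows the general shape of a sequential covering loop. The sketch below shows that loop only. Rules are reduced to the set of token positions at which they would insert a tag, and the rule names, positions, and precision threshold are made up. It does not reproduce the actual (LP)² rule language or its generalization step.

```python
def covering(candidate_rules, positives, negatives, min_precision=0.9):
    """Sequential covering: keep high-precision rules, drop covered instances.

    candidate_rules maps a rule name to the set of positions where it fires.
    """
    remaining = set(positives)
    best_rules = []
    while remaining:
        scored = []
        for name, matches in candidate_rules.items():
            hits = matches & remaining
            errors = matches & negatives
            if hits:
                precision = len(hits) / (len(hits) + len(errors))
                scored.append((precision, len(hits), name))
        if not scored:
            break  # no rule covers what is left
        precision, _, name = max(scored)
        if precision < min_precision:
            break  # only unreliable rules are left
        best_rules.append(name)
        remaining -= candidate_rules[name]
    return best_rules, remaining

# Hypothetical data: positions 1-4 need a tag, positions 7-8 must not get one.
rules = {
    "after 'Prof.'": {1, 2},
    "after 'Dr.'": {3},
    "capitalised word": {1, 2, 3, 7, 8},
}
best, uncovered = covering(rules, positives={1, 2, 3, 4}, negatives={7, 8})
print(best, uncovered)
# ["after 'Prof.'", "after 'Dr.'"] {4}
```

Position 4 stays uncovered and the noisy "capitalised word" rule is never selected, which is the high-precision, low-recall behaviour the slide describes. Recovering such a rule in a constrained context is the job of the contextual tagging step.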
Heterogeneity
Armadillo
Uses weak NLP
Uses intra-document relation recognition
Requirements
Must adapt to different document types
Relation Extraction
Bootstrapping Learning
Armadillo
Unsupervised approach: user only validates
User cannot drive system towards interesting documents and facts
Requirements
Identify triples
Goal: Bootstrap learning on a large scale
User needs a role to guide learning
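A toy version of a bootstrapping loop with a user hook may make the requirement concrete: known pairs yield textual contexts, the contexts yield new candidate pairs, and a `validate` callback stands in for the user. The corpus, the names, and the string-based pattern matching are all invented for the sketch and are far simpler than anything a real system such as Armadillo would use.

```python
def bootstrap(seeds, corpus, validate, rounds=5):
    """Grow a set of (subject, object) pairs from seeds.

    `validate` is a hypothetical stand-in for the user accepting or
    rejecting each new candidate.
    """
    facts = set(seeds)
    for _ in range(rounds):
        # 1. Learn contexts: the text between a known subject and object.
        patterns = set()
        for subj, obj in facts:
            for sentence in corpus:
                if subj in sentence and obj in sentence:
                    start = sentence.index(subj) + len(subj)
                    end = sentence.index(obj)
                    if start < end:
                        patterns.add(sentence[start:end])
        # 2. Apply the contexts to propose new pairs.
        new = set()
        for sentence in corpus:
            for pattern in patterns:
                if pattern in sentence:
                    left, right = sentence.split(pattern, 1)
                    candidate = (left.strip(), right.strip(" ."))
                    if candidate not in facts and validate(candidate):
                        new.add(candidate)
        if not new:
            break  # nothing left to learn
        facts |= new
    return facts

corpus = [
    "Ann works at Acme.",
    "Bob works at Initech.",
    "Bob is employed by Initech.",
    "Cy is employed by Globex.",
]
facts = bootstrap({("Ann", "Acme")}, corpus, validate=lambda fact: True)
print(sorted(facts))
# [('Ann', 'Acme'), ('Bob', 'Initech'), ('Cy', 'Globex')]
```

Here the user can only veto candidates. Letting `validate` also supply new seeds or preferred documents would be one way to give the user the guiding role the slide asks for.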
Content Cleaning and Normalization
Armadillo
Noise added during unsupervised (LP)²
Use multiple sources of weak evidence to help avoid poor seeds
Requirements
Handle noisy training data
Conclusion
Semantic Web
Armadillo – a tool for IE
Meta-Data
Evidence Building and Validation
Extraction of knowledge (LP)²
A survey of requirements in mining web content for SW meta-data