Dong-Kyu Jeon

Download Report

Transcript Dong-Kyu Jeon

인공지능 특강 프로젝트
- Development of Decision Tree Algorithm for Semantic Web data -
2010313148
전동규
Agenda
1. Project Purpose
2. Motivation
3. Related Work
4. Algorithm
5. Description of Problem
2
1. Project Purpose
Relational Data
- Semantic Web Data (Linked data) -
Semantic Decision Tree
Decision Tree Algorithm
- C4.5 Algorithm -
3
1. Project Purpose
• Goal of project
 Development of new kind of Decision Tree algorithm which supports
decision making based on Semantic Web environmental information
 Solve the several problems which is already solved by other related
researches
• Data : Linked Data(Semantic Web ontology)
4
1. Project Purpose
• Ontology
 Class : Definition of Set
 Property : Relations between instances
 Instance : Individuals which are belonged in classes
String
Int
Boolean
String
name
Rich
age
name
Workplace
Location
String
Hospital
Person
School
Doctor
Person
Class
Teacher
Student
Range type
Person
Datatype Property
Object Property
Schema of Example Ontology
5
2. Motivation
What is Semantic Web ?
• Definition of Semantic Web
 The Semantic Web is an evolving development of the World Wide Web in which the
meaning (semantics) of information and services on the web is defined, making it
possible for the web to understand and satisfy the requests of people and machines to
use the web content
 Semantic Web is based on ontologies which corresponds to Semantic Web data
• Linked Data
 The term Linked Data is used to describe a method of exposing, sharing, and connecting
data on the Web
[1] Berners-Lee, T. (2001), “ The Semantic Web,” Scientific American, Vol. 501.
6
2. Motivation
Development status of Semantic Web
• Increase of Semantic Web Data
 Appearance of Semantic Web Document Search Engines
 Falcons : Twenty millions over RDF/XML Documents
 Swoogle : Three millions over Semantic Web Documents
Date
RDF/XML
Quadruple
2009-09-02
21,639,337
2,936,868,638
2009-05-29
19,919,364
2,177,084,709
Date
Semantic Web Document
Triple
2009-10-17
3,109,616
1,065,799,526
Falcons
Swoogle
 Open data in Semantic Web
 LINKINGOPENDATA :
 The goal of the W3C SWEO Linking Open Data community project is to
extend the Web with a data commons by publishing various open data
sets as RDF on the Web and by setting RDF links between data items from
different data sources
 The data sets consist of over 4.7 billion RDF triples, which are interlinked
by around 142 million RDF links (May 2009).
7
2. Motivation
Development status of Semantic Web
• Increase of Semantic Web Data
 Semantic Web based Portal Site
 Twine :
 Twine is a Semantic Web Portal that making networks of information based
on user’s posts which consist of their own information and favorite things.
 Every information composing Twine is written in RDF and OWL format.
 Twine have millions of visitors in a month, and they have over millions of
relationships between 3 millions of semantic tags (March 2009)
The necessity of mining useful knowledge from huge size
ontology is highly expected. Therefore, Data Mining
methodology for Semantic Web should be ready for this
necessity.
8
2. Motivation
Decision Tree in Semantic Web
• Traditional Decision Tree algorithm is impossible to apply in Semantic Web
 Semantic Web based Ontology has special characteristics for mining
 Since Semantic Web document has network structure, multi-value issue is
occurred
 Traditional Decision Tree just uses value of variables. Therefore, additional
information of Semantic Web are can not be applied
 Converting Semantic Web data into single table style that used to use in
traditional decision tree algorithm is impossible
9
3. Related Work
• Arno J.Knobbe[2] developed decision tree algorithm for Multi-relational
database
 Selection Graph is suggested to do decision tree on RDB
 Selection Graph is composed of Node, Edge, and condition and it can be expressed
in SQL syntax
This research suggested partial solution about multi-value issue which also
happened in Semantic Web ontology. However, this methodology can not be
applied to Semantic Web which contains a lot of information than RDB
• David Jensen[3] suggested methodology that converting social network data
to single table data which can be applied to Traditional Decision Tree
algorithm
 ‘QGraph' that kind of query language to get the local network from entire social
network is suggested
 QGraph is composed of Node, Edge, and condition and it can query many objects at
once
Since ontology information are manually converted to single table form, missing
information will be occurred a lot
[2] Arno J. Knobbe., Arno Siebes., Danil Van Der Wallen., Syllogic B. V. (1999). “Multi-Relational Decision Tree Induction,” In Proceedings of PKDD’99,
[3] D. Jensen., and J. Neville.(to appear) (2002). “Data mining in networks,” Papers of the Symposium on Dynamic Social Network Modeling and Analysis.
10
4. Algorithm
• Search procedure of algorithm follows C4.5 algorithm
• New methodologies are required to learn concepts in ontology
• ‘Constructor’ can be used as similar as attributes in traditional Decision tree
• Related works used the terms ‘Refinement’ as an attribute in Decision Tree
11
4. Algorithm
Refinements
• What is a Refinement?
 Refinement is a condition for split branches in decision tree. In this algorithm, property
and class from ontology are used as a refinement.
 When define a refinement, Role Constructors from Description Logic are applied to
make the best use of information in Semantic Web
• Type of Refinements
 Concept Constructor Refinement : Applying type information of instances
 Cardinality restriction Refinement : Applying cardinality information on object property
 Domain restriction Refinement : Applying value of datatype property
 Qualification Refinement : Applying information of quantification restrictions and range
class of object property
12
4. Algorithm
Refinements
• Developed Refinements
Example
Concept Constructor
Refinement
Hospital
not Human
Domain restriction
Refinement
Age.(≥ 21)
Cardinality restriction
Refinement
≥ 3 hasChild
Qualification
Refinement
 hasChild.Blond
13
4. Algorithm
• The list of syntax information which can be expressed in ontology
Language
Syntax
RDF
rdf:type
owl:equivalentProperty
rdfs:domain
owl:FunctionalProperty
rdfs:range
owl:hasValue
rdfs:subClassOf
owl:intersectionOf
rdfs:subPropertyOf
owl:InverseFunctionalProperty
owl:AllDifferent
owl:inverseOf
owl:allValuesFrom
owl:masCardinality
RDFS
owl:cardinality
OWL
Language
OWL
Syntax
owl:minCardinality
owl:Class
owl:ObjectProperty
owl:complementOf
owl:oneOf
owl:DatatypeProperty
owl:sameAs
owl:DataRange
owl:someValuesFrom
owl:differentFrom
owl:SymmetricProperty
owl:disjointWith
owl:TransitiveProperty
owl:equivalentClass
owl:unionOf
14
5. Description of Problem
Train problem
Ten trains
• After learning, found definition of eastbound train is as follows
15
5. Description of Problem
Train problem
• Artificial task of learning to predict whether a train is headed east or west
• Data is consist of relation tuples
• Relations
 eastbound(T) : train T is eastbound
 has-car(T,C) : C is a car of T
 infront(C,D) : car C is in front of D
 long(C) : car C is long
 open-rectangle(C) : car C is shaped as an open rectangle similar relations for five other shapes
 jagged-top(C) : C has a jagged top
 sloping-top(C) : C has a sloping top
 open-top(C) : C is open
 contains-load(C,L) : C contains load L
 1-item(C) : C has one load item similar relations for two and three load items
 2-wheels(C) : C has two wheels
 3-wheels(C) : C has three wheels
16