Multi-Relational Data Mining: An Introduction
Joe Paulowskey
Overview
Introduction to Data Mining
Relational Data and Patterns
Inductive Logic Programming (ILP)
Relational Association Rules
Relational Decision Trees
Relational Distance-Based Approaches
Relational Data
Relational Database
Multiple Tables
Defined Views
Relational Pattern
Multiple Relations from a relational database
More Expressive
Opens up
Classification
Association
Regression
Relational Pattern (Cont.)
Expressed in Subsets of First Order Logic
Data Mining
Look for patterns in data
What do you discover?
Associations
Sequences
Classifications
Goals of Data Mining
Predict
Identify
Classify
Optimize
Uses
Business Data
Environmental/Traffic
Engineering
Web Mining
Drug Design
Data Mining:
Relational Databases
Most Data Mining approaches deal with single tables
Not safe to merge multiple tables into one single table
Number of patterns increases
Explicit constraints required
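One way to see why merging is not safe: a minimal sketch, assuming two hypothetical toy tables (`customers` and `orders`, with invented values). Joining them yields one row per order, so a customer-level statistic computed on the flat table is skewed toward customers with many orders.

```python
# Hypothetical toy tables; all names and values are illustrative only.
customers = [
    {"cust_id": 1, "region": "north"},
    {"cust_id": 2, "region": "south"},
]
orders = [
    {"order_id": 10, "cust_id": 1, "item": "book"},
    {"order_id": 11, "cust_id": 1, "item": "pen"},
    {"order_id": 12, "cust_id": 1, "item": "lamp"},
    {"order_id": 13, "cust_id": 2, "item": "book"},
]

def join(customers, orders):
    """Naive inner join on cust_id, producing one flat row per order."""
    by_id = {c["cust_id"]: c for c in customers}
    return [{**by_id[o["cust_id"]], **o} for o in orders]

def share_north(rows):
    """Fraction of rows whose region is 'north'."""
    return sum(r["region"] == "north" for r in rows) / len(rows)

flat = join(customers, orders)
true_share = share_north(customers)  # computed once per customer
flat_share = share_north(flat)       # computed once per joined row
print(true_share, flat_share)
```

Half of the customers are in the north, but three quarters of the joined rows are, because customer 1 is repeated once per order.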
Inductive Logic Programming (ILP)
Logic Programs used to find patterns
Clauses
Head and Body
Literals
Types
Definite Program
ILP (Cont)
Predicates -> Relations in relational database
Arguments -> Attributes
Attributes are Typed
Database Clauses are typed program clauses
Deductive Database
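The correspondence between predicates and relations can be illustrated with a small sketch. It assumes a hypothetical `parent` relation stored as a set of tuples and evaluates one definite clause, `grandparent(X, Z) :- parent(X, Y), parent(Y, Z)`, as a self-join. The relation, its contents, and the function name are invented for illustration.

```python
# A predicate corresponds to a relation: a set of tuples.
# Each argument position corresponds to an attribute (column).
parent = {("ann", "bob"), ("bob", "cat"), ("bob", "dan")}

def grandparent(parent_rel):
    """Evaluate the clause grandparent(X, Z) :- parent(X, Y), parent(Y, Z).

    The head is grandparent(X, Z); the body is the conjunction of the two
    parent literals, which share the variable Y (the join attribute).
    """
    return {
        (x, z)
        for (x, y1) in parent_rel
        for (y2, z) in parent_rel
        if y1 == y2
    }

result = grandparent(parent)
print(sorted(result))
```

The derived tuples behave like a view defined over the stored relation, which is the sense in which a set of such clauses forms a deductive database.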
Relational Rule Induction ILP
Learn logical definitions of relations
Classification
Rules can be found by decision trees
Simple Algorithm
Dealing with noisy/incomplete data
ILP Problems to Propositional Forms
Propositional (attribute-value) form
Use Single Table Data Mining algorithms
LINUS
Background Knowledge
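The transformation can be sketched as follows. This is a simplified illustration in the spirit of LINUS, not its actual implementation: each background-knowledge predicate is applied to every ordering of the example's arguments, and the truth value becomes one boolean column. The target relation `daughter(X, Y)`, the facts, and all function names are assumptions made for the example.

```python
from itertools import permutations

# Hypothetical background knowledge: predicate name -> (arity, true tuples)
background = {
    "female": (1, {("mary",), ("ann",), ("eve",)}),
    "parent": (2, {("ann", "mary"), ("tom", "eve"), ("ann", "tom")}),
}

# Labeled examples of the target relation daughter(X, Y)
examples = [
    (("mary", "ann"), True),
    (("eve", "tom"), True),
    (("tom", "ann"), False),
    (("eve", "ann"), False),
]

def propositionalize(example_args, background):
    """Turn one relational example into a flat dict of boolean features."""
    names = ["X", "Y"]
    row = {}
    for pred, (arity, tuples) in sorted(background.items()):
        # Try the predicate on every ordered choice of example arguments.
        for idx in permutations(range(len(example_args)), arity):
            feature = f"{pred}({','.join(names[i] for i in idx)})"
            row[feature] = tuple(example_args[i] for i in idx) in tuples
    return row

table = [
    dict(propositionalize(args, background), label=label)
    for args, label in examples
]
for row in table:
    print(row)
```

The resulting `table` is a single attribute-value table, so any single-table learner can be applied to it. In this toy data the label coincides with `female(X) and parent(Y,X)`, which reads back as the clause `daughter(X, Y) :- female(X), parent(Y, X)`.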
ILP/RDM Algorithms
Share
Learning as a Search Paradigm
Differences
Representation of Data, Patterns
Refinement operators
Testing Coverage
Upgrading from Propositional to Relational
Relational Association Rules
Frequent Patterns
Determining Frequency
Itemsets
Association Rules
Obtained from frequent itemsets
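A minimal sketch of the single-table case that the relational version upgrades: frequent itemsets are found by counting support, and rules are then read off from them. The relational setting replaces itemsets with queries over several relations, which this sketch does not attempt. The transactions and thresholds are invented, and the search is brute force rather than a level-wise algorithm.

```python
from itertools import combinations

# Hypothetical transactions; values are illustrative only.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def frequent_itemsets(transactions, min_support):
    """Brute-force enumeration of all itemsets meeting min_support."""
    items = sorted(set().union(*transactions))
    found = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            s = support(set(combo), transactions)
            if s >= min_support:
                found[frozenset(combo)] = s
    return found

def association_rules(frequent, min_confidence):
    """Derive rules lhs -> rhs from frequent itemsets.

    Every subset of a frequent itemset is itself frequent, so its
    support is always available in the frequent dict.
    """
    out = []
    for itemset, s in frequent.items():
        for size in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), size):
                lhs = frozenset(lhs)
                conf = s / frequent[lhs]
                if conf >= min_confidence:
                    out.append((set(lhs), set(itemset - lhs), conf))
    return out

freq = frequent_itemsets(transactions, min_support=0.5)
rules = association_rules(freq, min_confidence=0.9)
print(rules)
```

With these thresholds only `butter -> bread` survives: both transactions containing butter also contain bread.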
Relational Decision Trees
Used for Prediction
Binary Trees
First Order Decision List
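A binary relational decision tree can be sketched as nested tests, where each internal node asks a yes/no question that may consult a related table. The `orders` data, the two tests, and the class labels below are hypothetical; the sketch covers prediction only, not how such a tree is learned.

```python
# Hypothetical related table: customer id -> items ordered.
orders = {1: ["book", "pen"], 2: ["lamp"], 3: []}

def has_order(cust_id):
    """Test: does at least one related order exist?"""
    return len(orders[cust_id]) > 0

def bought_book(cust_id):
    """Test: does some related order contain a book?"""
    return "book" in orders[cust_id]

# Internal node: (test, yes_branch, no_branch). Leaf: a class label string.
tree = (has_order, (bought_book, "reader", "other"), "inactive")

def predict(node, cust_id):
    """Follow yes/no branches until a leaf label is reached."""
    while not isinstance(node, str):
        test, yes_branch, no_branch = node
        node = yes_branch if test(cust_id) else no_branch
    return node

predictions = {cid: predict(tree, cid) for cid in orders}
print(predictions)
```

Reading the tree from the root along each path, in order, gives an ordered list of rules, which is one way to view the link to a first-order decision list.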
Relational Distance-Based Approaches
Calculated distance between two objects
Statistical Approaches
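As an illustration of a distance between two objects in relational data, the sketch below combines two parts: the mismatch rate on the objects' own attributes and a Jaccard distance on the sets of values related to them in another table. This particular combination is an assumption chosen for simplicity, not the measure used by any specific system, and the objects are invented.

```python
def jaccard_distance(a, b):
    """1 - |a & b| / |a | b|, defined as 0 when both sets are empty."""
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)

def object_distance(o1, o2):
    """Average of attribute mismatch rate and related-set distance."""
    keys = o1["attrs"].keys()
    mismatch = sum(o1["attrs"][k] != o2["attrs"][k] for k in keys) / len(keys)
    related = jaccard_distance(o1["related"], o2["related"])
    return (mismatch + related) / 2

# Hypothetical objects: own attributes plus items from a related table.
c1 = {"attrs": {"region": "north", "tier": "gold"}, "related": {"book", "pen"}}
c2 = {"attrs": {"region": "north", "tier": "silver"}, "related": {"book", "lamp"}}

d = object_distance(c1, c2)
print(d)
```

Once such a distance is available, distance-based methods such as nearest-neighbor prediction or clustering can be applied to objects that span several tables.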
Conclusion