Intelligent Data Analysis (IDA)

by
Josipa Kern, PhD
Andrija Stampar School of Public Health
Medical School University of Zagreb
Zagreb, Croatia
Interest and Excitement for
Intelligent Data Analysis
Decision making requires
information and knowledge
Data processing can provide them
Multidimensional problems call for
methods of adequate and in-depth
data processing and analysis
Learning Objectives
To understand the concept of the IDA
To become familiar with websites and literature on IDA
To get to know some tools for IDA
To learn how to use IDA tools and to
validate the IDA results
Performance Objectives
Recognize problems that call for IDA
Prepare data and carry out the analysis
Validate and interpret the results of IDA
IDA is…
… an interdisciplinary study
concerned with the effective
analysis of data;
… used for extracting useful
information from large quantities
of online data; extracting desirable
knowledge or interesting patterns
from existing databases;
IDA or …
Data mining
Knowledge acquisition from data
Genetic algorithm-based rule discovery
Knowledge discovery
Learning classifier system
Machine learning
etc.
IDA gives knowledge …
Knowledge is …
• the distillation of information that has been
collected, classified, organized, integrated,
abstracted and value-added;
• at a level of abstraction higher than the data
and information on which it is based, and can
be used to deduce new information and new
knowledge;
• usually in the context of human expertise
used in solving problems.
Knowledge acquisition …
The process of eliciting, analyzing,
transforming, classifying, organizing and
integrating knowledge and representing
that knowledge in a form that can be
used in a computer system.
Knowledge in a domain can
be expressed as a number
of rules
Rule is …
A formal way of specifying a
recommendation, directive, or
strategy, expressed as "IF premise
THEN conclusion" or "IF condition
THEN action".
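The "IF premise THEN conclusion" form can be sketched in a few lines of Python. This is only an illustration: the `Rule` class, the `temperature_c` attribute, and the threshold are hypothetical and do not come from any of the tools discussed here.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional


@dataclass
class Rule:
    """A rule of the form IF premise THEN conclusion."""
    premise: Callable[[Dict[str, Any]], bool]  # test applied to the known facts
    conclusion: str                            # what the rule asserts when it fires

    def fire(self, facts: Dict[str, Any]) -> Optional[str]:
        # Return the conclusion only when the premise holds for these facts.
        return self.conclusion if self.premise(facts) else None


# Hypothetical rule: IF temperature_c > 38.0 THEN "suspect fever"
fever_rule = Rule(premise=lambda f: f["temperature_c"] > 38.0,
                  conclusion="suspect fever")

result_hot = fever_rule.fire({"temperature_c": 39.2})
result_normal = fever_rule.fire({"temperature_c": 36.8})
```

A knowledge base in this sense is simply a collection of such rules, which is the kind of output the rule-discovery tools on the following slides produce.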
How to discover rules
hidden in the data?
Some tools for IDA …
See5 - program for analyzing data and
generating classifiers in the form of
decision trees and/or rule sets.
http://www.rulequest.com
Some tools for IDA …
Cubist - analyzes data and generates
rule-based piecewise linear models –
collections of rules, each with an
associated linear expression for
computing a target value.
http://www.rulequest.com
Some tools for IDA …
ILLM - the tool constructs
classification models in the form of
rules which represent knowledge
about relations hidden in data.
http://dms.irb.hr
Some tools for IDA …
Magnum Opus - finds association
rules providing competitive
advantage by revealing underlying
interactions between factors within
the data.
http://www.rulequest.com
Evaluation of IDA results
Absolute & relative accuracy
Sensitivity & specificity
False positive & false negative
Error rate
Reliability of rules
Etc.
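Several of these measures can be computed from the four cells of a two-class confusion matrix. The sketch below assumes one class is designated "positive"; the function name `evaluate` and the counts in the usage line are invented for illustration only.

```python
def evaluate(tp: int, fn: int, fp: int, tn: int) -> dict:
    """Compute common evaluation measures from a 2x2 confusion matrix.

    tp: positives classified as positive   fn: positives classified as negative
    fp: negatives classified as positive   tn: negatives classified as negative
    """
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total,         # share of correct classifications
        "error_rate": (fp + fn) / total,       # share of wrong classifications
        "sensitivity": tp / (tp + fn),         # positives correctly recognized
        "specificity": tn / (tn + fp),         # negatives correctly recognized
        "false_positive_rate": fp / (fp + tn),
        "false_negative_rate": fn / (fn + tp),
    }


# Made-up counts, used only to show the call.
metrics = evaluate(tp=40, fn=10, fp=5, tn=45)
```

Note that sensitivity and the false negative rate sum to 1, as do specificity and the false positive rate.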
Example of IDA
Illustration of IDA by using See5
See5…application…
application.names - lists the classes to
which cases may belong and the
attributes used to describe each case.
Attributes are of two types: discrete
attributes have a value drawn from a set
of possibilities, and continuous
attributes have numeric values.
See5…application…
application.data - provides information
on the training cases from which See5
will extract patterns.
The entry for each case consists of one
or more lines that give the values for all
attributes.
See5…application…
application.test - provides information
on the test cases (used for evaluation of
results).
The entry for each case consists of one
or more lines that give the values for all
attributes.
See5…application…example…
Epidemiological study (1970-1990)
Sample of examinees who died from
cardiovascular diseases during the
period
Question: Did they know they were ill?
1 – they were healthy
2 – they were ill (drug treatment, positive clinical
and laboratory findings)
See5…application…example…
application.names – example
Goal.
gender:M,F
activity:1,2,3
age: continuous
smoking: No,Yes
…
Goal:1,2
…
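To make the layout of the names file concrete, here is a minimal Python sketch that reads only the lines shown above: the first entry names the target attribute, and each later entry is either `continuous` or a comma-separated list of discrete values. The function `parse_names` is hypothetical, the elided lines are left out, and other features of real names files are not handled.

```python
def parse_names(text: str):
    """Parse a simplified names file: target first, then 'name: spec' entries."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    target = lines[0].rstrip(".")              # e.g. "Goal." -> "Goal"
    attributes = {}
    for line in lines[1:]:
        name, _, spec = line.partition(":")
        spec = spec.strip().rstrip(".")
        if spec == "continuous":
            attributes[name.strip()] = "continuous"
        else:
            # Discrete attribute: keep the list of allowed values.
            attributes[name.strip()] = [v.strip() for v in spec.split(",")]
    return target, attributes


# Only the entries visible on the slide; the elided ones are omitted.
NAMES_FRAGMENT = """Goal.
gender:M,F
activity:1,2,3
age: continuous
smoking: No,Yes
Goal:1,2
"""

target, attributes = parse_names(NAMES_FRAGMENT)
```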
See5…application…example…
application.data – example
M,1,59,Yes,0,0,0,0,119,73,103,86,247,87,1
5979,?,?,?,1,73,2.5
M,1,66,Yes,0,0,0,0,132,81,183,239,?,783,1
4403,27221,19153,23187,1,73,2.6
M,1,61,No,0,0,0,0,130,79,148,86,209,115,2
1719,12324,10593,11458,1,74,2.5
… …
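In the data file each case is a comma-separated list of attribute values, and `?` marks a value that is unknown. A minimal Python sketch of reading one case follows; the function `parse_case` and the shortened four-attribute case are hypothetical and serve only to show the handling of `?`.

```python
def parse_case(line: str, attribute_names: list) -> dict:
    """Split one comma-separated case and map '?' to None (unknown value)."""
    values = [v.strip() for v in line.split(",")]
    if len(values) != len(attribute_names):
        raise ValueError(
            f"expected {len(attribute_names)} values, got {len(values)}"
        )
    return {
        name: (None if value == "?" else value)
        for name, value in zip(attribute_names, values)
    }


# Shortened, made-up case using only the first four attributes.
names = ["gender", "activity", "age", "smoking"]
case = parse_case("M,1,?,Yes", names)
```

Values are kept as strings here; converting continuous attributes to numbers would be a natural next step.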
See5…application…example…
Results – example
Rule 1: (cover 26)
gender = M
SBP > 111
oil_fat > 2.9
-> class 1 [0.929]
See5…application…example…
Results – example
Rule 4: (cover 14)
smoking = Yes
SBP > 131
glucose > 93
glucose <= 118
oil_fat <= 2.9
-> class 2 [0.938]
See5…application…example…
Results – example
Rule 15: (cover 2)
SBP <= 111
oil_fat > 2.9
-> class 2 [0.750]
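The three rules shown can be applied to a new case as in the Python sketch below. It is a simplification under stated assumptions: only Rules 1, 4 and 15 are included (the full rule set is not shown here), conflicts are resolved by summing the confidences of the matching rules per class, no default class is modelled, and the example cases are invented.

```python
# (name, premise, predicted class, confidence) for the three rules shown.
RULES = [
    ("Rule 1",
     lambda c: c["gender"] == "M" and c["SBP"] > 111 and c["oil_fat"] > 2.9,
     1, 0.929),
    ("Rule 4",
     lambda c: (c["smoking"] == "Yes" and c["SBP"] > 131
                and 93 < c["glucose"] <= 118 and c["oil_fat"] <= 2.9),
     2, 0.938),
    ("Rule 15",
     lambda c: c["SBP"] <= 111 and c["oil_fat"] > 2.9,
     2, 0.750),
]


def classify(case: dict, rules=RULES):
    """Return the class with the highest summed confidence, or None if no rule fires."""
    votes = {}
    for _name, premise, predicted_class, confidence in rules:
        if premise(case):
            votes[predicted_class] = votes.get(predicted_class, 0.0) + confidence
    if not votes:
        return None  # assumption: no default class is modelled in this sketch
    return max(votes, key=votes.get)


# Invented cases, for illustration only.
smoker = {"gender": "M", "smoking": "Yes", "SBP": 140, "glucose": 100, "oil_fat": 2.5}
unmatched = {"gender": "F", "smoking": "No", "SBP": 120, "glucose": 90, "oil_fat": 2.0}

predicted_smoker = classify(smoker)
predicted_unmatched = classify(unmatched)
```

The first case satisfies only Rule 4 and is assigned class 2; the second satisfies none of the three rules.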
See5…application…example…
Results – example
Evaluation on training data (199 cases):
 (a)   (b)    <-classified as
----  ----
 107     3    (a): class 1
  17    72    (b): class 2
See5…application…example…
Results – example (training set)
Sensitivity=0.97
Specificity=0.81
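These values can be checked against the training-set counts (107, 3, 17, 72), assuming the rows of the evaluation table are the actual classes and class 1 is taken as the positive class, which is what the reported figures imply:

```latex
\[
\text{Sensitivity} = \frac{107}{107 + 3} = \frac{107}{110} \approx 0.97,
\qquad
\text{Specificity} = \frac{72}{72 + 17} = \frac{72}{89} \approx 0.81
\]
```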
See5…application…example…
Results – example
Evaluation on test data (73 cases):
 (a)   (b)    <-classified as
----  ----
  43     1    (a): class 1
   3    26    (b): class 2
See5…application…example…
Results – example (test set)
Sensitivity=0.98
Specificity=0.90
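The same check for the test-set counts (43, 1, 3, 26), again assuming class 1 is the positive class, also gives the error rate on the 73 test cases:

```latex
\[
\text{Sensitivity} = \frac{43}{43 + 1} = \frac{43}{44} \approx 0.98,
\qquad
\text{Specificity} = \frac{26}{26 + 3} = \frac{26}{29} \approx 0.90,
\qquad
\text{Error rate} = \frac{1 + 3}{73} = \frac{4}{73} \approx 0.055
\]
```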
All the suggested IDA tools are
available at the URLs mentioned, at
least as demo versions
Try your own IDA…
Thank you!