eBusiness Tutorial


Datamining in e-Business: Veni, Vidi, Vici!
Prof. Dr. Veljko Milutinovic:
•Co-architect of the world's first 200MHz RISC microprocessor,
for DARPA, about a decade before Intel.
•Responsible for several successful datamining-oriented e-business products
on the Internet, developed in cooperation with leading industry in the USA and Europe.
•Consulted for a number of high-tech companies
(TechnologyConnect, BioPop, IBM, AT&T, NCR, RCA, Honeywell, Fairchild, etc.)
•Ph.D. from Belgrade. After that, for about a decade, held various positions (professor)
at one of the top 5 (out of about 2000) US universities in computer engineering (Purdue).
•Author and coauthor of about 50 IEEE journal papers (plus many more in other journals).
According to some, a European record for his research field.
•Guest editor for a number of special issues of: Proceedings of the IEEE, IEEE
Transactions on Computers, IEEE Concurrency, IEEE Computer, etc…
•Over 20 books published by the leading USA publishers
(Wiley, Prentice-Hall, North-Holland, Kluwer, IEEE CS Press, etc...).
•Forewords for 7 of his books written by 7 different Nobel Laureates, in cooperation with
Telecom Italia learning services (you are welcome to visit http://www.ssgrr.it).
http://galeb.etf.bg.ac.yu/~vm
THIS IS A DEMO VERSION OF
THE TUTORIAL IN DATAMINING FOR E-BUSINESS
ONLY A FEW SLIDES OF THE ORIGINAL TUTORIAL
ARE PRESENTED HERE
Focus of this Presentation

Data Mining problem types

Data Mining models and algorithms

Efficient Data Mining

Available software
Decision Trees
[Figure: decision tree with splits on Balance (>10 / <=10), Age (<=32 / >32), and Married (NO / YES)]
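
A decision tree like the one on this slide can be sketched in a few lines of Python. Only the split conditions (Balance, Age, Married) come from the slide; the way the splits are nested and the leaf labels ("class A" to "class D") are assumptions made for illustration, since the original figure is not recoverable from the transcript.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple, Union


@dataclass
class Leaf:
    label: str  # hypothetical class label; the slide does not name the leaves


@dataclass
class Split:
    description: str
    test: Callable[[dict], bool]
    if_true: "Node"
    if_false: "Node"


Node = Union[Leaf, Split]

# Assumed nesting: Balance at the root, Age under the high-balance branch,
# Married under the low-balance branch.
TREE = Split(
    "Balance > 10",
    lambda r: r["balance"] > 10,
    if_true=Split("Age <= 32", lambda r: r["age"] <= 32,
                  Leaf("class A"), Leaf("class B")),
    if_false=Split("Married = YES", lambda r: r["married"],
                   Leaf("class C"), Leaf("class D")),
)


def classify(node: Node, record: dict) -> Tuple[str, List[str]]:
    """Walk from the root to a leaf, recording each test on the way."""
    path = []
    while isinstance(node, Split):
        outcome = node.test(record)
        path.append(f"{node.description}: {outcome}")
        node = node.if_true if outcome else node.if_false
    return node.label, path


if __name__ == "__main__":
    label, path = classify(TREE, {"balance": 50, "age": 25, "married": False})
    print(label, path)
```

Every record follows exactly one path from the root to a leaf, so a tree always produces a single, unambiguous prediction.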
Rule Induction

Method of deriving a set of rules
to classify cases

Creates independent rules
that are unlikely to form a tree

Rules may not cover
all possible situations

Rules may sometimes
conflict in a prediction
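
The last two points can be made concrete with a minimal sketch. The three rules below are hypothetical (they are not the output of any tool named in this tutorial); they only reuse the Balance, Age and Married attributes from the decision tree slide. Because each rule is checked independently, a record may match no rule at all, or may match rules that disagree.

```python
# Hypothetical rule set: (name, condition, predicted label).
RULES = [
    ("R1", lambda r: r["balance"] > 10 and r["age"] <= 32, "YES"),
    ("R2", lambda r: r["married"] and r["balance"] <= 10, "NO"),
    ("R3", lambda r: r["age"] <= 32, "NO"),
]


def apply_rules(record):
    """Return (status, fired), where fired lists the rules that matched.

    status is "uncovered" if no rule matched, "conflict" if the matching
    rules predict different labels, and "ok" otherwise.
    """
    fired = [(name, label) for name, cond, label in RULES if cond(record)]
    labels = {label for _, label in fired}
    if not fired:
        status = "uncovered"
    elif len(labels) > 1:
        status = "conflict"
    else:
        status = "ok"
    return status, fired


if __name__ == "__main__":
    for rec in (
        {"balance": 50, "age": 25, "married": False},  # R1 and R3 disagree
        {"balance": 5, "age": 40, "married": False},   # no rule applies
        {"balance": 5, "age": 40, "married": True},    # only R2 applies
    ):
        print(rec, apply_rules(rec))
```

A real rule induction tool needs some policy for both cases, for example a default class for uncovered records and a rule ordering or vote for conflicts; which policy a given product uses is not stated in the slides.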
Comparison of fourteen DM tools





•Evaluated by four undergraduates inexperienced at data mining,
a relatively experienced graduate student, and
a professional data mining consultant
•Run under MS Windows 95, MS Windows NT,
or Macintosh System 7.5
•Use one of four technologies:
Decision Trees, Rule Induction, Neural Networks, or Polynomial Networks
•Solve two binary classification problems,
a multi-class classification problem, and a noiseless estimation problem
•Priced from $75 to $25,000
The Decision Tree products were
- CART
- Scenario
- See5
- S-Plus
The Rule Induction tools were
- WizWhy
- DataMind
- DMSK
Neural Networks were built from three programs
- NeuroShell2
- PcOLPARS
- PRW
The Polynomial Network tools were
- ModelQuest Expert
- Gnosis
- a module of NeuroShell2
- KnowledgeMiner
Criteria for evaluating DM tools
A list of 20 criteria for evaluating DM tools, put into 4 categories:

Capability measures what a desktop tool can do,
and how well it does it
- Handles missing data
- Considers misclassification costs
- Allows data transformations
- Includes quality of testing options
- Has a programming language
- Provides useful output reports
- Provides visualisation

Interoperability shows a tool’s ability to interface
with other computer applications
- Importing data
- Exporting data
- Links to other applications

Flexibility
- Model adjustment flexibility
- Customizable work environment
- Ability to write or change code
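
One simple way to use such a checklist is to record which criteria a tool meets and report the fraction met per category. The sketch below is an illustration only: the criteria names are taken from the slides, but the scoring scheme (an unweighted fraction) and the example tool are assumptions, not the method used in the original evaluation.

```python
# Criteria as listed on the slides, grouped by category.
CRITERIA = {
    "Capability": [
        "Handles missing data",
        "Considers misclassification costs",
        "Allows data transformations",
        "Includes quality of testing options",
        "Has a programming language",
        "Provides useful output reports",
        "Provides visualisation",
    ],
    "Interoperability": [
        "Importing data",
        "Exporting data",
        "Links to other applications",
    ],
    "Flexibility": [
        "Model adjustment flexibility",
        "Customizable work environment",
        "Ability to write or change code",
    ],
}


def category_scores(criteria_met):
    """Return, per category, the fraction of its criteria found in criteria_met."""
    return {
        category: sum(1 for c in items if c in criteria_met) / len(items)
        for category, items in CRITERIA.items()
    }


if __name__ == "__main__":
    # Hypothetical tool: the set of criteria it is judged to satisfy.
    example_tool = {
        "Handles missing data",
        "Importing data",
        "Exporting data",
        "Links to other applications",
    }
    for category, score in category_scores(example_tool).items():
        print(f"{category}: {score:.2f}")
```

An unweighted fraction treats every criterion as equally important; a reviewer who cares more about, say, missing-data handling would want to add weights.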