Online data mining course Chapter 1

Online data mining course
Chapter 1: Introduction
Hello!
My name is Rob-But Ler, the Robot-Expert of My-X!
My boss will be here as soon as possible...
Until then, you can replay his message:
László Pitlik
University Gödöllő,
Institute of Computer Sciences
Gödöllő, H-2100 Páter K. u. 1.
2008.XII.04.
Greetings and introductions
• Welcome from the My-X project
• By the way: this is a "dress" rehearsal within the framework of the best English courses in the world!
• Briefly about our symbols:
• and our keywords: sustainability, balance, equilibrium, consistency
• and finally about myself…
Online data mining course – Chapter 1: Introduction
Outline of the presentation
• Aims of this course
• A test question (asked in advance and again afterwards)
• Further questions to initialize the common thinking
• Theoretical background, or close to heresy?!
• Didactic background (how to learn?)
• List of the course units (what to learn?)
• Summary and conclusions
• One solution to the test question
Aims of the course
On the basis of previous projects, you can follow step by step:
• how to prepare (e.g. with pivot tables) and
• how to manage (see OLAP) the necessary project databases,
• how to define similarity problems,
• including their controlling aspects,
• how to perform online and offline analyses,
• how to interpret and
• how to describe the calculated results,
• preferably in an online expert system.
By the end of the course you will know every step needed to successfully manage planning, decision making and forecasting.
Test question (in advance)
Please "match" the following words, fragments and letters (a letter may be used more than once),
and write a short story or (if that is more comfortable for you) some equations
based on the antagonisms you discover:
Science
syn, sin, sis
Fusion
con, the
CEIHTY
Initializing the common thinking
• Do you know whether a prediction should, in general, be better for the shorter term or for the longer term?
If possible: vote ratio of the audience
• Do you know whether an analysis based on more data records should be more correct?
If possible: vote ratio of the audience
• Do you know whether an analysis that has been tested on a large number of cases should fit better than one that has not been tested?
If possible: vote ratio of the audience
Theoretical background, or close to heresy?
• A phenomenon can only be labeled SCIENCE if it can be transformed into program code (e.g. a chess robot).
• Every other phenomenon counts as artistic performance (e.g. studies, lectures, and also this presentation).
• Human intuition produces the good ideas. But human intuition does not seem to be the only kind that exists (cf. K. Lorenz, 1942).
• All living creatures on Earth have sensors to measure their (inner and outer) environment.
• The measured values are continuously interpreted in order to find connections between causes and reactions.
• "Eureka!" was already cried out at the very beginning of life!
• Data mining has to deliver possible connections based on the measured records.
• Therefore we can cast our instinctive capabilities into source code.
Didactic background (how to learn)
Sustainable education:
• Nothing irrelevant to store
• Strategic planning: consistency-based
• Operative thinking: market-oriented
Priorities or core knowledge elements:
• Efficiency through real-time analyses
• Case-Based Reasoning (CBR) logic as core method
  • Most universal (benchmarking, forecasting / offline, online)
  • Most adaptable (free to set parameters, no programming)
• Competition of methods and search strategies
  • Decision trees
  • Artificial neural networks
  • Monte Carlo methods (MCM) and genetic algorithms
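To make the CBR idea above more tangible, here is a minimal sketch in Python. It is not the method used in the My-X project; it only illustrates the general "retrieve similar cases and reuse their outcomes" logic. The case base, the attribute names (rainfall, fertilizer), the Euclidean distance and the parameter k are all assumptions chosen for illustration.

```python
from math import sqrt


def retrieve_similar_cases(case_base, query, k=3):
    """Return the k stored cases whose attributes are closest to the query.

    Similarity is measured with plain Euclidean distance (an assumption;
    any other similarity measure could be plugged in here).
    """
    def distance(attributes):
        return sqrt(sum((attributes[name] - query[name]) ** 2 for name in query))

    return sorted(case_base, key=lambda case: distance(case["attributes"]))[:k]


def predict(case_base, query, k=3):
    """Estimate the outcome of a new case as the mean outcome of its k nearest cases."""
    neighbours = retrieve_similar_cases(case_base, query, k)
    return sum(case["outcome"] for case in neighbours) / len(neighbours)


# Hypothetical case base: attributes are scaled to 0..1, outcome is e.g. a yield.
CASES = [
    {"attributes": {"rainfall": 0.2, "fertilizer": 0.1}, "outcome": 3.0},
    {"attributes": {"rainfall": 0.8, "fertilizer": 0.9}, "outcome": 7.0},
    {"attributes": {"rainfall": 0.7, "fertilizer": 0.8}, "outcome": 6.0},
    {"attributes": {"rainfall": 0.1, "fertilizer": 0.3}, "outcome": 2.0},
]

# A new case is judged by the two most similar known cases.
estimate = predict(CASES, {"rainfall": 0.75, "fertilizer": 0.85}, k=2)
print(estimate)
```

The only free parameters are the similarity measure and k, which reflects the "free to set parameters" point: the same few lines can serve for benchmarking or forecasting, depending on what the stored cases describe.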
Learning strategies and their maintenance
[Figure: knowledge level (%), 0 to 120, plotted against time (day), 0 to 300, for the two series "storage" and "usage" (source: own calculations)]
List of the course units (what to learn)
• The world can be interpreted in the form of Object-Attribute Matrices (OAM)!
• Anomalies of data asset management (Why is the preparation of an OAM so slow? How can the anomalies be avoided?)
• Preparing OLAP (online analytical processing) databases (do it yourself if nobody else wants to do it)
• Using OLAP techniques for OAM (efficiency as the highest priority)
• How it is made: expert system (rules as a universal solution)
• CBR pattern (OAM from time series, in benchmarking, or for production functions)
• Solver (be free offline)
• COCO (component-based object comparison // be free online)
• Interpretation of results (chess robots for context-free situations)
• Standard expectations of studies (What must you avoid, and what must you do, for a good study?)
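The first units revolve around the Object-Attribute Matrix. As a minimal sketch of what a pivot step could look like, the following Python code turns a flat list of (object, attribute, value) records into a matrix with one row per object and one column per attribute. The record layout, the function name build_oam and the farm data are assumptions made for illustration; the course itself mentions pivot tables and OLAP tools for this step.

```python
def build_oam(records):
    """Pivot (object, attribute, value) records into an object-attribute matrix.

    Returns the sorted object names (rows), the sorted attribute names
    (columns) and the matrix itself. Missing cells are filled with None.
    """
    objects = sorted({obj for obj, _, _ in records})
    attributes = sorted({attr for _, attr, _ in records})
    cells = {(obj, attr): value for obj, attr, value in records}
    matrix = [[cells.get((obj, attr)) for attr in attributes] for obj in objects]
    return objects, attributes, matrix


# Hypothetical flat records, as they might come from a transactional database.
RECORDS = [
    ("farm_A", "yield", 5.1),
    ("farm_A", "rainfall", 420),
    ("farm_B", "yield", 4.3),
    ("farm_B", "rainfall", 380),
    ("farm_C", "yield", 6.0),
]

objects, attributes, oam = build_oam(RECORDS)
for name, row in zip(objects, oam):
    print(name, dict(zip(attributes, row)))
```

The None cell for farm_C shows one kind of anomaly that has to be handled before any comparison of objects: a matrix with gaps cannot be analysed row against row without deciding how to treat the missing values.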
Summary
• We have defined strategic and operative aims (derived from real problems)…
• We have checked whether we see the same world around us…
• We would like to teach and learn only the most necessary competencies…
• We have seen in brief which competencies we should combine in order to approach real-time speed in the analysis…
Conclusions
• We have data, methods, computers, networks, problems and, unfortunately, illogical restrictions in our General Problem Solving (GPS) strategies
• We have an icon: namely the chess robot…
* * * therefore * * *
• We should ensure free access to every datum!
• We should learn from our own instincts!
• We have to transform intuitions into source code!
• We have to provide the new methods online as well!
• We have to teach people to think instead of to serve!
• We can detect a lack of equilibrium!
• We can always correct wrong directions!
LET US DO THEM!
Thank you very much
for your attention!
Further details:
[email protected]
http://miau.gau.hu/myx-free
Pun?!
pros and cons
Science = Con-science (= TQM or con-sis-TENCY in thinking)
Syn-the-sis = Fusion (of each thesis)
Confusion = Sin-the-sis
Sin <> ETHIC
Syn-the-TIC => Artificial (Intelligence) => Robotics
http://en.wiktionary.org/wiki/conscience
(incl. Etymology aspects)