Supporting DM Tasks & DM Processes in DSMS or CEP Systems

Supporting DM Tasks & DM Processes
in a DSMS or a CEP System
• Motivation: Gain experience with current DSMSs and the
limitations that make it hard to support KDD applications on
data streams.
• Case Study: Naïve Bayesian Classifiers (NBCs), arguably the simplest
mining algorithm, and one that can be implemented in SQL on a DBMS.
The question is thus: can we support it using a DSMS and its SQL-like
query language?
• A slightly more general question is whether the NBC can be
supported by various CEP systems, which claim to be powerful (e.g.,
they support rules). Could they be extended to support generic
versions of the NBC, and perhaps other data stream mining
methods?
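To make the "doable in SQL" remark concrete: training an NBC amounts to counting, and each count table below is what a GROUP BY ... COUNT(*) query would return. The following is a minimal Python sketch under stated assumptions, not the course's reference implementation. It assumes categorical attributes and add-one (Laplace) smoothing; the function names and the toy data are hypothetical.

```python
import math
from collections import Counter, defaultdict

def train_nbc(samples):
    """Build NBC count tables from (attributes, label) pairs.

    Each table corresponds to a SQL GROUP BY ... COUNT(*) result.
    """
    class_counts = Counter()    # label -> count
    value_counts = Counter()    # (attribute index, value, label) -> count
    domains = defaultdict(set)  # attribute index -> distinct values seen
    for attrs, label in samples:
        class_counts[label] += 1
        for i, v in enumerate(attrs):
            value_counts[(i, v, label)] += 1
            domains[i].add(v)
    return class_counts, value_counts, domains

def classify(model, attrs):
    """Return the label with the highest log-posterior."""
    class_counts, value_counts, domains = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in class_counts.items():
        score = math.log(n / total)  # log prior
        for i, v in enumerate(attrs):
            # Add-one smoothing keeps unseen values from zeroing the product.
            numerator = value_counts[(i, v, label)] + 1
            score += math.log(numerator / (n + len(domains[i])))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical toy data: (outlook, windy) -> play
training = [
    (("sunny", "no"), "yes"),
    (("sunny", "yes"), "no"),
    (("rainy", "yes"), "no"),
    (("overcast", "no"), "yes"),
    (("rainy", "no"), "yes"),
]
model = train_nbc(training)
print(classify(model, ("sunny", "no")))
print(classify(model, ("rainy", "yes")))
```

Because the model is nothing more than counters, it can be updated one tuple at a time, which is what makes the NBC a natural first test for a stream query language.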
CS240B Project
Download a DSMS or a CEP system of your choice and (after explaining why you have
selected this and not the others) explore how you can implement the following
tasks:
1. Testing of a Naïve Bayesian Classifier: you can assume that the NBC has already
been trained and you can read it from the input, or a DB, a file, or memory.
2. Assume now that you also have a stream of pre-classified samples. Use this to
determine the accuracy of your current classifier, at periodic intervals. Output the
accuracy, and if this falls below a certain threshold execute the next step.
3. Periodically retrain a new NBC from the stream of pre-classified tuples; then use
the newly built classifier to predict the class of unclassified tuples (Step 1).
4. See if you can generalize your software, e.g., by designing and developing
generic NBCs, ensemble methods, other classifiers, etc.
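The control flow of Steps 2 and 3 can be sketched outside any particular DSMS as follows. This is an illustrative Python sketch, not a prescribed design: it assumes tumbling (non-overlapping) windows counted in tuples, and the names `monitor`, `train_majority`, `window_size`, and `threshold` are hypothetical. To keep the block short, the learner is a trivial majority-class stand-in rather than an NBC; any function that maps labeled samples to a predictor could be passed instead. Classifying the separate stream of unclassified tuples (Step 1) is left out.

```python
from collections import Counter

def monitor(stream, train, window_size, threshold):
    """Check accuracy on a labeled stream once per tumbling window.

    `train` maps a list of (attrs, label) pairs to a predict(attrs) function.
    Yields (window_number, accuracy, retrained) after each full window.
    """
    predict = None
    window, correct, window_number = [], 0, 0
    for attrs, label in stream:
        if predict is not None and predict(attrs) == label:
            correct += 1
        window.append((attrs, label))
        if len(window) == window_size:
            window_number += 1
            # The first window has no classifier yet; it only bootstraps one.
            accuracy = correct / window_size if predict is not None else None
            retrain = accuracy is None or accuracy < threshold
            if retrain:
                predict = train(window)
            yield window_number, accuracy, retrain
            window, correct = [], 0

def train_majority(samples):
    """Stand-in learner (NOT an NBC): always predicts the most common label."""
    label = Counter(l for _, l in samples).most_common(1)[0][0]
    return lambda attrs: label

# Hypothetical stream whose label flips from "a" to "b" halfway through.
stream = [((i,), "a") for i in range(8)] + [((i,), "b") for i in range(8)]
report = list(monitor(stream, train_majority, window_size=4, threshold=0.75))
for row in report:
    print(row)
```

In this run the third window scores 0.0 against the stale classifier, which triggers retraining on that window. In a DSMS, the same logic would need a windowed aggregate for the accuracy and some way to swap the model tables in place, which is where the limitations mentioned in this project are likely to show up.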
It is understood that the limitations of DSMS and CEP systems will probably prevent
you from completing all these tasks (listed in order of increasing difficulty). So, you
should make sure that you (1) download a good system, and (2) write a clear report
explaining your efforts and the reasons that prevented you from going further.
(For test sets, see the CS240A project: http://www.cs.ucla.edu/classes/winter14/cs240A/DMproject.html)