Transcript Week 11

Advanced Topics in Data Mining and Research Directions
CSE5610 Intelligent Software Systems
Semester 1, 2006
www.monash.edu.au
Outline
• Mining Different Data Types
– Spatial, Temporal, Time Series, Data Streams,
Multimedia, XML, Web, Text etc.
• Distributed Data Mining (DDM)
• Mobile & Ubiquitous Data Mining (UDM)
• Data Mining E-Services
• Anytime, Anywhere Data Mining E-Services
Generations of Data Mining
• Four generations of data mining systems (Robert Grossman)
• First generation
– Stand-alone, centralised, single algorithm
• Second generation
– Integration with databases, support for high dimensionality and complex data types
• Third generation
– Distribution and heterogeneity
• Fourth generation
– Support for mining embedded, mobile and ubiquitous data sources
Distributed Data Mining
• Data is inherently distributed
• Multinational corporations (MNCs) operate in global markets
• => Users are physically/geographically separated from the data sources
• The traditional data mining model, which co-locates users, data and computational resources, is inadequate
Distributed Data Mining (DDM)
• The inherent distribution of data and other
resources as a result of organisations being
distributed.
• The large volumes of data, the transfer of which
results in exorbitant communication costs.
• The need to mine heterogeneous data, the
integration of which is both non-trivial and
expensive.
• The performance and scalability bottlenecks of data mining.
Distributed Data Mining (DDM)
• DDM = Data Mining (DM) + Knowledge
Integration (KI)
• DM - Performing traditional knowledge
discovery at each distributed data site.
• KI - Merging the results generated from
the individual sites into a body of
cohesive and unified knowledge.
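A minimal sketch of the DDM = DM + KI decomposition, assuming the local task is frequent itemset counting over hypothetical transaction data (the site names, items and thresholds are illustrative). Each site runs the DM step on its own data; the KI step merges the local counts into global supports. Counting every candidate itemset locally keeps the merge exact; practical DDM algorithms prune candidates to reduce communication.

```python
from collections import Counter
from itertools import combinations

# Hypothetical data sites, each holding its own transactions (sets of items).
SITES = {
    "site_a": [{"bread", "milk"}, {"bread", "butter"}, {"milk"}],
    "site_b": [{"bread", "milk"}, {"bread", "milk", "butter"}],
}


def local_mining(transactions, max_size=2):
    """DM step: count every itemset of size 1..max_size at one site."""
    counts = Counter()
    for t in transactions:
        for k in range(1, max_size + 1):
            for itemset in combinations(sorted(t), k):
                counts[itemset] += 1
    return counts, len(transactions)


def knowledge_integration(local_results, min_support):
    """KI step: merge the local counts into globally frequent itemsets."""
    total, n = Counter(), 0
    for counts, size in local_results:
        total.update(counts)
        n += size
    return {s: c / n for s, c in total.items() if c / n >= min_support}


if __name__ == "__main__":
    results = [local_mining(tx) for tx in SITES.values()]
    print(knowledge_integration(results, min_support=0.5))
```

With `min_support=0.5`, the integrated result contains `("bread",)`, `("milk",)` and `("bread", "milk")`; only counts and transaction totals, not raw transactions, leave each site.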
Parallel Data Mining (PDM)
• The principal distinction between DDM and parallel DM:
– parallel mining involves parallel processors, with or without shared memory, whereas DDM deals with data and resources at distributed sites
• Parallel data mining also includes the development of parallel versions of traditional data mining techniques.
• The two approaches can be integrated, e.g. DecisionCentre
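As an illustration of a parallel version of a traditional technique, here is a minimal sketch of one k-means iteration in data-parallel form: each worker assigns the points of its partition to the nearest centroid and returns partial sums, which are then combined into new centroids. The function names and 2-D points are hypothetical. A thread pool keeps the sketch portable; for real CPU parallelism in CPython one would swap in `ProcessPoolExecutor` with a module-level worker function.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_sums(points, centroids):
    """Work for one partition: assign points to the nearest centroid, return sums and counts."""
    k = len(centroids)
    sums = [[0.0, 0.0] for _ in range(k)]
    counts = [0] * k
    for x, y in points:
        j = min(range(k), key=lambda i: (x - centroids[i][0]) ** 2 + (y - centroids[i][1]) ** 2)
        sums[j][0] += x
        sums[j][1] += y
        counts[j] += 1
    return sums, counts


def parallel_kmeans_step(partitions, centroids, workers=4):
    """One k-means iteration with the assignment step run on each partition in parallel."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda part: partial_sums(part, centroids), partitions))
    new_centroids = []
    for j, old in enumerate(centroids):
        n = sum(counts[j] for _, counts in results)
        if n == 0:
            new_centroids.append(old)  # empty cluster: keep the previous centroid
            continue
        sx = sum(sums[j][0] for sums, _ in results)
        sy = sum(sums[j][1] for sums, _ in results)
        new_centroids.append((sx / n, sy / n))
    return new_centroids
```

For example, `parallel_kmeans_step([[(0, 0), (1, 0)], [(10, 10), (11, 10)], [(0, 1), (10, 11)]], [(0, 0), (10, 10)])` moves the centroids to (1/3, 1/3) and (31/3, 31/3), the same result a sequential iteration would give.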
DDM – Algorithms & Architectures
• Research in distributed data mining can be
divided into two broad categories [Fu01]:
• Data Mining Algorithms.
– focus on efficient techniques for knowledge
integration.
• Distributed Data Mining Architectures.
– focus on development of distributed data mining
architectures
– emphasizes the processes and technologies that
support construction of software systems to perform
distributed data mining
Taxonomy of DDM Architectures
Distributed data mining system architectures:
• Client-Server
• Agents
– Stationary
– Mobile (self-directed migration)
Classification – DDM Systems
DDM architectural models and example DDM systems:
• Client-server: DecisionCentre [CDG99], IntelliMiner [PaS99, PaS01], InterAct [PaD02]
• Agents (mobile agent, stationary agent): JAM [SPT97], Infosleuth [UMG98, MUU99], BODHI [KPH99], Papyrus [Ram98], PADMA [KHS97a, KHS97b]
Client-Server DDM
[Figure: users on a laptop, PC or workstation send a data mining request to the Data Mining Server, which receives data transferred from Data Server 1 and Data Server 2 and returns the mining results to the user.]
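A minimal sketch of the client-server model, assuming hypothetical data servers that hold raw purchase records and a trivial mining task (most frequent items). The point it illustrates is that every raw record is transferred to the central mining server before mining, which is where the communication cost of this model comes from.

```python
from collections import Counter

# Hypothetical raw records held by two data servers (purchased items).
DATA_SERVERS = {
    "server_1": ["bread", "milk", "bread", "eggs"],
    "server_2": ["milk", "milk", "bread"],
}


class DataMiningServer:
    """Central server: pulls raw data from the data servers, mines it, returns results."""

    def __init__(self, data_servers):
        self.data_servers = data_servers
        self.records_transferred = 0

    def handle_request(self, server_names, top_n=2):
        data = []
        for name in server_names:
            records = self.data_servers[name]
            self.records_transferred += len(records)  # every raw record crosses the network
            data.extend(records)
        # Mining is done centrally; only the mining results go back to the user.
        return Counter(data).most_common(top_n)


if __name__ == "__main__":
    server = DataMiningServer(DATA_SERVERS)
    print(server.handle_request(["server_1", "server_2"]))
    print("records transferred:", server.records_transferred)
```

Here all 7 records are moved to answer a request whose result has only two entries.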
Mobile Agent Model for DDM
[Figure: users on a PC, workstation or laptop interact with an agent system comprising a Task Controlling Agent, a Knowledge Integration Agent, Data Mining Result Agents, a Directory Service, Data Mining Agents and Data Resource Agents, which operate over the data servers.]
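A minimal, in-process simulation of the mobile agent idea, assuming the same kind of hypothetical data servers as a plain dictionary. Real agent systems serialise the agent and migrate it between hosts; here migration is just a loop over an itinerary. The agent mines each server's data locally and carries only the partial result, merging it as it travels, so raw records never leave a server. The class and field names are illustrative, not taken from any of the systems listed earlier.

```python
from collections import Counter
from dataclasses import dataclass, field

# Hypothetical raw records held by two data servers (purchased items).
DATA_SERVERS = {
    "server_1": ["bread", "milk", "bread", "eggs"],
    "server_2": ["milk", "milk", "bread"],
}


@dataclass
class MobileMiningAgent:
    """Simulated mobile agent: visits each server on its itinerary and mines there."""

    itinerary: list
    carried: Counter = field(default_factory=Counter)  # partial knowledge carried between hops
    visited: list = field(default_factory=list)

    def run(self, servers):
        for name in self.itinerary:
            # "Migration" is simulated by the loop; raw records stay on the server.
            local_result = Counter(servers[name])
            # Knowledge integration happens incrementally as the agent travels.
            self.carried.update(local_result)
            self.visited.append(name)
        return self.carried


if __name__ == "__main__":
    agent = MobileMiningAgent(["server_1", "server_2"])
    print(agent.run(DATA_SERVERS))
```

The agent returns a summary with 3 distinct items instead of the 7 raw records; with large data sets the gap between data size and result size is what makes this model attractive.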
Hybrid Model for DDM
[Figure: hybrid DDM model with an Agent Centre, a DDM Server, an Optimiser, agents and a client-server link, operating over Data Source 1, Data Source 2, …, Data Source n.]
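The slide does not say how the Optimiser decides, so the following is only a sketch under an assumed cost model: estimate the transfer time of shipping raw data to the DDM server (client-server) versus sending an agent out and bringing the mined result back (agent), and pick the cheaper one. `AGENT_SIZE_MB`, the `DataSource` fields and the transfer-time-only estimate are hypothetical; a realistic optimiser would also account for computation and load at each site.

```python
from dataclasses import dataclass

AGENT_SIZE_MB = 0.5  # hypothetical size of a serialised mining agent


@dataclass
class DataSource:
    data_mb: float        # size of the raw data at the source
    result_mb: float      # estimated size of the locally mined model
    link_mb_per_s: float  # transfer rate between the source and the DDM server


def client_server_cost(sources):
    """Seconds to ship every source's raw data to the DDM server."""
    return sum(s.data_mb / s.link_mb_per_s for s in sources)


def agent_cost(sources):
    """Seconds to send an agent to each source and bring its mined result back."""
    return sum((AGENT_SIZE_MB + s.result_mb) / s.link_mb_per_s for s in sources)


def choose_strategy(sources):
    """Pick whichever model the (transfer-time-only) cost estimate favours."""
    cs, ag = client_server_cost(sources), agent_cost(sources)
    return ("client-server", cs) if cs <= ag else ("mobile agent", ag)
```

Under this model a tiny data set (0.2 MB) is cheaper to ship directly, while a 500 MB source is far cheaper to mine with an agent, which is the intuition behind combining both models.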
Ubiquitous Data Mining
Ubiquitous Data Mining (UDM)
• Mining data in a resource-constrained environment to
support the time critical information needs of mobile
users
• Typical Characteristics
– Mobile User – frequent disconnections
– Handheld devices with resource constraints: memory, battery, processor, screen real estate
– Time critical
– Real-time & On-line
– Data Streams
• Example Scenarios
• Many Challenges
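To make "resource-constrained" and "data streams" concrete, here is a generic one-pass stream clustering sketch that never stores raw points and never keeps more than `max_centers` cluster summaries (a stand-in for a memory budget on a handheld device). It is written in the general spirit of lightweight stream mining but is not LWC or any published algorithm; the class name, the distance threshold and the 1.5 growth factor are hypothetical choices.

```python
import math


class BoundedStreamClusterer:
    """One-pass clustering of a 2-D stream with at most max_centers summaries."""

    def __init__(self, threshold, max_centers):
        self.threshold = threshold      # distance within which a point joins a center
        self.max_centers = max_centers  # memory budget (number of summaries kept)
        self.centers = []               # each entry: [x, y, count]

    def _nearest(self, x, y):
        best, best_d = None, math.inf
        for c in self.centers:
            d = math.hypot(c[0] - x, c[1] - y)
            if d < best_d:
                best, best_d = c, d
        return best, best_d

    def update(self, x, y):
        c, d = self._nearest(x, y)
        if c is not None and (d <= self.threshold or len(self.centers) >= self.max_centers):
            # Absorb the point by updating the center's running mean.
            c[2] += 1
            c[0] += (x - c[0]) / c[2]
            c[1] += (y - c[1]) / c[2]
            if d > self.threshold:
                # Budget is full: coarsen the resolution for future points.
                self.threshold *= 1.5
        else:
            self.centers.append([x, y, 1])
```

Feeding (0, 0), (0.5, 0), (10, 10) and (20, 20) with `threshold=1.0, max_centers=2` keeps two centers; the last point is merged into the (10, 10) center, moving it to (15, 15) and raising the threshold to 1.5. Memory use stays fixed no matter how long the stream runs.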
Current Research
• Kargupta’s Group
– MobiMine
• @CSSE, Monash Univ.
– AgentUDM
– Adaptive, cost-efficient & lightweight data mining techniques for data streams
> Mohamed Medhat 
> LWC, LWF & LWClass
> Watch this space!!!
Data Mining E-Services
Data Mining E-Services
• "…data analysis and mining functions themselves will be offered as business intelligence e-services that accept operational data from clients and return models or rules" (Umesh Dayal, 2001)
• Why?
– Knowledge is a key resource
– Cost of data mining infrastructure
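A minimal sketch of the e-service idea in the quote: a function that accepts operational data from a client (transactions as JSON) and returns rules (single-antecedent association rules as JSON). The request format, function name and thresholds are hypothetical; a real e-service would sit behind an HTTP or web-service endpoint rather than a plain function call.

```python
import json
from collections import Counter
from itertools import combinations


def mining_service(request_json, min_support=0.3, min_confidence=0.6):
    """Accept transactions as JSON; return 'if X then Y' association rules as JSON."""
    transactions = [set(t) for t in json.loads(request_json)["transactions"]]
    n = len(transactions)
    item_counts = Counter(i for t in transactions for i in t)
    pair_counts = Counter(p for t in transactions for p in combinations(sorted(t), 2))
    rules = []
    for (a, b), count in pair_counts.items():
        if count / n < min_support:
            continue  # the pair is too rare to yield a rule
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append({"if": lhs, "then": rhs, "support": count / n, "confidence": confidence})
    return json.dumps({"rules": sorted(rules, key=lambda r: (r["if"], r["then"]))})
```

For four transactions `[["bread", "milk"], ["bread", "butter"], ["bread", "milk"], ["milk"]]`, the service returns the rules bread → milk and milk → bread (support 0.5, confidence 2/3); the client never needs its own mining infrastructure.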
Data Mining E-Services
• Current Commercial Landscape
– Several application service providers (ASPs): DigiMine, Information Discovery, WhiteCross Systems, ListAnalyst.com, etc.
– Mode of Operation
• Hybrid Model & Data Mining ASPs
– Optimise response time, leading to improved throughput
– QoS estimation
– Location preferences of clients
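A sketch of how response-time optimisation and client location preferences might be combined when dispatching a mining job to one of several ASPs. The cost model (queue wait + transfer time + processing time), the `Provider` fields and the provider names are all hypothetical; real QoS estimation would derive these figures from monitoring or history.

```python
from dataclasses import dataclass


@dataclass
class Provider:
    name: str
    region: str
    queue_s: float              # estimated wait before the job starts
    link_mb_per_s: float        # estimated transfer rate from the client
    processing_mb_per_s: float  # estimated mining throughput


def estimated_response_time(p, data_mb):
    """Hypothetical QoS estimate: wait + upload + processing, in seconds."""
    return p.queue_s + data_mb / p.link_mb_per_s + data_mb / p.processing_mb_per_s


def select_provider(providers, data_mb, allowed_regions=None):
    """Choose the fastest provider among those the client's location preference allows."""
    candidates = [p for p in providers if allowed_regions is None or p.region in allowed_regions]
    if not candidates:
        raise ValueError("no provider satisfies the client's location preference")
    return min(candidates, key=lambda p: estimated_response_time(p, data_mb))
```

With `Provider("asp_au", "AU", 5.0, 2.0, 10.0)` and `Provider("asp_us", "US", 1.0, 20.0, 50.0)` and 100 MB of data, the estimates are 65 s and 8 s, so `asp_us` is chosen unless the client restricts processing to the AU region.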
Anytime, Anywhere Data Mining E-Services
My Thoughts
• Data is a commodity, Analysis is a service
• Access anytime, anywhere
• By anyone…
– From large corporations to small businesses to individuals
• From home buyers to mobile salespersons to
grocery shoppers…
My Thoughts
• A preliminary model for delivery
– Datacentric Grids
[Figure: datacentric grid. Request types: compute new model + user data; compute new model + user data + user computation; compute new model + remote user data; compute new model + user computation; compute new model; model query. Components: private datacentric grid, Model Repository, Mining Algorithms, Data Repository (Data 1 … Data n), Mobile Agent Management System, High Performance Servers, Datacentric Grid Management Module.]
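A sketch of how the grid management module might distinguish a model query from a "compute new model" request, assuming a model repository that caches models keyed by data set and algorithm. The class, the `item_frequencies` algorithm and the caching policy (models built with a user's private data are not stored for others) are hypothetical illustrations, not part of the proposed model.

```python
from collections import Counter


class DatacentricGrid:
    """Toy grid manager with a data repository, mining algorithms and a model repository."""

    def __init__(self, data_repository):
        self.data_repository = data_repository  # data set name -> list of records
        self.model_repository = {}               # (data set, algorithm) -> stored model
        self.algorithms = {"item_frequencies": lambda records: dict(Counter(records))}
        self.models_computed = 0

    def model_query(self, dataset, algorithm):
        """Return a stored model, or None if it has not been computed yet."""
        return self.model_repository.get((dataset, algorithm))

    def compute_new_model(self, dataset, algorithm, user_data=None):
        records = list(self.data_repository.get(dataset, [])) + list(user_data or [])
        model = self.algorithms[algorithm](records)
        self.models_computed += 1
        if user_data is None:
            # Assumed policy: only models built purely from shared data are cached.
            self.model_repository[(dataset, algorithm)] = model
        return model

    def request(self, dataset, algorithm, user_data=None):
        if user_data is None:
            cached = self.model_query(dataset, algorithm)
            if cached is not None:
                return cached
        return self.compute_new_model(dataset, algorithm, user_data)
```

A second identical request is answered from the model repository without recomputation, while a request that adds user data always triggers a new computation.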
References
• http://www.csse.monash.edu.au/projects/MobileComponents/projects/dame/
• http://www.csse.monash.edu.au/~shonali/research.html
• http://www.csee.umbc.edu/~hillol/DDMBIB/
• http://www.csee.umbc.edu/~hillol/diadic.html
• http://www.csse.monash.edu.au/~mgaber/main.html