Intelligent Internet Agents for Distributed Data Mining

Yanqing Zhang, Scott Owen, Sushil Prasad and Raj Sunderraman
Department of Computer Science
Georgia State University
{yzhang, sowen, sprasad, raj}@cs.gsu.edu
George Vachtsevanos
School of Electrical and Computer Engineering
Georgia Institute of Technology
[email protected]
Outline
• Motivation
• Architecture of Intelligent Internet Agents
• Program Libraries of Intelligent Middleware
• Smart Web Search Agents
• Intelligent Soft Computing Agents
• Benefits
• Deliverables
• Conclusion
Motivation
• Distributed Web KDD: Useful information and
knowledge mined in distributed Web databases
• QoS (Efficiency, Web Speed, User Time): Huge
amounts of useless data flow on the Internet
• From Data Web to Information Web: Upgrade a
current data-flow-oriented Internet to a future
information-flow-oriented Internet
• Intelligent Web Middleware: with reusable,
portable and scalable intelligent functionality
• Smart E-Business: Use intelligent Web agents to
do better E-Business on the Internet
Architecture of Intelligent Internet Agents
Application Layer: E-Commerce, E-Education, other E-B
Intelligent Layer: Data Mining, Soft Computing, ES, etc
Network Layer: Backbone, gigaPoPs, other hardware
Program Libraries of Intelligent Middleware
1. Binary Association Rule Generator
2. Fuzzy Association Rule Generator
3. Neural-Net-based Data Classifier and Pattern Generator
4. Fuzzy c-means Program for Data Clustering
5. Genetic Algorithms for Data Refinement and Optimization
6. Granular Neural Nets for Linguistic Data Mining
7. XML-based Smart Web Search Sub-Programs
8. Connection Programs between Database and Middle Layer
9. Local Cache Database Manager
10. Local Cache Informationbase Manager
11. Basic GUI Programs
12. Client-Server Creation and Communication Programs
13. Distributed Operation Manager
14. Distributed Data Mining Synchronization
15. Web Customer Log Miner, and so on.
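As an illustration of what item 1 (Binary Association Rule Generator) might compute, here is a minimal sketch, not the library described in these slides. It assumes transactions are plain lists of item names and only generates rules between pairs of items; the function name `association_rules`, the thresholds, and the sample baskets are all hypothetical.

```python
from itertools import combinations


def association_rules(transactions, min_support=0.5, min_confidence=0.6):
    """Return (antecedent, consequent, support, confidence) for item pairs."""
    n = len(transactions)
    baskets = [set(t) for t in transactions]
    items = sorted({item for basket in baskets for item in basket})

    def support(itemset):
        # Fraction of baskets that contain every item in `itemset`.
        return sum(1 for basket in baskets if itemset <= basket) / n

    rules = []
    for a, b in combinations(items, 2):
        pair_support = support({a, b})
        if pair_support < min_support:
            continue  # the pair is too rare to yield a useful rule
        for lhs, rhs in ((a, b), (b, a)):
            # confidence(lhs -> rhs) = support(lhs and rhs) / support(lhs)
            confidence = pair_support / support({lhs})
            if confidence >= min_confidence:
                rules.append((lhs, rhs, pair_support, confidence))
    return rules


# Hypothetical sample data, for illustration only.
transactions = [
    ["bread", "milk"],
    ["bread", "butter"],
    ["bread", "milk", "butter"],
    ["milk"],
]
rules = association_rules(transactions, min_support=0.5, min_confidence=0.6)
for lhs, rhs, sup, conf in rules:
    print(f"{lhs} -> {rhs}  support={sup:.2f}  confidence={conf:.2f}")
```

A "binary" rule here treats each item as simply present or absent in a basket; a fuzzy variant (item 2) would replace the set-membership test with a degree of membership.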
Smart Web Search Agents
• Data Search Engines >> Information Search Agents
- Traditional searching on the Web is done using one
of the following three:
- Directories (Yahoo, Lycos, etc)
- Search Engines (AltaVista, NorthernLight, etc)
- Metasearch Engines (MetaCrawler,
SavvySearch, AskJeeves, etc)
All of these rely on keyword searches.
Drawbacks: not easily personalized;
too many results (although many give
relevancy factors)
- Smart Search Agents will provide
- more personalized searches
- domain-based search,
- more efficient searches
Smart Search Agents will employ
- local cache databases (containing
frequently asked queries and their results;
possibly updated periodically, e.g., nightly)
- local cache information base (containing
mined information and discovered
knowledge for efficient personal use)
- domain-based agents (e.g. Job Search;
Sports-NBA Stats, Bibliography-Digital
Libraries)
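The local cache database idea can be sketched as follows. This is an assumption-laden illustration, not the agents' actual design: the class `QueryCache`, the `search_backend` callable, and the one-day expiry are hypothetical names and values chosen to mirror "frequently asked queries/results, updated periodically".

```python
import time


class QueryCache:
    """Hypothetical local cache of frequently asked queries and their results."""

    def __init__(self, search_backend, max_age_seconds=24 * 60 * 60):
        self.search_backend = search_backend    # callable: query -> list of results
        self.max_age_seconds = max_age_seconds  # entries older than this are refreshed
        self._entries = {}                      # normalized query -> (timestamp, results)
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _normalize(query):
        # Treat queries that differ only in case or spacing as the same query.
        return " ".join(query.lower().split())

    def search(self, query, now=None):
        now = time.time() if now is None else now
        key = self._normalize(query)
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] <= self.max_age_seconds:
            self.hits += 1
            return entry[1]                     # served locally, no Web traffic
        self.misses += 1
        results = self.search_backend(key)      # fall back to the remote engine
        self._entries[key] = (now, results)
        return results


def fake_backend(query):
    # Stand-in for a remote search or metasearch engine (assumption).
    return [f"result for {query}"]


cache = QueryCache(fake_backend, max_age_seconds=86400)
first = cache.search("NBA  Stats", now=0)       # miss: fetched remotely
second = cache.search("nba stats", now=3600)    # hit: same query, still fresh
third = cache.search("nba stats", now=90000)    # miss: entry is older than a day
print(cache.hits, cache.misses)
```

Passing `now` explicitly keeps the example deterministic; a real agent would rely on the clock and might refresh stale entries in a nightly batch instead of on demand.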
Some initial results:
• M. Nagarajan, Metagenie - A metasearch engine for
multi-databases, M.S. thesis, GSU (July 1999)
Domains: Jobs, Books
• S. Ahmed, EXACT-FINDER: A cache-based meta-search
engine, M.S. thesis, GSU (May 2000)
Local cache database storing personalized frequently
asked queries and results, updated periodically
• R. Sunderraman, ReQueSS: Relational Querying of
semistructured data, ICDE 2000 (demo session), San Diego,
CA, March 2000.
• X. Li, Querying unified sources of Web data, M.S. thesis,
GSU (July 1999)
Data wrappers for Web sources (NBA stats/box scores,
DBLP Bibliography database)
Intelligent Tools for E-Business
• Computational Intelligence, Neural Networks,
Fuzzy Logic, Genetic Algorithms, Hybrid Systems
• Learning Algorithms, Heuristic Searching
• Data Analysis and Modeling, Data Fusion and
Mining, Knowledge Discovery
• Prediction & Time Series Analysis
• Information Retrieval, Intelligent User Interface
• Intelligent Agents, Distributed IA and Multi-Agents,
Cooperative Knowledge-based Systems
Enhancing E-Business Process Through Data Mining
[Figure: data warehouses feed data mining (knowledge discovery), which yields success patterns and failure patterns]
• Traditional Data Mining Tools
– Simple query and reporting
– Visualization-driven data exploration tools, OLAP
– Discovery process is user driven
• Quality of discovered knowledge
– Having the right data
– Having appropriate data mining tools!
Intelligent Data Mining Tools
[Figure: data warehouses yielding success patterns and failure patterns]
• Automate the process of discovering
patterns/knowledge in data
• Require hypothesis, exploration
• Derive business knowledge (patterns) from data
• Combine business knowledge of users with
results of discovery algorithms
Intelligent Information Agents
• The Data Mining Problem:
– Clustering/ Classification
– Association
– Sequencing
• Viewed as an Optimization Problem
• Tools: Genetic Algorithms
Fuzzy Rule Discovery
• Rule discovery: the discovery of associations
between business events, e.g., which items are
purchased together
• To support flexible querying and intelligent
searching, fuzzy queries are developed to uncover
potentially valuable knowledge
• A fuzzy query uses fuzzy terms like tall, small, and
near to define linguistic concepts and formulate a
query
• The automated search for fuzzy rules is carried out by
discovering fuzzy clusters or segmentation in the
data
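To make the idea of a fuzzy query concrete, here is a minimal sketch. The membership functions for "tall" and "near", their breakpoints (160 to 190 cm, 1 to 11 km), the record fields, and the use of `min` for "and" are all assumptions made for illustration; the slides do not specify them.

```python
def ramp_up(x, low, high):
    """Membership rising linearly from 0 at `low` to 1 at `high`."""
    if x <= low:
        return 0.0
    if x >= high:
        return 1.0
    return (x - low) / (high - low)


def tall(height_cm):
    # Assumed definition: not tall at 160 cm or below, fully tall at 190 cm.
    return ramp_up(height_cm, 160.0, 190.0)


def near(distance_km):
    # Assumed definition: fully near within 1 km, not near beyond 11 km.
    return 1.0 - ramp_up(distance_km, 1.0, 11.0)


def fuzzy_query(records, threshold=0.5):
    """Return (name, degree) for records that are 'tall AND near'."""
    scored = []
    for record in records:
        # Fuzzy AND is modeled with min, one common choice among several.
        degree = min(tall(record["height_cm"]), near(record["distance_km"]))
        if degree >= threshold:
            scored.append((record["name"], degree))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical records, for illustration only.
records = [
    {"name": "A", "height_cm": 190, "distance_km": 1},
    {"name": "B", "height_cm": 175, "distance_km": 6},
    {"name": "C", "height_cm": 160, "distance_km": 2},
]
matches = fuzzy_query(records, threshold=0.5)
print(matches)
```

Unlike a crisp query such as `height > 180`, each record matches to a degree, so results can be ranked rather than simply included or excluded.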
Fuzzy Decision Making: Match Users with Dynamic
Products, Services, and Pricing
Example of 3 Service Provider’s Features
(Risk-Response-Retention (R³) Model)
[Figure: three axes, each graded Low / Medium / High: Loss Ratio (Risk), Response, and Persistency (Retention)]
Low Risk, High Response, High Retention -> Customer: Preferred
Pricing: according to Life-time Value
Cross-Selling: Bundle Extra Liability Insurance
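The rule in this example (low risk, high response, and high retention imply a preferred customer) can be evaluated as a single fuzzy rule. The sketch below is only one possible reading: it assumes the three scores are normalized to [0, 1], uses simple linear "low" and "high" membership functions, and takes the rule's firing strength as the minimum of its conditions. None of these choices comes from the slides.

```python
def clamp01(value):
    """Clamp a number to the interval [0, 1]."""
    return max(0.0, min(1.0, value))


def low(x):
    # Assumed shape: fully "low" at 0, no longer "low" from 0.5 upward.
    return clamp01(1.0 - 2.0 * x)


def high(x):
    # Assumed shape: not "high" up to 0.5, fully "high" at 1.
    return clamp01(2.0 * x - 1.0)


def preferred_degree(risk, response, retention):
    """Degree to which a customer is 'Preferred' under the single rule:
    IF risk is low AND response is high AND retention is high."""
    return min(low(risk), high(response), high(retention))


# Hypothetical customers with scores normalized to [0, 1].
print(preferred_degree(0.0, 1.0, 1.0))    # ideal customer
print(preferred_degree(0.25, 1.0, 0.75))  # partially preferred
print(preferred_degree(0.8, 1.0, 1.0))    # high risk, so the rule does not fire
```

The resulting degree could then drive pricing or cross-selling decisions, for example by offering the bundled product only above some threshold.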
Measuring Performance of Intelligent Agents
$IAP = w_1\,\mathrm{Accuracy} + w_2\,\mathrm{Speed} + w_3\,\mathrm{Cost} + w_4\,\mathrm{Benefit} + \dots$
• Accuracy : distance or variance measure of the IAs’
performance relative to their goal, e.g., fuzzy entropy
• Speed : latency of response
• Cost : resources consumed, consequences of failures
• Benefit : payoff for goals achieved
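The weighted performance index can be computed directly once each measure is reduced to a number. The sketch below assumes every metric is normalized to [0, 1] and that penalties such as cost are expressed with a negative weight; the slides give neither the weights nor their signs, so the values here are purely illustrative.

```python
def agent_performance(metrics, weights):
    """Weighted sum of performance measures: sum of w_i * metric_i."""
    missing = set(weights) - set(metrics)
    if missing:
        raise KeyError(f"missing metrics: {sorted(missing)}")
    return sum(weights[name] * metrics[name] for name in weights)


# Hypothetical, normalized measurements for one agent.
metrics = {"accuracy": 0.9, "speed": 0.5, "cost": 0.2, "benefit": 0.8}

# Assumed weights; cost carries a negative weight so that it lowers the score.
weights = {"accuracy": 0.4, "speed": 0.2, "cost": -0.1, "benefit": 0.3}

score = agent_performance(metrics, weights)
print(f"IAP = {score:.2f}")
```

Comparing this score across agents, or across runs of the same agent, gives a single number that a learning or adaptation step could try to improve.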
Performance Assessment, Learning and Optimization
[Figure: data warehouses, success and failure patterns, goals/objectives, performance evaluation module, and learning/adaptation]
Examples
• Product Information Clustering
– Use a GA as the Heuristic Search Engine
– Apply the GA selection and inversion operators
– Evaluate information content
– Estimate system entropy
– Apply reinforcement learning strategy
• Dynamic Pricing
– In addition to above steps, explore association
and sequencing relations
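The first two clustering steps (a GA as the search engine, driven by selection and inversion) can be sketched as below. This is a simplified stand-in, not the authors' method: it clusters one-dimensional numbers, encodes a solution as one cluster label per item, and scores it by within-cluster squared distance instead of the information-content and entropy measures named above. The reinforcement learning step is not modeled. All function names and parameters are hypothetical.

```python
import random


def within_cluster_cost(points, labels, k):
    """Sum of squared distances from each point to its cluster mean."""
    cost = 0.0
    for cluster in range(k):
        members = [p for p, lab in zip(points, labels) if lab == cluster]
        if members:
            mean = sum(members) / len(members)
            cost += sum((p - mean) ** 2 for p in members)
    return cost


def tournament_select(population, costs, rng):
    """Selection operator: pick two chromosomes at random, keep the cheaper."""
    i = rng.randrange(len(population))
    j = rng.randrange(len(population))
    return population[i] if costs[i] <= costs[j] else population[j]


def invert(chromosome, rng):
    """Inversion operator: reverse the genes between two random cut points."""
    i, j = sorted(rng.sample(range(len(chromosome) + 1), 2))
    return chromosome[:i] + chromosome[i:j][::-1] + chromosome[j:]


def ga_cluster(points, k=2, pop_size=30, generations=60, seed=0):
    rng = random.Random(seed)
    # Each chromosome assigns one cluster label to every point.
    population = [[rng.randrange(k) for _ in points] for _ in range(pop_size)]
    history = []  # best cost seen in each generation
    for _ in range(generations):
        costs = [within_cluster_cost(points, ch, k) for ch in population]
        best = population[costs.index(min(costs))]
        history.append(min(costs))
        children = [best]  # elitism: the best chromosome always survives
        while len(children) < pop_size:
            parent = tournament_select(population, costs, rng)
            children.append(invert(parent, rng))
        population = children
    costs = [within_cluster_cost(points, ch, k) for ch in population]
    history.append(min(costs))
    return population[costs.index(min(costs))], history


# Hypothetical one-dimensional "product" feature values.
points = [1.0, 1.2, 0.8, 9.0, 9.5, 8.7]
labels, history = ga_cluster(points, k=2)
print(labels, history[0], history[-1])
```

Because inversion only reorders labels, each chromosome keeps its original mix of labels; a fuller GA would normally add mutation or crossover. Elitism guarantees that the best cost never gets worse from one generation to the next, but not that the best possible clustering is found.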
The “New Technology” Paradigm
[Figure: Internet-related technologies over time: Euphoria/Optimism, Reality, Back to Basics]
INFORMATION IS SELLING
NOW!
Intelligent Agents will give your
information product bargaining power
Benefits
• Better QoS:
- Web users get information (not raw data)
- Smart agents can make decisions for users
- Smart agents can save users’ surfing time
• Faster Internet:
- Information flows on the Internet quickly
(e.g., 1k of information << 100k of raw data)
- Reduce data redundancy on the Internet
- Reduce Web communication congestion
Deliverables
• Intelligent Middle Layer
- Data Mining Program Libraries
- Soft Computing Program Libraries (e.g.,
Neural Networks, Fuzzy Logic, Genetic
Algorithms, Neuro-fuzzy Systems)
• Application Layer
- Smart Web Search Agents
- Intelligent Soft Computing Agents
Conclusion
• To make the future Internet more intelligent
and more efficient, it is necessary to design
relevant "Intelligent Middleware" between
network hardware and high-level Web
application systems.
• We will first design a basic intelligent middle
layer with basic intelligent functionality, and
then implement two Web application systems
for distributed data mining and E-Business.