NetShark: A Caching Based Geographically Distributed


Enterprise and Business
Intelligence Systems
(e.bis.business.utah.edu)
Research Lab, UA -> UU
Director
Olivia R. Liu Sheng, Ph.D.
Emma Eccles Jones Presidential Chair of Business
School of Accounting and Information Systems
David Eccles School of Business
University of Utah
801-585-9071, [email protected]
10/2002
e.bis Research Focus
Enterprise Systems
E-procurement technology
Web content caching and storage mgmt
Enterprise application integration
Process modeling and re-use
System security and risk management
Portal design and management
Business Intelligence Systems
Decision support systems
Data/web mining
Knowledge management
Knowledge refreshing
Personalization
e.bis Research Output
Models
Methods
Technology
Analyses
Fueled by Applications!
Faculty
• Olivia R. Liu Sheng, Ph.D. (UU)
• Paul Hu, Ph.D. (UU)
Ph.D. students and Post Docs
• Xiao Fang, 5th-yr Ph.D. student (UA)
• Lin Lin, 3rd-yr Ph.D. student (UA)
• Wei Gao, 3rd-yr Ph.D. student (UA)
• Hua Su, post-doc (UA)
• Xiaoyun Sun, 1st-yr Ph.D. student (UA)
• Zhongmin Ma, 1st-yr Ph.D. student (UU)
6 to 10 Master and UG students per yr
International and industrial collaborators
Web Mining
for Knowledge Management
What is Data Mining?
The automated process of discovering relationships and patterns
in data
Related terms: knowledge discovery in databases (KDD),
machine learning
A step in the knowledge discovery process consisting of
particular algorithms (methods) that, under some acceptable
objective, produces a particular enumeration of patterns
(models) over the data.
An iterative process within which progress is defined by
“discovery”, through either automatic or manual methods
The application of statistical and artificial intelligence techniques
(algorithms) for discovering patterns and regularities in large
volumes of data.
Why Data Mining

• Type of knowledge (more abstract) and the level of sophistication in required computation, e.g.,
– Which buyers are likely to be late on future payments?
– Which sellers are likely to be late on future deliveries?
– If a seller increases product-in-week by x units, what percentage increase in sales can be expected?
– Which buyers are similar in their buying power and in their product and contract preferences?
• The frequency with which knowledge must be discovered and applied runs into bottlenecks in human processing
– Decision support for buyers, sellers and market hosts at each transaction decision point
Data Visualization Needs
Going beyond business charts (e.g., pie, line, bar charts)
Maps, trees, 2-D, and 3-D
Taxonomies of Data Mining
By Tasks
By Data
Data Mining Tasks

• Association/Sequential Patterns
– The discovery of co-occurrence correlations among a set of items.
• Clustering
– Identifying clusters embedded in the data, where a cluster is a collection of data objects that are “similar” to one another.
• Classification
– Analyzing a set of training data and constructing a model for each class based on the features in the data.
• Class Description
– Providing a concise and succinct summarization of a collection of data.
• Time-series Analysis
– Analyzing a large set of time-series data to find certain regularities and interesting characteristics.
Market Basket (Association
Rule) Analysis
A market basket is a collection of items purchased by a customer
in an individual customer transaction, which is a well-defined
business activity
Ex:
•a customer’s visit to a grocery store
•an online purchase from a virtual store such as ‘Amazon.com’
Market Basket (Association
Rule) Analysis
Market basket analysis is a common analysis run against
a transaction database to find sets of items, or itemsets,
that appear together in many transactions. Each pattern extracted
through the analysis consists of an itemset and the number of
transactions that contain it.
Applications:
•improve the placement of items in a store
•the layout of mail-order catalog pages
•the layout of Web pages
•others?
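As a rough illustration of the itemset counting described on this slide, the sketch below counts how many transactions contain each pair of items and keeps the pairs that reach a minimum count. The transactions, the function name `frequent_itemsets`, and the `min_count` threshold are all made up for illustration; this is a brute-force count, not an optimized algorithm such as Apriori.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transaction database: each basket is the set of items
# bought in one customer transaction.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]

def frequent_itemsets(transactions, size, min_count):
    """Return {itemset: number of transactions containing it} for itemsets
    of the given size that appear in at least min_count transactions."""
    counts = Counter()
    for basket in transactions:
        # Sorting makes each itemset a canonical tuple, so ("beer", "diapers")
        # and ("diapers", "beer") are counted as the same itemset.
        for itemset in combinations(sorted(basket), size):
            counts[itemset] += 1
    return {s: c for s, c in counts.items() if c >= min_count}

pairs = frequent_itemsets(transactions, size=2, min_count=3)
for itemset, count in sorted(pairs.items()):
    print(itemset, count)
```

Each result is exactly the kind of pattern the slide describes: an itemset together with the number of transactions that contain it.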
Clustering
Clustering distributes data into several groups so that similar
objects fall into the same group. For example, we can cluster
customers based on their purchase behavior.
Cluster 1: (2, $1700), (3, $2000), (4, $2300)
Cluster 2: (10, $1800), (12, $2100), (11, $2040)
Cluster 3: (2, $100), (3, $200), (3, $150)
Applications: customer, web content, document and gene segmentation
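To make the grouping idea concrete, here is a minimal k-means sketch run on the nine value pairs shown on the clustering slide. The slide does not say which clustering method produced its three groups, so k-means, the min-max scaling, and the farthest-point seeding are assumptions chosen for illustration; the helper names are hypothetical.

```python
from math import dist

# Value pairs from the clustering slide (the slide does not label the
# first value; the second is a dollar amount).
points = [(2, 1700), (3, 2000), (4, 2300),
          (10, 1800), (12, 2100), (11, 2040),
          (2, 100), (3, 200), (3, 150)]

def normalize(pts):
    """Min-max scale each dimension to [0, 1] so the dollar amounts do not
    dominate the distance. Assumes every dimension has more than one value."""
    cols = list(zip(*pts))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [tuple((v - l) / (h - l) for v, l, h in zip(p, lo, hi)) for p in pts]

def kmeans(pts, k, iters=20):
    """Plain k-means with deterministic farthest-point seeding."""
    centers = [pts[0]]
    while len(centers) < k:
        # Next seed: the point farthest from its nearest existing center.
        centers.append(max(pts, key=lambda p: min(dist(p, c) for c in centers)))
    labels = []
    for _ in range(iters):
        # Assignment step: each point joins its nearest center.
        labels = [min(range(k), key=lambda j: dist(p, centers[j])) for p in pts]
        # Update step: each center moves to the mean of its members.
        for j in range(k):
            members = [p for p, l in zip(pts, labels) if l == j]
            if members:
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels

labels = kmeans(normalize(points), k=3)
clusters = {}
for p, l in zip(points, labels):
    clusters.setdefault(l, []).append(p)
for l in sorted(clusters):
    print(f"Cluster {l + 1}: {clusters[l]}")
```

On this data the sketch recovers the same three groups as the slide; with other data or another seeding strategy, k-means may converge to a different grouping.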
Classification
Classification assigns data to pre-defined
outcome classes
Example:
Age   Car Type   Risk
23    Family     High
17    Sports     High
43    Sports     High
68    Family     Low
32    Truck      Low
20    Family     High
Classification
Age < 25?
  yes: High
  no: Car Type in {sports}?
    yes: High
    no: Low
Applications: customer profiling, shopping prediction
Diagnostic decision support
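The decision tree on this slide can be read as two nested tests. The sketch below writes it as a function and checks it against the six example records from the preceding slide; the function name `classify` and the tuple layout are chosen here for illustration only.

```python
# Training examples from the classification slide: (age, car type, risk).
rows = [
    (23, "Family", "High"),
    (17, "Sports", "High"),
    (43, "Sports", "High"),
    (68, "Family", "Low"),
    (32, "Truck", "Low"),
    (20, "Family", "High"),
]

def classify(age, car_type):
    """Apply the slide's tree: test age first, then car type."""
    if age < 25:
        return "High"
    if car_type.lower() == "sports":
        return "High"
    return "Low"

predictions = [classify(age, car) for age, car, _ in rows]
for (age, car, risk), predicted in zip(rows, predictions):
    print(age, car, "actual:", risk, "predicted:", predicted)
```

The tree reproduces the risk label of every example record, which is what one would expect of a model built from this training set.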
By Data
Structured alphanumeric data
Buyer, supplier, product, order, bank acct
Image data
Satellite, patient, document, handwriting,
facial, etc.
Spatial data
Map, traffic, geological, CAD, graphics, etc.
By Data, Cont’d
Temporal data
Time series, population, stock, inventory, sales,
etc.
Spatial and temporal data – trajectory
Text – documents, web pages, etc.
Video/audio – surveillance video, voice,
music, etc.
Web (Data) Mining
Web data – generated or used by the Web
Web content - static or dynamic
Web structure – hyperlinks
Web usage – web access log
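As a small example of web usage mining, the sketch below counts successful page requests in a web access log. It assumes the log is in Common Log Format; the sample lines, hosts, and paths are invented for illustration, and the regular expression is a simplified matcher rather than a complete parser.

```python
import re
from collections import Counter

# Simplified pattern for one Common Log Format line (an assumption about
# the log layout; real logs may carry extra fields).
LOG_LINE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+'
)

# Hypothetical access log entries.
sample_log = """\
10.0.0.1 - - [01/Oct/2002:10:00:00 -0700] "GET /index.html HTTP/1.0" 200 1043
10.0.0.2 - - [01/Oct/2002:10:00:05 -0700] "GET /products.html HTTP/1.0" 200 2310
10.0.0.1 - - [01/Oct/2002:10:00:09 -0700] "GET /products.html HTTP/1.0" 200 2310
10.0.0.3 - - [01/Oct/2002:10:00:12 -0700] "GET /missing.html HTTP/1.0" 404 210
"""

def page_hits(log_text):
    """Count requests per path, keeping only responses with status 200."""
    hits = Counter()
    for line in log_text.splitlines():
        match = LOG_LINE.match(line)
        if match and match.group("status") == "200":
            hits[match.group("path")] += 1
    return hits

hits = page_hits(sample_log)
for path, count in hits.most_common():
    print(path, count)
```

Per-page counts like these are a typical starting point; the same parsed records could feed the association or clustering tasks described earlier.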
Why is Web Mining Important?
Rich data gathering and access medium
A variety of important applications
Information retrieval
Ecommerce – CRM, SCM, etc.
Knowledge management
Interesting challenges
Scalability – global, multi-lingual, growth
Agility of knowledge
What is “knowledge”?
Relationships and patterns in data
Organized, analyzed and understandable
Truths, beliefs, perspectives, concepts,
procedures, judgments, expectations,
methodologies, heuristics, restrictions, know-how
Applicable to problem solving and decision
making
DBs, documents, policies and procedures as well
as the un-captured, tacit expertise and
experience
Actionable, at the right place and right time!!!
What is Knowledge Management?
Views:
Process (KM activities)
Goal (Operational efficiency and
innovations)
Methodology (formalization, control and
technology)
Delphi Group: “Leveraging collective
wisdom to increase responsiveness and
innovation.”
What is a KM program?
Processes
Organizational structure and policies
Management theories and methodologies
Information assurance
Technologies and resources
Implementation, training and change
management
Measurement, maintenance and evolution
A multi-disciplinary effort!!!
Managerial and cultural
Technological and engineering
KM Process
Identify
Collect
Organize
Represent
Store
Locate
Retrieve
Extract
Discover
Visualize
Interpret
Share
Transfer
Adapt
Apply
Monitor
Evaluate
Create
Data Mining & KM
Data mining -> discover knowledge
Data mining -> support management of KM infrastructure
(Personalized) content management
Security management
Workflow management
Scalable performance
Web Mining & KM
Web mining -> discover knowledge
Web mining -> support management of web KM portal
R&D
Intranet
Consulting
B2B, B2C, e-government, e-financing, e-risk
management
Web Mining
& Knowledge Refreshing
The KDD Process
Data
-> Step 1: Selection -> Target Data
-> Step 2: Cleaning & Preprocessing -> Preprocessed Data
-> Step 3: Transformation -> Transformed Data
-> Step 4: Data Mining -> Patterns
-> Step 5: Interpretation & Evaluation -> Discovered Knowledge
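The five KDD steps named on this slide can be pictured as a chain of functions, each consuming the output of the previous one. The sketch below is a toy version under stated assumptions: the records, field names, the 7-day lateness cutoff, the order-size threshold of 100, and the 0.5 reporting threshold are all hypothetical and do not come from the slides.

```python
# Hypothetical raw order records (all values arrive as strings).
raw_data = [
    {"buyer": "A", "amount": "120", "days_late": "0", "notes": "n/a"},
    {"buyer": "B", "amount": "", "days_late": "12", "notes": "n/a"},
    {"buyer": "C", "amount": "300", "days_late": "30", "notes": "n/a"},
    {"buyer": "D", "amount": "80", "days_late": "2", "notes": "n/a"},
]

def select(data):
    """Step 1: keep only the fields relevant to the question (target data)."""
    return [{"amount": r["amount"], "days_late": r["days_late"]} for r in data]

def preprocess(target):
    """Step 2: drop records with missing values (preprocessed data)."""
    return [r for r in target if all(v != "" for v in r.values())]

def transform(clean):
    """Step 3: convert to numeric and boolean features (transformed data)."""
    return [{"amount": float(r["amount"]), "late": int(r["days_late"]) > 7}
            for r in clean]

def mine(transformed):
    """Step 4: compute the late-payment rate for large and small orders (patterns)."""
    def late_rate(rows):
        return sum(r["late"] for r in rows) / len(rows) if rows else 0.0
    large = [r for r in transformed if r["amount"] >= 100]
    small = [r for r in transformed if r["amount"] < 100]
    return {"large_order_late_rate": late_rate(large),
            "small_order_late_rate": late_rate(small)}

def interpret(patterns, threshold=0.5):
    """Step 5: keep only patterns strong enough to report (discovered knowledge)."""
    return [name for name, rate in patterns.items() if rate >= threshold]

patterns = mine(transform(preprocess(select(raw_data))))
knowledge = interpret(patterns)
print(patterns)
print(knowledge)
```

Writing the steps as separate functions also makes the iterative nature of the process visible: any stage can be rerun when its input changes without rewriting the others.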
Types of Domain Knowledge
[Figure: the KDD process annotated with the types of domain knowledge that feed into it]
KDD steps: Step 1: Selection; Step 2: Cleaning & Preprocessing; Step 3: Transformation; Step 4: Data Mining; Step 5: Interpretation & Evaluation
Types of domain knowledge:
• DBA Knowledge
• Domain Expert Knowledge
• Data Mining Expert Knowledge
Fundamental Problems
The database is very large
The number of rules resulting from mining
activity is also large
The knowledge derived from a database
reflects only the current state of the database

Issues in the KDD Process
• Agility
• Scalability
[Figure: the KDD process diagram, annotated with the two issues]
Knowledge Refreshing
• The process of efficiently updating discovered
knowledge as data and domain knowledge
change.
• Goals
– Up-to-date knowledge (Agility)
– Knowledge Re-use (Scalability)
Type of Changes
[Figure: the KDD process diagram with "NEW" markers showing where changes can arise]
Elements shown: Data, Target Data, Preprocessed Data, Transformed Data, Patterns, Discovered Knowledge
Knowledge inputs shown: DBA Knowledge, Domain Expert Knowledge, Data Mining Expert Knowledge
Knowledge Refreshing
Needs assessment
Monitoring vs. analytic approaches
Monitoring/estimating changes in knowledge to determine if and when to re-mine
Incremental data mining (learning)
How to leverage knowledge previously discovered from data mining to improve computational efficiency and quality of knowledge
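One way to picture the two ideas on this slide is the sketch below. It keeps pair counts from earlier mining and folds new transactions into them without rescanning the old data (incremental mining), then compares a pattern's support before and after to decide whether a re-mine is warranted (monitoring). The class `PairCounts`, the function `drifted`, the 0.1 tolerance, and the transactions are hypothetical; the slides do not specify a particular method.

```python
from collections import Counter
from itertools import combinations

class PairCounts:
    """Running counts of item pairs, updated batch by batch."""

    def __init__(self):
        self.counts = Counter()
        self.n = 0  # number of transactions seen so far

    def update(self, baskets):
        """Fold in new transactions, reusing the counts already stored."""
        for basket in baskets:
            self.n += 1
            self.counts.update(combinations(sorted(basket), 2))

    def support(self, pair):
        """Fraction of transactions seen so far that contain the pair."""
        return self.counts[tuple(sorted(pair))] / self.n if self.n else 0.0

def drifted(old_support, new_support, tolerance=0.1):
    """Monitoring rule (an assumption): flag a pattern whose support moved
    by more than the tolerance, suggesting it is time to re-mine."""
    return abs(new_support - old_support) > tolerance

store = PairCounts()
store.update([{"a", "b"}, {"a", "b", "c"}, {"b", "c"}, {"a", "c"}])
before = store.support(("a", "b"))

# A later batch arrives; only the new transactions are scanned.
store.update([{"c", "d"}, {"c", "d"}, {"a", "c"}, {"c", "d"}])
after = store.support(("a", "b"))

print(before, after, drifted(before, after))
```

The stored counts serve the re-use (scalability) goal, while the drift check serves the up-to-date (agility) goal named on the earlier Knowledge Refreshing slide.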