new_DecisSupp - Department of Computer Science and


Decision Support Systems for E-commerce
Working Definition of DSS
A DSS is an integrated, interactive computer system,
consisting of analytical tools and information
management capabilities, designed to aid decision
makers in solving relatively large, unstructured problems
Sample Decision-Making Questions
What were the sales volumes by region and product category
for the last year?
How did the share price of computer manufacturers correlate
with quarterly profits over the past 10 years?
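The first sample question maps naturally onto a grouped aggregate. Below is a minimal sketch using Python's built-in sqlite3 module. The `sales` table, its columns, and all rows are hypothetical illustration data, not taken from the slides.

```python
import sqlite3

# Hypothetical in-memory sales table; schema and rows are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (region TEXT, category TEXT, year INTEGER, volume INTEGER)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("East", "TV", 2023, 120),
        ("East", "PC", 2023, 80),
        ("East", "TV", 2023, 30),
        ("West", "TV", 2023, 50),
        ("West", "PC", 2022, 999),  # older year, excluded by the filter below
    ],
)

def sales_by_region_and_category(conn, year):
    """Total sales volume per (region, category) for one year."""
    rows = conn.execute(
        "SELECT region, category, SUM(volume) FROM sales "
        "WHERE year = ? GROUP BY region, category ORDER BY region, category",
        (year,),
    )
    return rows.fetchall()

report = sales_by_region_and_category(conn, 2023)
for region, category, total in report:
    print(region, category, total)
```

In a real deployment a query like this would typically run against the warehouse rather than the operational database.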
Central Issue in DSS
support and improvement of decision making
Management Decision Making
Strategic
CEO, board of directors, top executives
Develop the overall strategies of the organization
Tactical
Regional managers, plant managers, division supervisors
Carry out the strategic managers' plans
Operational
Direct managers, team leaders
Carry out the tactical managers' plans
Different Technologies are invented to
meet different Decision Making Goals!
The Big Picture: DBs, Data Warehouse, OLAP, & Data Mining
[Figure: operational DBs and other sources pass through Extract, Transform, Load, and Refresh into the data warehouse (data storage), which serves the OLAP server (OLAP engine); front-end tools provide analysis, query, reports, and data mining.]
Evolutionary Step | Technologies | Providers
Data Collection (1960s) | Computers, tapes, disks | IBM, CDC
Data Access (1980s) | Relational databases, SQL, ODBC | Oracle, Sybase, Informix, IBM, Microsoft
Data Warehousing & Decision Support Systems (1990s) | On-line Analytic Processing (OLAP), multidimensional databases (cubes) | Cognos, Arbor, Pilot, Microstrategy, Oracle, IBM
Data Mining (Present) | Statistics, machine learning, AI | SAS, SPSS, IBM, Oracle, Cognos, Microsoft
Why Build a Data Warehouse?

Separate transactional and analysis systems, so that
regional managers and CEOs can make tactical or even
strategic decisions
Easy formulation of complex queries
Access to historical data (not in operational systems)
Improved data quality (fewer errors and missing values)
Access to data from multiple sources, giving a
comprehensive data collection
Potential Applications of Data
Warehousing and Mining in EC
Analysis of user access patterns and buying
patterns
Customer segmentation and target marketing
Cross selling and improved Web advertisement
Personalization
Association (link) analysis
Customer classification and prediction
Time-series analysis
Typical event sequence and user behavior pattern
analysis
Transition and trend analysis
Data Warehousing
The phrase “data warehouse” was coined by
William Inmon in 1990
A data warehouse is a decision support
database that is maintained separately from
the organization’s operational database
Definition: A DW is a repository of integrated
information from distributed, autonomous, and
possibly heterogeneous information sources
for query, analysis, decision support, and data
mining purposes
Characteristics (cont’d)
Integrated
Application-oriented data from different legacy systems and
heterogeneous data sources has no consistency in encoding,
naming conventions, …
When data is moved to the warehouse, it is
consolidated, converted, and encoded
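As a rough illustration of consolidating inconsistent encodings, the sketch below maps three source-specific codings of one attribute onto a single warehouse code. The source names (`billing`, `crm`, `legacy`), the attribute, and the codes are all hypothetical assumptions chosen for the example.

```python
# Hypothetical per-source encodings of the same attribute (gender).
# The source names and codes are illustrative assumptions.
SOURCE_ENCODINGS = {
    "billing": {"m": "M", "f": "F"},
    "crm": {"1": "M", "0": "F"},
    "legacy": {"male": "M", "female": "F"},
}

def to_warehouse_code(source, raw_value):
    """Map a source-specific value to the single warehouse encoding.

    Unknown values become None so they can be flagged for cleaning
    instead of being silently guessed.
    """
    mapping = SOURCE_ENCODINGS[source]
    return mapping.get(str(raw_value).strip().lower())

records = [("billing", "M"), ("crm", 0), ("legacy", "Female"), ("crm", "x")]
integrated = [to_warehouse_code(src, val) for src, val in records]
print(integrated)
```

Returning None for unrecognized values is one possible design choice; it leaves the decision about bad data to a separate cleaning step.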
Characteristics (cont’d)
Non-volatile
New data is always appended to the
database, rather than replacing existing data
The database continually absorbs new data,
integrating it with the previous data
In contrast, operational data is regularly
accessed and manipulated a record at a time,
and data in the operational environment is
updated
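The contrast between appending and updating can be sketched in a few lines. This is only an illustration under simple assumptions: the operational store is a dictionary holding one current value per key, the warehouse is an append-only list, and the account key, values, and load dates are made up.

```python
# Operational store: one current value per key, updated in place.
operational = {"acct-1": 100}

# Warehouse: append-only list of (load_date, key, value) rows.
warehouse = []

def update_operational(key, value):
    """Overwrite the current value, as an operational system would."""
    operational[key] = value

def load_into_warehouse(load_date):
    """Append the current operational state; existing rows are never changed."""
    for key, value in operational.items():
        warehouse.append((load_date, key, value))

load_into_warehouse("2024-01-01")
update_operational("acct-1", 250)   # the old value is lost in the operational store
load_into_warehouse("2024-02-01")

print(operational)
print(warehouse)
```

After the second load the warehouse still holds the earlier value, while the operational store only knows the latest one.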
Characteristics (cont’d)
Time-variant
Operational databases contain current-value data.
Operational data is valid only at the moment of
access, capturing a single moment in time.
The time horizon for the data warehouse is
significantly longer than that of operational systems.
Data warehouse data is nothing more than a
sophisticated series of snapshots, each taken as of some
moment in time.
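One way to picture a series of snapshots is an "as of" lookup: given a date, return the value from the latest snapshot taken on or before it. The sketch below assumes snapshots are kept sorted by ISO-formatted date; the dates and prices are hypothetical.

```python
from bisect import bisect_right

# Hypothetical price snapshots, sorted by snapshot date (ISO strings sort correctly).
snapshots = [
    ("2023-01-01", 9.99),
    ("2023-06-01", 10.49),
    ("2024-01-01", 11.25),
]

def value_as_of(snapshots, date):
    """Return the value from the latest snapshot taken on or before `date`."""
    dates = [d for d, _ in snapshots]
    i = bisect_right(dates, date)
    if i == 0:
        return None  # no snapshot existed yet
    return snapshots[i - 1][1]

print(value_as_of(snapshots, "2023-08-15"))
print(value_as_of(snapshots, "2022-12-31"))
```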

System Architecture
[Figure: system architecture. Components shown: End User (analysis, query, reports, data mining); one Detector per data source; data sources: Legacy, Flat-file, RDBMS, OODBMS, ...]
Data Warehouse Back-End Tools and Utilities
Data extraction:
• Extract data from multiple, heterogeneous, and external sources
Data cleaning (scrubbing):
• Detect errors in the data and rectify them when possible
Data converting:
• Convert data from legacy or host format to warehouse format
Transforming:
• Sort, summarize, compute views, check integrity, and build indices
Refresh:
• Propagate the updates from the data sources to the warehouse
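The back-end steps listed above can be sketched as a small pipeline. This is a toy illustration, not a description of any particular tool: the two sources, their field names, the cleaning rule, and the dictionary standing in for the warehouse are all assumptions.

```python
from collections import defaultdict

# Hypothetical source extracts; the field names and formats are illustrative.
legacy_rows = [{"REGION": "east ", "AMT": "100"}, {"REGION": "WEST", "AMT": "n/a"}]
rdbms_rows = [{"region": "East", "amount": 40.0}]

def extract():
    """Extraction: pull rows from multiple heterogeneous sources."""
    return [("legacy", r) for r in legacy_rows] + [("rdbms", r) for r in rdbms_rows]

def convert(source, row):
    """Converting: map each source format onto the warehouse format."""
    if source == "legacy":
        return {"region": row["REGION"], "amount": row["AMT"]}
    return dict(row)

def clean(row):
    """Cleaning: normalize text and reject rows whose amount is not numeric."""
    try:
        amount = float(row["amount"])
    except (TypeError, ValueError):
        return None
    return {"region": row["region"].strip().title(), "amount": amount}

def transform(rows):
    """Transforming: summarize cleaned rows into totals per region."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["region"]] += row["amount"]
    return dict(sorted(totals.items()))

def refresh(warehouse, summary):
    """Refresh: propagate the new summary into the warehouse table."""
    warehouse.update(summary)
    return warehouse

cleaned = [c for c in (clean(convert(s, r)) for s, r in extract()) if c is not None]
warehouse = refresh({}, transform(cleaned))
print(warehouse)
```

Here the row with a non-numeric amount is dropped during cleaning; a production pipeline would more likely log or quarantine it.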
On-Line Analytical Processing (OLAP)
Front end to the data warehouse, allowing
easy data manipulation
Allows conducting inquiries over the data at
various levels of abstraction
Fast and easy because some aggregations
are computed in advance
No need to formulate the entire query
OLAP: Data Cube
OLAP uses data in a multidimensional format (e.g., data cubes)
to simplify querying and improve response time.
[Figure: a data cube with dimensions Date (1Qtr, 2Qtr, 3Qtr, 4Qtr, sum), Product (TV, PC, VCR, sum), and Country (U.S.A, Canada, Mexico, sum). Highlighted cell: overall sales of TVs in the U.S. in the 3rd quarter.]
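A data cube with precomputed "sum" cells can be sketched with a dictionary keyed by (quarter, product, country). The fact rows and sales figures below are made up; only the dimension names and the "sum" marker follow the cube figure.

```python
from collections import defaultdict
from itertools import product

ALL = "sum"  # marker for "aggregated over this dimension", as in the cube figure

# Hypothetical fact rows: (quarter, product, country, sales). Values are made up.
facts = [
    ("3Qtr", "TV", "U.S.A", 10),
    ("3Qtr", "TV", "U.S.A", 5),
    ("3Qtr", "PC", "Canada", 7),
    ("1Qtr", "TV", "Mexico", 3),
]

def build_cube(facts):
    """Precompute every cell, including the 'sum' cells along each dimension."""
    cube = defaultdict(int)
    for quarter, prod, country, sales in facts:
        # Each fact contributes to 2^3 cells: each dimension is either
        # kept at its value or rolled into ALL.
        for q, p, c in product((quarter, ALL), (prod, ALL), (country, ALL)):
            cube[(q, p, c)] += sales
    return dict(cube)

cube = build_cube(facts)
print(cube[("3Qtr", "TV", "U.S.A")])   # TV sales in the U.S. in the 3rd quarter
print(cube[(ALL, "TV", ALL)])          # all TV sales
print(cube[(ALL, ALL, ALL)])           # grand total
```

Because the aggregates are computed once up front, each later lookup is a single dictionary access, which is the idea behind computing aggregations in advance.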
OLAP: Data Cube Operations
Slicing:
Selecting the dimensions of the cube to be viewed.
Example: View “Sales volume” as a function of “Product” by
“Country” by “Quarter”
Dicing:
Specifying the values along one or more
dimensions.
Example: View “Sales volume” for “Product=PC” by
“Country” by “Quarter”
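Following the definitions given here, both operations can be sketched with one helper: choosing which dimensions to view corresponds to slicing, and fixing values along dimensions corresponds to dicing. The helper name `view` and the fact rows are hypothetical.

```python
from collections import defaultdict

# Hypothetical fact rows; values are made up for illustration.
facts = [
    {"product": "PC", "country": "U.S.A", "quarter": "1Qtr", "sales": 4},
    {"product": "PC", "country": "Canada", "quarter": "1Qtr", "sales": 6},
    {"product": "TV", "country": "U.S.A", "quarter": "2Qtr", "sales": 9},
    {"product": "PC", "country": "U.S.A", "quarter": "2Qtr", "sales": 1},
]

def view(facts, dims, **fixed):
    """Sales volume as a function of `dims`, restricted to `fixed` values.

    Choosing `dims` selects the dimensions to be viewed; passing
    `fixed` values specifies values along one or more dimensions.
    """
    totals = defaultdict(int)
    for row in facts:
        if all(row[d] == v for d, v in fixed.items()):
            totals[tuple(row[d] for d in dims)] += row["sales"]
    return dict(totals)

# "Sales volume" for "Product=PC" by "Country" by "Quarter"
pc_view = view(facts, ("country", "quarter"), product="PC")
print(pc_view)

# "Sales volume" as a function of "Product" only
print(view(facts, ("product",)))
```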
OLAP: Data Cube Operations
Drilling down: going from higher-level
aggregation to lower-level aggregation or
detailed data (e.g., viewing by “state” after
viewing by “region”)
Rolling up: summarizing data by climbing
up a hierarchy or by dimension reduction
(e.g., viewing by “region” instead of by
“state”)
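The state/region example can be sketched as follows. The state-to-region hierarchy and the sales figures are hypothetical, chosen only to show the two directions of movement along a hierarchy.

```python
from collections import defaultdict

# Hypothetical state -> region hierarchy and state-level sales (made-up values).
REGION_OF = {"CA": "West", "WA": "West", "NY": "East"}
sales_by_state = {"CA": 30, "WA": 10, "NY": 25}

def roll_up(sales_by_state, region_of):
    """Climb the hierarchy: summarize state-level sales by region."""
    totals = defaultdict(int)
    for state, sales in sales_by_state.items():
        totals[region_of[state]] += sales
    return dict(totals)

def drill_down(region, sales_by_state, region_of):
    """Descend the hierarchy: show the state-level detail behind one region."""
    return {s: v for s, v in sales_by_state.items() if region_of[s] == region}

by_region = roll_up(sales_by_state, REGION_OF)
print(by_region)
print(drill_down("West", sales_by_state, REGION_OF))
```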
Cube Operations Illustrated
[Figure: drilling down and rolling up illustrated]
Actual Application
Query: “overall & detail production performance”
• manufacturer: Com1
• products: all products
• date interval: 01-Jan-1994 until 01-Jan-1999
• source: USDA
[Figure: Com.1 broken down into Lot#1 (Contract Number 1), Lot#2 (Contract Number 2), and Lot#3 (Contract Number 3)]
Data Mining
“Data Mining is the exploration and analysis
by automatic or semi-automatic means,
of large or small quantities of data
in order to discover meaningful patterns,
trends and rules.”
Data Mining
[Figure: data mining shown in relation to Data Analysis (Statistics, AI & ML) and Database technology (Data Warehouse, OLAP)]
Data Analysis
Classification
Regression
Clustering
Association
Sequence Analysis
Data Analysis (cont.)
Modeling
A model f maps input variables X to output variables Y.
Input variables (or independent variables, or attributes or descriptors):
X1: numeric (3, 4.5, 102, …)
X2: categorical (hot, cold, high, low, …)
X3: crisp (0, 1, yes, no, …)
f: linear models, or non-linear models, or a set of rules
Output variables (or dependent variables, or classes or targets):
Y1: numeric (regression)
Y2: categorical (classification)
Y3: crisp
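For the numeric-input, numeric-output case, a linear model f can be fitted by ordinary least squares. The sketch below handles a single input variable and uses made-up training data that lies exactly on a line; it is one simple instance of modeling, not the only way to obtain f.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (one numeric input)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up training data lying exactly on y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
slope, intercept = fit_line(xs, ys)

def f(x):
    """The learned model: maps a numeric input to a numeric output."""
    return slope * x + intercept

print(slope, intercept, f(4.0))
```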
Data Analysis (cont.)
Clustering
[Figure: clusters of points plotted by Age and Income]
Association
1, chips, coke, chocolate
2, gum, chips
3, chips, coke
4, …
Probability (chips, coke)?
Probability (chips, gum)?
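The two probabilities asked for can be estimated as the fraction of transactions containing both items, usually called support. The sketch below uses only the three complete transactions listed above; the fourth is elided in the source and is left out, so the numbers apply to this three-row sample only.

```python
# The three complete transactions listed above; the fourth is elided in the
# source and is therefore left out.
transactions = [
    {"chips", "coke", "chocolate"},
    {"gum", "chips"},
    {"chips", "coke"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Estimated P(consequent | antecedent), the confidence of a rule."""
    return support(set(antecedent) | set(consequent), transactions) / support(
        antecedent, transactions
    )

print(support({"chips", "coke"}, transactions))
print(support({"chips", "gum"}, transactions))
print(confidence({"coke"}, {"chips"}, transactions))
```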
Sequence Analysis
…ATCTTTAAGGGACTAAAATGCCATAAAAATCCATGGGAGAGACCCAAAAAA…
[Figure: transition T from X(t-1) to X(t)]
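One simple way to analyze such a sequence is a first-order Markov chain, where transition probabilities from the previous symbol to the current one are estimated from counts of adjacent pairs. The sketch below uses a short made-up sequence rather than the truncated one shown above.

```python
from collections import Counter, defaultdict

def transition_probabilities(sequence):
    """Estimate P(X_t = b | X_{t-1} = a) from adjacent symbol pairs."""
    pair_counts = defaultdict(Counter)
    for prev, cur in zip(sequence, sequence[1:]):
        pair_counts[prev][cur] += 1
    return {
        prev: {cur: n / sum(counts.values()) for cur, n in counts.items()}
        for prev, counts in pair_counts.items()
    }

# A short made-up sequence (not the one from the slide, which is truncated).
T = transition_probabilities("AATAAT")
print(T)
```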
Data Analysis (cont.)
• Classification
  - Linear Discriminant Analysis
  - Naïve Bayes / Bayesian Network
  - OneR
  - Neural Networks
  - Decision Tree (ID3, C4.5, …)
  - K-Nearest Neighbors (IB)
  - Support Vector Machines (SVM)
  - …
• Clustering
  - K-Means Clustering
  - Self-Organizing Map
  - Bayesian Clustering
  - COBWEB
  - …
• Regression
  - Multiple Linear Regression
  - Principal Components Regression
  - Partial Least Squares
  - Neural Networks
  - Regression Tree (CART, MARS, …)
  - K-Nearest Neighbors (LWR)
  - Support Vector Machines (SVR)
  - …
• Association & Sequence Analysis
  - Apriori
  - Markov Chain
  - Hidden Markov Models
  - …
Challenges
• Faster, more accurate, and more scalable techniques
• Incremental, on-line, and real-time learning algorithms
• Parallel and distributed data processing techniques
Opportunities
• Data mining is a ‘top ten’ emerging technology
• Data mining is finding increasing acceptance in science and business areas that need to analyze large amounts of data to discover trends and patterns they could not otherwise find.
• Data mining is an exciting and challenging field with the ability to solve many complex scientific and business problems.