9 Data-Mining Systems

Management Information Systems
Chapter 9
Competitive Advantage with
Information Systems for Decision
Making
This Could Happen to You
How can information systems improve decision making?
Business processes and decision making are closely allied
Information systems facilitate competitive strategy by adding value to
or reducing the costs of processes
Information systems add value or reduce costs by improving the quality of decisions
Can an information system assist in the selection of a
vendor based on past performance?
Study Questions
Q1. How big is an exabyte, and why does it matter?
Q2. How do business intelligence systems provide competitive
advantages?
Q3. What problems do operational data pose for BI systems?
Q4. What are the purpose and components of a data
warehouse?
Q5. What is a data mart, and how does it differ from a data
warehouse?
Q6. What are the characteristics of data-mining systems?
Q1. How Big Is an Exabyte?
Figure 9-1
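To give a sense of scale, the following minimal Python sketch computes how many bytes each storage unit holds. It assumes binary units, where each unit is 1,024 times the previous one; with decimal units the multiplier would be 1,000 instead. The names UNITS and bytes_in are illustrative only.

```python
# Storage units in ascending order. Assumption: binary units, so each
# unit is 1,024 times the one before it.
UNITS = ["byte", "kilobyte", "megabyte", "gigabyte",
         "terabyte", "petabyte", "exabyte"]


def bytes_in(unit: str) -> int:
    """Return the number of bytes in one of the named units."""
    return 1024 ** UNITS.index(unit)


for unit in UNITS:
    print(f"1 {unit:<9} = {bytes_in(unit):,} bytes")
```

Under this assumption one exabyte is 1,024 to the sixth power, or a little over a billion billion bytes.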
Why Does It Matter?
Storage capacity is increasing as cost decreases
Nearly unlimited
Over 2.5 exabytes of data have been created
Exponential growth both inside and outside of
organizations
Can be used to improve decision making
Hard Disk Storage Capacity
Q2. Business Intelligence (BI) Systems
Provide information for improving decision making
Primary systems:
Reporting systems
Data-mining systems
Knowledge management systems
Expert systems
Reporting Systems
Integrate data from multiple sources
Process data by sorting, grouping, summing,
averaging, and comparing
Results formatted into reports
Improve decision making by providing right
information to right user at right time
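The reporting operations listed above (sorting, grouping, summing, averaging) can be sketched in a few lines of Python. The sales records and the function name summarize are made up for illustration; a real reporting system would read from one or more databases.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical sales records, as if merged from several sources.
sales = [
    {"region": "East", "amount": 120.0},
    {"region": "West", "amount": 80.0},
    {"region": "East", "amount": 200.0},
    {"region": "West", "amount": 40.0},
]


def summarize(rows):
    """Group rows by region, then sum and average the amounts."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["region"]].append(row["amount"])
    report = [
        {"region": region, "total": sum(amounts), "average": mean(amounts)}
        for region, amounts in groups.items()
    ]
    # Sort so that the region with the largest total comes first.
    return sorted(report, key=lambda r: r["total"], reverse=True)


report = summarize(sales)
for line in report:
    print(line)
```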
Data-Mining Systems
Process data using statistical techniques
Regression analysis
Decision tree analysis
Look for patterns and relationships to anticipate
events or predict outcomes
Market-basket analysis
Predict donations
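As one illustration of looking for patterns, a market-basket analysis typically computes support, confidence, and lift for pairs of items. The sketch below uses a tiny made-up set of baskets; the item names and function names are assumptions for the example.

```python
# Hypothetical shopping baskets; each set is one transaction.
baskets = [
    {"mask", "fins"},
    {"mask", "fins", "tank"},
    {"mask"},
    {"fins"},
    {"tank"},
]


def support(items, baskets):
    """Fraction of baskets that contain every item in `items`."""
    items = set(items)
    return sum(items <= basket for basket in baskets) / len(baskets)


def confidence(antecedent, consequent, baskets):
    """Estimated probability of the consequent given the antecedent."""
    both = set(antecedent) | set(consequent)
    return support(both, baskets) / support(antecedent, baskets)


def lift(antecedent, consequent, baskets):
    """Confidence divided by the consequent's baseline support."""
    return confidence(antecedent, consequent, baskets) / support(consequent, baskets)


print(support({"mask", "fins"}, baskets))
print(confidence({"mask"}, {"fins"}, baskets))
print(lift({"mask"}, {"fins"}, baskets))
```

A lift above 1 suggests that buying the first item makes the second more likely than it is on average.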
Knowledge-Management Systems
Create value from intellectual capital
Collect and share human knowledge
Supported by the five components of the
information system
Foster innovation
Increase organizational responsiveness
Expert Systems
Encapsulate experts’ knowledge
Produce If/Then rules
Improve diagnosis and decision making in nonexperts
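The If/Then rules of an expert system can be sketched as an ordered list of conditions and conclusions. The rules, thresholds, and field names below are hypothetical and chosen only to illustrate the idea with a supplier-selection example.

```python
# Each rule pairs an "If" condition with a "Then" conclusion.
# Thresholds are invented for illustration, not taken from any expert.
RULES = [
    (lambda s: s["late_deliveries"] > 3, "reject: too many late deliveries"),
    (lambda s: s["defect_rate"] > 0.05, "review: defect rate above 5%"),
]


def evaluate(supplier):
    """Return the conclusion of the first rule whose condition holds."""
    for condition, conclusion in RULES:
        if condition(supplier):
            return conclusion
    return "approve"


print(evaluate({"late_deliveries": 5, "defect_rate": 0.01}))
print(evaluate({"late_deliveries": 1, "defect_rate": 0.08}))
print(evaluate({"late_deliveries": 0, "defect_rate": 0.01}))
```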
Q3. Problems with Operational Data
Raw data usually unsuitable for sophisticated reporting or
data mining
Dirty data
Values may be missing
Inconsistent data
Data can be too fine or too coarse
Too much data
Curse of dimensionality
Too many rows
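A minimal sketch of checking for dirty data might flag missing values, out-of-range values, and inconsistent codes before any reporting or mining is attempted. The customer records, the valid age range, and the list of country codes are assumptions made for this example.

```python
# Hypothetical reference list of acceptable country codes.
VALID_COUNTRIES = {"US", "CA", "MX"}

# Made-up operational records with typical problems.
customers = [
    {"id": 1, "age": 34, "country": "US"},
    {"id": 2, "age": None, "country": "CA"},   # missing value
    {"id": 3, "age": 213, "country": "US"},    # value out of range
    {"id": 4, "age": 51, "country": "usa"},    # inconsistent code
]


def find_problems(rows):
    """Return (id, problem) pairs for every data-quality issue found."""
    problems = []
    for row in rows:
        if row["age"] is None:
            problems.append((row["id"], "missing age"))
        elif not 0 <= row["age"] <= 120:
            problems.append((row["id"], "age out of range"))
        if row["country"] not in VALID_COUNTRIES:
            problems.append((row["id"], "inconsistent country code"))
    return problems


problems = find_problems(customers)
for customer_id, problem in problems:
    print(customer_id, problem)
```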
Problems with Using Operational Data in BI Systems
Guide: Counting and Counting and
Counting
Product managers wanted data miners to analyze
customer clicks on a Web page
Determine preferences for product lines
Data miners wanted to sample; product managers
wanted all data
Processing all of the data would take days
Sampling is acceptable
Must be appropriate
Saves time and money
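The point about sampling can be illustrated with a short Python sketch that compares a product line's share of clicks in a full log with its share in a random sample. The click log, the product-line names, and the sample size are invented; the seed is fixed only to make the run repeatable.

```python
import random

# Made-up click log: 100,000 clicks across two product lines.
clicks = ["kayaks"] * 30_000 + ["tents"] * 70_000


def share(rows, product_line):
    """Fraction of clicks that belong to the given product line."""
    return rows.count(product_line) / len(rows)


rng = random.Random(42)               # fixed seed for repeatability
sample = rng.sample(clicks, 2_000)    # simple random sample, 2% of the log

full_share = share(clicks, "kayaks")
sample_share = share(sample, "kayaks")
print(f"full data: {full_share:.3f}, sample: {sample_share:.3f}")
```

With a simple random sample of this size the estimate is normally close to the full-data figure while touching only a small fraction of the rows.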
Q4. Data Warehouse
Used to extract and clean data from operational systems
Prepares data for BI processing
Data-warehouse DBMS
Stores data
May also include data from external sources
Metadata about the stored data is kept in the data-warehouse metadata database
Extracts and provides data to BI tools
Components of a Data Warehouse
Customer Data Available for Purchase from Data Vendors
Q5. Data Mart
Data collection
Created to address particular needs
Business function
Problem
Opportunity
Smaller than data warehouse
Users may not have data management expertise
They are knowledgeable analysts for a specific business function
Data Mart Examples
Q6. Data Mining
Application of statistical techniques to find patterns
and relationships among data
Knowledge discovery in databases (KDD)
Takes advantage of developments in data
management
Two categories:
Unsupervised
Supervised
Data Mining Combines Many Disciplines
Unsupervised Data Mining
Analysts do not create model before running
analysis
Apply data-mining technique and observe results
Hypotheses created after analysis as explanation
for results
Example: cluster analysis
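Cluster analysis can be illustrated with a minimal one-dimensional k-means sketch: no model is specified in advance, and the groups emerge from the data. K-means is used here as one common clustering technique, not necessarily the one the slides have in mind; the customer ages and starting centers are made up.

```python
def kmeans_1d(values, centers, iterations=10):
    """Very small k-means for one-dimensional data.

    Repeatedly assigns each value to its nearest center, then moves
    each center to the mean of the values assigned to it.
    """
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Keep the old center if a cluster ends up empty.
        centers = [
            sum(c) / len(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return centers, clusters


ages = [18, 21, 22, 45, 47, 52]          # hypothetical customer ages
centers, clusters = kmeans_1d(ages, [20.0, 50.0])
print(centers)
print(clusters)
```

Only after seeing the two groups would an analyst propose an explanation, such as younger and older customer segments.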
Supervised Data Mining
Model developed before analysis
Statistical techniques used to estimate parameters
Examples:
Regression analysis
Neural networks
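For regression analysis, the model (here a straight line) is chosen first and the data is then used to estimate its parameters. The sketch below fits a line by ordinary least squares; the data points and the function name fit_line are made up for illustration.

```python
def fit_line(xs, ys):
    """Estimate slope and intercept of y = slope * x + intercept
    by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: number of past gifts vs. size of next donation.
past_gifts = [1, 2, 3, 4]
donation = [3, 5, 7, 9]

slope, intercept = fit_line(past_gifts, donation)
print(f"donation = {slope:.1f} * past_gifts + {intercept:.1f}")
```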
Ethics Guide: Data Mining in the Real World
Data mining is different from the way it is shown in
textbooks
Data is dirty
Values are missing or outside of ranges
Time values make no sense
You add parameters as you gain knowledge, forcing reprocessing
Overfitting
Based on probabilities, not certainty
Seasonality problem
Using This Knowledge to Close the Gap
Reporting system could process supplier information to
rank quality
Data-mining system could search for patterns to predict
delivery delays or quality problems
Knowledge management system could rank suppliers or
share experiences
Expert system could contain rules for supplier selection
Data mart could maintain information on inbound logistics
and manufacturing