Decision Support Systems - San Francisco State University

Download Report

Transcript Decision Support Systems - San Francisco State University

Decision Support System
BUS 782
Decision supports Systems Components
• Data management function
– Decision Support Database
• Data warehouse
• Model management function
– Analytical models:
• Statistical model, management science model
• Data Mining
• User interface
– Data visualization
New Developments in Decision
Support Systems
• Data visualization: Representing data in
graphical/multimedia formats for analysis.
• Data Warehouse
– Data Mart: A data warehouse that is limited in scope
• Data Mining
• Geological Information System, GIS
• What-if scenarios
Data Warehouse
• Data warehouse is a repository of an
organization's electronically stored data.
• A data warehouse houses a
standardized, consistent, clean and
integrated form of data that:
– sourced from various operational systems in
use in the organization,
– structured in a way to specifically address
the reporting and analytic requirements.
Example:
Transaction Database
CID
Cname
City
OID
ODate
Rating
SalesPerson
Customer
1
Has
M
Order
M
Qty
Has
M
Product
Price
PID
Pname
Analyze Sales Data
Detailed Business Data
• Total sales:
– by product:
• Qty*Price of each detail line
• Sum (Qty*Price)
• Detailed business data: qty*price
• Total quantity sold:
– By product:
• Sum(Qty)
• Detailed business data: Qty
Dimensions for Data Analysis:
Factors relevant to the business data
• Analyze sales by Product
• Analyze sales related to Customer:
– Location: Sales by City
– Customer type: Sales by Rating
• Analyze sales related to Time:
– Quarterly, monthly, yearly Sales
• Analyze sales related to Employee:
– Sales by SalesPerson
Data Warehouse Design
- Star Schema • Dimension tables
– contain descriptions about the subjects of the
business such as customers, employees,
locations, products, time periods, etc.
• Fact table
– contain detailed business data with links to
dimension tables.
Star Schema
Location
Dimension
LocationCode
State
City
Can group by State, City
FactTable
LocationCode
PeriodCode
Rating
PID
Qty
Amount
Product
Dimension
PID
Pname
Category
CustomerRating
Dimension
Rating
Description
Period
Dimension
PeriodCode
Year
Quarter
Define Location Dimension
• Location:
– In the transaction database: City
– In the data warehouse we define Location to
be State, City
• San Francisco -> California, San Francisco
• Los Angeles -> California, Los Angeles
– Define Location Code:
• California, San Francisco -> L1
• California, Los Angeles -> L2
Define Period Dimension
• Period:
– In the transaction database: Odate
– In the data warehouse we define Period to be:
Year, Quarter
• Odate: 11/2/2003 -> 2003, 4
• Odate: 2/28/2003 -> 2003, 1
– Define Period Code:
• 2003, 4 -> 20034
• 2003, 1 -> 20031
The ETL Process
L
T
One,
companywide
warehouse
E
Periodic extraction  data is not completely current in warehouse
The ETL Process
• Capture/Extract
• Transform
– Scrub(data cleansing),derive
– Example:
• City -> LocationCode, State, City
• OrderDate -> PeriodCode, Year, Quarter
• Load and Index
ETL = Extract, transform, and load
From SalesDB to
MyDataWarehouse
• Extract data from SalesDB:
– Create query to get the fact data
• FactData
– Download to MyDataWareHouse
• Transform:
– Transform City to Location
– Transform Odate to Period
• Query FactDataScrubing
• Load data to FactTable
Performing Analysis
• Analyze sales:
– by Location
– By Location and Customer Type
– By Location and Period
– By Period and Product
• Pivot Table:
– Drill down, roll up, reaggregation
Star schema example
Fact table provides statistics for sales
broken down by product, period and
store dimensions
Dimension tables contain descriptions about the subjects of the business
Star schema with sample data
Snowflake Schema
Location
Dimension
LocationCode
State
City
Can group by State, City
Product
Category
CategoryID
Description
FactTable
LocationCode
PeriodCode
Rating
PID
Qty
Amount
Product
Dimension
PID
Pname
CategoryID
CustomerRating
Dimension
Rating
Description
Period
Dimension
PeriodCode
Year
Quarter
Data Mining
• Knowledge discovery using a blend of
statistical, Artificial Intelligence, and
computer graphics techniques
• Goals:
– Explain observed events or conditions
– Explore data for new or unexpected relationships
History in the Development of Data Mining.
Evolutionary Step Business Question
Enabling
Technologies
Characteristics
Data
"What was my total Computers,
Collection(196 revenue in the last tapes, disks
0s)
five years?"
Retrospective, static
data delivery
Data
"What were unit
Access(1980s sales in New
)
England last
March?"
Retrospective, dynamic
data delivery at record
level
Data
Warehousing
&Decision
Support
(1990s)
"What were unit
sales in New
England last
March? Drill down
to Boston."
Data
"What’s likely to
Mining(Emergi happen to Boston
ng Today)
unit sales next
month? Why?"
Relational
databases
(RDBMS),
Structured Query
Language (SQL),
ODBC
On-line analytic
processing
(OLAP),
multidimensional
databases, data
warehouses
Advanced
algorithms,
multiprocessor
computers,
massive
databases
Retrospective, dynamic
data delivery at multiple
levels
Prospective, proactive
information delivery
Typical Data Mining Techniques
•
•
•
•
Statistical regression
Decision tree induction
Clustering – discover subgroups
Affinity – discover things with strong mutual
relationships
• Sequence association – discover cycles of
evens and behaviors
• Rule discovery – search for patterns and
correlations
Typical Data Mining Applications
• Profiling populations
– High-value customers, credit risks, credit card fraud
•
•
•
•
Analysis of business trends
Target marketing
Campaign effectiveness
Product affinity
– Identifying products that are purchased concurrently
• Up-selling
– Identifying new products and services to sell to a
customer based on critical events
Affinity Analysis:
Market Basket Analysis
• Market Basket Analysis is a modeling technique
based upon the theory that if you buy a certain
group of items, you are more (or less) likely to
buy another group of items.
• The set of items a customer buys is referred to
as an itemset, and market basket analysis
seeks to find relationships between purchases.
• Typically the relationship will be in the form of a
rule: Example:
– IF {beer, no bar meal} THEN {chips}.
Basket Analysis and Cross- Selling
• For instance, customers are very likely to
purchase shampoo and conditioner together, so
a retailer would not put both items on promotion
at the same time. The promotion of one would
likely drive sales of the other.
• A widely used example of cross selling on the
internet with market basket analysis is
Amazon.com's use of suggestions of the type:
– "Customers who bought book A also bought book B",
e.g.
Geological Information System
GIS
• GIS is a computer-based tool for mapping and
analyzing things that exist and events that
happen on earth. GIS technology integrates
common database operations such as query
and statistical analysis with the unique
visualization and geographic analysis benefits
offered by maps.
Data of GIS
• Geodatabase:
– A geodatabase is a database that is in some way
referenced to locations on the earth.
• Longitude, latitude
• Attribute data:
– Attribute data generally defined as additional
information, which can then be tied to spatial data.
• Example:
– Google Earth
– GeoCode service:
http://developer.yahoo.com/maps/rest/V1/geocode.htm
l
Scenario
• A scenario is an assumption about input
variables.
• Excel’s Scenarios is a what-if-analysis tool. A
scenario is a set of values that Microsoft Excel
saves and can substitute automatically in your
worksheet.
• You can use scenarios to forecast the outcome
of a worksheet model. You can create and save
different groups of values on a worksheet and
then switch to any of these new scenarios to
view different results.
• Data/What If analysis/Scenario
Creating a Scenario
– Add scenario
• Changing cells
• Resulting cells
• Demo: benefit.xls