Transcript 25 Nov 10
Business and IS Performance
(IS 6010)
MBS BIS 2010 / 2011
25th November 2010
Fergal Carton ([email protected])
Accounting Finance and Information Systems
Last week
•
Decoupling point
•
Control objectives undermined
•
Performance visibility is a design question
•
Builder quotation exercise
•
Apple case study
This week
•
•
•
•
•
Decision support
DW architecture and ETL
Data quality
Real time information
Response times and refresh rates
Decisions compare plan to actual
• Compare
– Plan
to
– Actual figure
• Decide on course of action
What is a Decision?
Current State of
the World
More Desirable
(Future) State
<>
Decision Problem or
“what to do to reduce the difference?”
Figure 1: The Decision Problem
Exploiting the DW data
Static Reporting
Data Staging
Area
Source Systems
Scrutinising
Data Warehouse
Relational Database
on a dedicated Server
Extraction
Cleaning
Transformation
Loading
Multidimensional
Data Cubes
OLAP tools
De normalised, data
Discovering
Data Mining
…….
ETL Tools
•
•
•
•
Extraction, Transformation, and Loading
Specification based
Eliminate custom coding
Third party and DBMS based tools
Data extraction and transformation
•
•
•
•
•
Getting data out of legacy applications
Cleaning up the data
Enriching it with new data
Converting it to a form suitable for upload
Staging areas
Data Quality Problems
• Multiple identifiers:
– some data sources may use different primary keys for the same
entity such as different customer numbers.
• Multiple names:
– the same field may be represented using different field names.
• Different units:
– measures and dimensions may have different units and
granularities.
• Missing values:
– data may not exist in some databases. To compensate for
missing values, different default values may be used across data
sources.
Data Quality Problems
• Orphaned transactions:
– some transactions may be missing important parts such as an
order without a customer.
• Multipurpose fields:
– some databases may combine data into one field such as
different components of an address.
• Conflicting data:
– some data sources may have conflicting data such as different
customer addresses.
• Different update times:
– some data sources may perform updates at different intervals.
Example 1 – the supplier file
New supplier code to include city where firm is based
Assignation of category based on amounts purchased
OLD
Sup code
4 digits
Sup name
Sup address
City
Sup address…
Phone
Phone
NEW
Sup code
3 letters +
4 digits
Sup name
Cat
1,2,3 depending
on total purchases
last year
Example 2: merging files
• Complete customer file based on
Accounts and Sales and Shipping
OLD (finance)
CustID name
addresscity
account numbercredit limit
balance
OLD (sales)
CustID* name
addresscity
discount rates sales_to_date rep_name
OLD (Shipping)
CustID**
name
addresscity
Preferred haulier
Life cycle of the DW
First time load
Operational Databases
Warehouse Database
Refresh
Refresh
Purge or Archive
Refresh
Real time information
•
•
•
•
•
Up to date
On-line
Actual data
Live feed
Decisions made on what basis?
Real time requirement?
• Historical sales or accounting data, not
real-time
• Sales as quarter end approaches
• Inventory levels for MRP
• Exchange rates, when is Visa rate
calculated?
• Real-time processing: card transactions
down
Real time requirement for
Apple?
Response times
• Response times are a function of :
–
–
–
–
–
–
–
–
–
response time,
Infrastructure elements,
Database sizing
Transaction processing
Interfaces
Reporting
Other processing demands
Peak times
…
Refreshing databases
•
•
•
•
•
•
Timing
Criticality of information
Volume of data
Response time
Real-time requirement
Level of aggregation / granularity
Refresh Optimization
Unknown
Processes
External
Data Sources
Primarily
Dimension
Changes
Load Time Lag
Valid Time Lag
ETL
Tools
Internal
Data Sources
Accounting
Fact and
Dimension
Changes
Data
Warehouse
Determining the Refresh
Frequency
•
•
•
•
Maximize net refresh benefit
Value of data timeliness
Cost of refresh
Satisfy data warehouse and source
system constraints