Transcript 01 Lecture
BIS4435 – Data Warehousing
Lecture 9
Dr. Nawaz Khan
E-mail: [email protected]
1
Data Warehousing: Reading Assignment
Reading Suggestion:
Lecture 9
Connolly, T.M., and Begg, C.E., Database Systems: A Practical
Approach to Design, Implementation and Management, Addison
Wesley,
4th Edition, ISBN: 0321210255(chapters 31-33)
Global campus materials on OASIS: http://oasis.mdx.ac.uk/
(unit 9)
More Reading:
Fundamentals of Database Systems. R. Elmasri and S. B.
Navathe, 4th Edition, 2004, Addison-Wesley, ISBN 0-321-12226-7:
Chapter 28
Data Warehousing, Data Mining, and OLAP, Alex Berson and
Stephen J. Smith, McGraw-Hill, 1997, ISBN 0-07-006272-2:
Chapters 6, 7
Dr. Nawaz Khan, School of Computing Science
E-mail: [email protected]
2
Data Warehousing
Outline
Lecture 9
Definition
Compare with operational systems
Architecture
Design issues - star schema
Relation with DM
Summary
Dr. Nawaz Khan, School of Computing Science
E-mail: [email protected]
3
Data Warehousing
Definition
What is a data warehouse?
DW is an environment + facilities
Lecture 9
Bring scattered data
plant1
plant2 …... planti
query data
…...
plantn
Finished product
warehouse
query/
delivery
Dr. Nawaz Khan, School of Computing Science
E-mail: [email protected]
4
Data Warehousing
Definition
What is a data warehouse?
Financial
Department
Human Resource
Department
…...
R&D
Department
Lecture 9
Operational
systems
Data transformation
DW
Access tool
users
Dr. Nawaz Khan, School of Computing Science
E-mail: [email protected]
5
Data Warehousing
Compare with operational systems
Operational DB systems
Lecture 9
focus on day-to-day business - data structured around
events
run in OLTP environment
support large number of transactions
require quick respond - small, focused DB
DW systems
focus on business needs and requirements - data organised
around trends and patterns in events
run in off-line environment
support complex queries, ad hoc and static reports - based
on historical data
Dr. Nawaz Khan, School of Computing Science
E-mail: [email protected]
6
Data Warehousing
Compare with operational systems
system
feature
Lecture 9
Size
Content
Operational
DW
Small
Large - history of business
Small work areas
Performance Speed - essential
Tools
Cross-functional subjects
Better information
Restrict
Various flexible
standard reporting tools
transform/present data as intelligence
Dr. Nawaz Khan, School of Computing Science
E-mail: [email protected]
7
Data Warehousing
Compare with operational systems
Separation of an Operational and DW System
Lecture 9
Minimises impact of reporting and complex query processing
on operational systems
Preserves operational data for re-use
Manages data based on time, historical data available to
users
Provides a data store that can be modified to conform to the
way the user views the data
Unifies data, one version
Dr. Nawaz Khan, School of Computing Science
E-mail: [email protected]
8
Data Warehousing
Compare with operational systems
The need for DW
Lecture 9
Consistent and quality data
Cost reduction
More timely data access
Improved performance and productivity
Two distinct types of reporting still require
Operational systems derive notification style reports
DW systems generate general information reports
Dr. Nawaz Khan, School of Computing Science
E-mail: [email protected]
9
Data Warehousing
Architecture
Overall Architecture
Lecture 9
Operational data and processing is separated from data
warehouse processing
DW is a central information repository surrounded by a
number of components - environment
Data
Mart
Data
Transformation
Data
Warehouse
Metadata
Access
Tools
Information
Delivery
System
Operational Data
Dr. Nawaz Khan, School of Computing Science
E-mail: [email protected]
10
Data Warehousing
Architecture
The DW
DW database is a cornerstone of the environment
It is implemented on RDBMS technology
It should support large size, ad hoc query, user view
Lecture 9
Data Transformation
Significant effort on extracting data from operational system and
putting it in a suitable format into DW system
Functionality
Removing unwanted data from operational databases
Converting to common data names and definitions
Calculating summaries and derived data
Establishing defaults for missing data
Accommodating source data definition changes
Difficulties
Database heterogeneity
Data heterogeneity
Dr. Nawaz Khan, School of Computing Science
E-mail: [email protected]
11
12 Rules for Data Warehouse
Lecture 9
Data warehouse and operational environments are
separated
Data are integrated
Contains historical data
Represent snapshot data at a given point in time
Data are subject oriented
Data are read-only
Data warehouse life cycle is data driven
Contain summarised data
Read-only transactions involve
Involves data transformation
Meta data component is very critical
Ensure optimum use of data by end users.
Dr. Nawaz Khan, School of Computing Science
E-mail: [email protected]
12
Data Warehousing
Architecture
Meta Data - data that describes the data warehouse
Lecture 9
Description of the data model
Description of the database design
Definition of the system managing the data items
A map of the data location in the DW, including its origin,
how it is transformed/aggregated, where it went
Specific database design definitions
Data element definitions, including rules for derivations and
summaries
Dr. Nawaz Khan, School of Computing Science
E-mail: [email protected]
13
Data Warehousing
Architecture
Meta Data (cont.)
Lecture 9
Information Directory - metadata that helps users to
interactively access to DW and understand content, find
data
A gateway to the DW environment
Support easy distribution and replication of its content
Searchable by business-oriented key words
Act as launch platform for user data access and analysis
tools
Support information sharing
Support a variety of scheduling options
Support distribution of query results
Provide interface to other applications
Support end-user monitoring of the status of the DW
environment
Dr. Nawaz Khan, School of Computing Science
E-mail: [email protected]
14
Data Warehousing
Architecture
Access Tools
Query and reporting tools
Lecture 9
Managed query tool
Reporting tool
Production reporting tool
Desktop report writer
Production reporting tool - Generate regular operational reports or
support high volume batch jobs
Report writer - designed for end users
Managed query tool - a meta-layer between end-user and
database, provides point-click creation of SQL, formats the query
results into easy-to-read reports, or on-screen presentation
Dr. Nawaz Khan, School of Computing Science
E-mail: [email protected]
15
Data Warehousing
Architecture
Access Tools (cont.)
Lecture 9
Application development tools - graphical data access
environment
EIS tools - high level summarisation
OLAP - multidimensional DB
Data mining tools
Data visualisation tools - display complex relationships and
patterns, techniques include 3-D imaging & sound, virtual
reality
Dr. Nawaz Khan, School of Computing Science
E-mail: [email protected]
16
Data Warehousing
Architecture
Data Marts
Lecture 9
Data store that is subsidiary to a data warehouse of
integrated data
It is created for the use of a dedicated group of users for a
subject area
It can be placed on the DW database
In most instances, data mart is separated from the DW
database and put on a separate database server
Dependent data mart - data content is from the DW
Independent data mart - alternative to the DW
simple & inexpensive to build
inconsistent - each has its own assumptions
overlapping in data content, connectivity and management
scalability problem
Dr. Nawaz Khan, School of Computing Science
E-mail: [email protected]
17
Data Warehousing
Architecture
Data Marts (cont.)
Data integration issue - Ralph Kimball
For any two data marts, common dimensions must conform
to the equality or roll-up rule
Lecture 9
Time
Period Sales
Products
month
week
day
Dr. Nawaz Khan, School of Computing Science
E-mail: [email protected]
18
Data Warehousing
Architecture
Information Delivery System
Lecture 9
It distributes data from warehouse to other DW and end-user
products such as spreadsheets and local DBs (via Internet)
Delivery is based on time or event
Users receive report or an analytical view of data are not
aware of location and maintenance
Dr. Nawaz Khan, School of Computing Science
E-mail: [email protected]
19
Data Warehousing
Design issues - star schema
Architecture Blueprint - mission, goals, objectives
Logical
Architecture
Lecture 9
Data Architecture
(data)
Technology Architecture
(hardware, software & network)
Dr. Nawaz Khan, School of Computing Science
E-mail: [email protected]
Application Architecture
(tools)
20
Data Warehousing
Design issues - star schema
Star Schema
Time Dimension
Lecture 9
Product
Dimension
Other
Dimension
sales
revenues
Location
Dimension
Age Group
Dimension
It is used to model the data in a DW from decision-makers
view of the business and operational aspects of the business
It defines the join paths for accessing the facts of business
It allows user to filter, aggregate, drill down & slice and dice
the business fact
Dr. Nawaz Khan, School of Computing Science
E-mail: [email protected]
21
Data Warehousing
Design issues - star schema
Star Schema - 3 logical entities
Measure entities – centre
Lecture 9
Dimension entities – point
Category (detail) entities - extended from point
Dr. Nawaz Khan, School of Computing Science
E-mail: [email protected]
22
Data Warehousing
Design issues - star schema
Measure entities – centre
Lecture 9
Focus of the users’ query
activity
Factual information ->
business intelligence
Synonymous names are
used - measures, analysis,
indicators
Quantitative data numerical information
Data contained in measure
entities grows large over
time
Dr. Nawaz Khan, School of Computing Science
E-mail: [email protected]
Month
199901
199901
199901
…
Branch
ABC
XYZ
PQR
Product
COLA
COLA
COLA
Sales forecast
200000
150000
125000
Sales actual
1900000
1550000
1050000
Variance
-10000
50000
-20000
…
23
Data Warehousing
Design issues - star schema
Dimension entities – point
Lecture 9
Allow users to browse measurement data from different
angles - time, location, product ...
Minimize the rows of data within a measure entity - filter
All Location
Canada
France
Germany
Location dimension
Country
USA
Eastern Area
Western Area
North-east Central South-east
Region
Region Region
Dr. Nawaz Khan, School of Computing Science
E-mail: [email protected]
Area
Region
24
Data Warehousing
Design issues - star schema
Category (detail) entities - extended from point
Lecture 9
Provide detailed information of a category within a dimension
Textual/qualitative information
CLIENT DETAILS
All Clients
(Dimension)
Region
Client
State
….
Dr. Nawaz Khan, School of Computing Science
E-mail: [email protected]
CLIENT_KEY
COMPANY_N
AME
ADDRESS
POST_CODE
CONTRY
NAME
PHONE
25
Data Warehousing
Design issues - star schema
Forming Star Schema: Star schema can be formed
based on an information package, which is constructed
during data gathering process
Lecture 9
Category
Dimension
All Time
Periods
Year
5
Quarter
20
Month
60
All
All Products
All Age
All Econ.
All
Locations
Groups
Classes
Genders
Country Classification Age Group
Class
Gender
20
8
8
10
3
Area
Group
80
40
Region
Product
400
200
District
2,000
Store
200,000
Measures/Facts:
Forecast Sales, Budget Sales, Actual Sales, Forecast Variance (calc.), Budget Variance (calc.)
Dr. Nawaz Khan, School of Computing Science
E-mail: [email protected]
26
Data Warehousing
Design issues - star schema
Forming Star Schema: Define measure entity
Lecture 9
lowest category within each dimension along with each of
the measures/facts defines measure entity
give a name to reflect the business purpose
put in the centre of the star schema in a rectangle box
Sales
Analysis
Dr. Nawaz Khan, School of Computing Science
E-mail: [email protected]
27
Data Warehousing
Design issues - star schema
Forming Star Schema: Define dimension entity
Lecture 9
each column of an information package defines the
dimension entity
place on the periphery of the star in a diamond shaped box
consider relationship to the measure entity “measures based
on dimension”
Time
Location
Product
Sales
Analysis
E-class
Dr. Nawaz Khan, School of Computing Science
E-mail: [email protected]
Age
Gender
28
Data Warehousing
Design issues - star schema
Forming Star Schema: Define category entity
Lecture 9
examine each individual cell of an information package to
determine if it qualifies as a category detail entity
category entities become extensions of dimension entities
add to star schema in stop sign box
Dimension
All Time
Periods
Year
5
Quarter
60
Month
60
Category
All
Locations
Country
20
Area
80
Region
400
District
2,000
Store
200,000
All Products
Classification
8
Group
40
Product
200
All Age
Groups
Age Group
8
All Econ.
Classes
Class
10
All
Genders
Gender
3
Customer
Product
Store
Dr. Nawaz Khan, School of Computing Science
E-mail: [email protected]
BIS4227 - Online Database Systems
29
Data Warehousing
Design issues - star schema
Forming Star Schema:
Tim
e
Lecture 9
Locatio
n
Product
Details
Produc
t
Sales
Analysis
Gende
r
Econ.
Class
Store
details
Ag
e
Dr. Nawaz Khan, School of Computing Science
E-mail: [email protected]
Customer
details
30
Data Warehouse Example: Operational Data
Lecture 9
Dr. Nawaz Khan, School of Computing Science
E-mail: [email protected]
31
Data Warehouse Example: Star Schema
Lecture 9
Dr. Nawaz Khan, School of Computing Science
E-mail: [email protected]
32
Data Warehouse Example: Summary Report
Lecture 9
Dr. Nawaz Khan, School of Computing Science
E-mail: [email protected]
33
Data Warehousing
Relation with DM
Lecture 9
They have the same purpose - decision support
DW assembles, formats, and organises historical data to
answer user query as it is - depends on content of DW
DW will not attempt to extract further information neither
will it predict trends and patterns from data
DM will extract previously unknown and useful information
as well as predict trends and patterns
DM can be performed on DW and/or traditional DB
DM: next lecture
Dr. Nawaz Khan, School of Computing Science
E-mail: [email protected]
34