Transcript 01 Lecture

BIS4435 – Data Warehousing
Lecture 9
Dr. Nawaz Khan
E-mail: [email protected]
1
Data Warehousing: Reading Assignment
 Reading Suggestion:
Lecture 9
Connolly, T.M., and Begg, C.E., Database Systems: A Practical
Approach to Design, Implementation and Management, Addison
Wesley,
4th Edition, ISBN: 0321210255(chapters 31-33)

Global campus materials on OASIS: http://oasis.mdx.ac.uk/
(unit 9)
 More Reading:


Fundamentals of Database Systems. R. Elmasri and S. B.
Navathe, 4th Edition, 2004, Addison-Wesley, ISBN 0-321-12226-7:
Chapter 28
Data Warehousing, Data Mining, and OLAP, Alex Berson and
Stephen J. Smith, McGraw-Hill, 1997, ISBN 0-07-006272-2:
Chapters 6, 7
Dr. Nawaz Khan, School of Computing Science
E-mail: [email protected]
2
Data Warehousing
 Outline
Lecture 9






Definition
Compare with operational systems
Architecture
Design issues - star schema
Relation with DM
Summary
Dr. Nawaz Khan, School of Computing Science
E-mail: [email protected]
3
Data Warehousing
Definition
 What is a data warehouse?
DW is an environment + facilities
Lecture 9
Bring scattered data
plant1
plant2 …... planti
query data
…...
plantn
Finished product
warehouse
query/
delivery
Dr. Nawaz Khan, School of Computing Science
E-mail: [email protected]
4
Data Warehousing
Definition
 What is a data warehouse?
Financial
Department
Human Resource
Department
…...
R&D
Department
Lecture 9
Operational
systems
Data transformation
DW
Access tool
users
Dr. Nawaz Khan, School of Computing Science
E-mail: [email protected]
5
Data Warehousing
Compare with operational systems
 Operational DB systems
Lecture 9
 focus on day-to-day business - data structured around
events
 run in OLTP environment
 support large number of transactions
 require quick respond - small, focused DB
 DW systems
 focus on business needs and requirements - data organised
around trends and patterns in events
 run in off-line environment
 support complex queries, ad hoc and static reports - based
on historical data
Dr. Nawaz Khan, School of Computing Science
E-mail: [email protected]
6
Data Warehousing
Compare with operational systems
system
feature
Lecture 9
Size
Content
Operational
DW
Small
Large - history of business
Small work areas
Performance Speed - essential
Tools
Cross-functional subjects
Better information
Restrict
Various flexible
standard reporting tools
transform/present data as intelligence
Dr. Nawaz Khan, School of Computing Science
E-mail: [email protected]
7
Data Warehousing
Compare with operational systems
 Separation of an Operational and DW System
Lecture 9
 Minimises impact of reporting and complex query processing
on operational systems
 Preserves operational data for re-use
 Manages data based on time, historical data available to
users
 Provides a data store that can be modified to conform to the
way the user views the data
 Unifies data, one version
Dr. Nawaz Khan, School of Computing Science
E-mail: [email protected]
8
Data Warehousing
Compare with operational systems
 The need for DW
Lecture 9





Consistent and quality data
Cost reduction
More timely data access
Improved performance and productivity
Two distinct types of reporting still require
 Operational systems derive notification style reports
 DW systems generate general information reports
Dr. Nawaz Khan, School of Computing Science
E-mail: [email protected]
9
Data Warehousing
Architecture
 Overall Architecture
Lecture 9
 Operational data and processing is separated from data
warehouse processing
 DW is a central information repository surrounded by a
number of components - environment
Data
Mart
Data
Transformation
Data
Warehouse
Metadata
Access
Tools
Information
Delivery
System
Operational Data
Dr. Nawaz Khan, School of Computing Science
E-mail: [email protected]
10
Data Warehousing
Architecture
 The DW
 DW database is a cornerstone of the environment
 It is implemented on RDBMS technology
 It should support large size, ad hoc query, user view
Lecture 9
 Data Transformation
 Significant effort on extracting data from operational system and
putting it in a suitable format into DW system
 Functionality





Removing unwanted data from operational databases
Converting to common data names and definitions
Calculating summaries and derived data
Establishing defaults for missing data
Accommodating source data definition changes
 Difficulties
 Database heterogeneity
 Data heterogeneity
Dr. Nawaz Khan, School of Computing Science
E-mail: [email protected]
11
12 Rules for Data Warehouse
Lecture 9
 Data warehouse and operational environments are
separated
 Data are integrated
 Contains historical data
 Represent snapshot data at a given point in time
 Data are subject oriented
 Data are read-only
 Data warehouse life cycle is data driven
 Contain summarised data
 Read-only transactions involve
 Involves data transformation
 Meta data component is very critical
 Ensure optimum use of data by end users.
Dr. Nawaz Khan, School of Computing Science
E-mail: [email protected]
12
Data Warehousing
Architecture
 Meta Data - data that describes the data warehouse
Lecture 9




Description of the data model
Description of the database design
Definition of the system managing the data items
A map of the data location in the DW, including its origin,
how it is transformed/aggregated, where it went
 Specific database design definitions
 Data element definitions, including rules for derivations and
summaries
Dr. Nawaz Khan, School of Computing Science
E-mail: [email protected]
13
Data Warehousing
Architecture
 Meta Data (cont.)
Lecture 9
 Information Directory - metadata that helps users to
interactively access to DW and understand content, find
data
 A gateway to the DW environment
 Support easy distribution and replication of its content
 Searchable by business-oriented key words
 Act as launch platform for user data access and analysis
tools
 Support information sharing
 Support a variety of scheduling options
 Support distribution of query results
 Provide interface to other applications
 Support end-user monitoring of the status of the DW
environment
Dr. Nawaz Khan, School of Computing Science
E-mail: [email protected]
14
Data Warehousing
Architecture
 Access Tools
Query and reporting tools
Lecture 9
Managed query tool
Reporting tool
Production reporting tool
Desktop report writer
 Production reporting tool - Generate regular operational reports or
support high volume batch jobs
 Report writer - designed for end users
 Managed query tool - a meta-layer between end-user and
database, provides point-click creation of SQL, formats the query
results into easy-to-read reports, or on-screen presentation
Dr. Nawaz Khan, School of Computing Science
E-mail: [email protected]
15
Data Warehousing
Architecture
 Access Tools (cont.)
Lecture 9
 Application development tools - graphical data access
environment
 EIS tools - high level summarisation
 OLAP - multidimensional DB
 Data mining tools
 Data visualisation tools - display complex relationships and
patterns, techniques include 3-D imaging & sound, virtual
reality
Dr. Nawaz Khan, School of Computing Science
E-mail: [email protected]
16
Data Warehousing
Architecture
 Data Marts
Lecture 9
 Data store that is subsidiary to a data warehouse of
integrated data
 It is created for the use of a dedicated group of users for a
subject area
 It can be placed on the DW database
 In most instances, data mart is separated from the DW
database and put on a separate database server
 Dependent data mart - data content is from the DW
 Independent data mart - alternative to the DW
 simple & inexpensive to build
 inconsistent - each has its own assumptions
 overlapping in data content, connectivity and management
 scalability problem
Dr. Nawaz Khan, School of Computing Science
E-mail: [email protected]
17
Data Warehousing
Architecture
 Data Marts (cont.)
 Data integration issue - Ralph Kimball
 For any two data marts, common dimensions must conform
to the equality or roll-up rule
Lecture 9
Time
Period Sales
Products
month
week
day
Dr. Nawaz Khan, School of Computing Science
E-mail: [email protected]
18
Data Warehousing
Architecture
 Information Delivery System
Lecture 9
 It distributes data from warehouse to other DW and end-user
products such as spreadsheets and local DBs (via Internet)
 Delivery is based on time or event
 Users receive report or an analytical view of data are not
aware of location and maintenance
Dr. Nawaz Khan, School of Computing Science
E-mail: [email protected]
19
Data Warehousing
Design issues - star schema
Architecture Blueprint - mission, goals, objectives
Logical
Architecture
Lecture 9
Data Architecture
(data)
Technology Architecture
(hardware, software & network)
Dr. Nawaz Khan, School of Computing Science
E-mail: [email protected]
Application Architecture
(tools)
20
Data Warehousing
Design issues - star schema
 Star Schema
Time Dimension
Lecture 9
Product
Dimension
Other
Dimension
sales
revenues
Location
Dimension
Age Group
Dimension
 It is used to model the data in a DW from decision-makers
view of the business and operational aspects of the business
 It defines the join paths for accessing the facts of business
 It allows user to filter, aggregate, drill down & slice and dice
the business fact
Dr. Nawaz Khan, School of Computing Science
E-mail: [email protected]
21
Data Warehousing
Design issues - star schema
 Star Schema - 3 logical entities
 Measure entities – centre
Lecture 9
 Dimension entities – point
 Category (detail) entities - extended from point
Dr. Nawaz Khan, School of Computing Science
E-mail: [email protected]
22
Data Warehousing
Design issues - star schema
 Measure entities – centre
Lecture 9
 Focus of the users’ query
activity
 Factual information ->
business intelligence
 Synonymous names are
used - measures, analysis,
indicators
 Quantitative data numerical information
 Data contained in measure
entities grows large over
time
Dr. Nawaz Khan, School of Computing Science
E-mail: [email protected]
Month
199901
199901
199901
…
Branch
ABC
XYZ
PQR
Product
COLA
COLA
COLA
Sales forecast
200000
150000
125000
Sales actual
1900000
1550000
1050000
Variance
-10000
50000
-20000
…
23
Data Warehousing
Design issues - star schema
 Dimension entities – point
Lecture 9
 Allow users to browse measurement data from different
angles - time, location, product ...
 Minimize the rows of data within a measure entity - filter
All Location
Canada
France
Germany
Location dimension
Country
USA
Eastern Area
Western Area
North-east Central South-east
Region
Region Region
Dr. Nawaz Khan, School of Computing Science
E-mail: [email protected]
Area
Region
24
Data Warehousing
Design issues - star schema
 Category (detail) entities - extended from point
Lecture 9
 Provide detailed information of a category within a dimension
 Textual/qualitative information
CLIENT DETAILS
All Clients
(Dimension)
Region
Client
State
….
Dr. Nawaz Khan, School of Computing Science
E-mail: [email protected]
CLIENT_KEY
COMPANY_N
AME
ADDRESS
POST_CODE
CONTRY
NAME
PHONE
25
Data Warehousing
Design issues - star schema
 Forming Star Schema: Star schema can be formed
based on an information package, which is constructed
during data gathering process
Lecture 9
Category
Dimension
All Time
Periods
Year
5
Quarter
20
Month
60
All
All Products
All Age
All Econ.
All
Locations
Groups
Classes
Genders
Country Classification Age Group
Class
Gender
20
8
8
10
3
Area
Group
80
40
Region
Product
400
200
District
2,000
Store
200,000
Measures/Facts:
Forecast Sales, Budget Sales, Actual Sales, Forecast Variance (calc.), Budget Variance (calc.)
Dr. Nawaz Khan, School of Computing Science
E-mail: [email protected]
26
Data Warehousing
Design issues - star schema
 Forming Star Schema: Define measure entity
Lecture 9
 lowest category within each dimension along with each of
the measures/facts defines measure entity
 give a name to reflect the business purpose
 put in the centre of the star schema in a rectangle box
Sales
Analysis
Dr. Nawaz Khan, School of Computing Science
E-mail: [email protected]
27
Data Warehousing
Design issues - star schema
 Forming Star Schema: Define dimension entity
Lecture 9
 each column of an information package defines the
dimension entity
 place on the periphery of the star in a diamond shaped box
 consider relationship to the measure entity “measures based
on dimension”
Time
Location
Product
Sales
Analysis
E-class
Dr. Nawaz Khan, School of Computing Science
E-mail: [email protected]
Age
Gender
28
Data Warehousing
Design issues - star schema
 Forming Star Schema: Define category entity
Lecture 9
 examine each individual cell of an information package to
determine if it qualifies as a category detail entity
 category entities become extensions of dimension entities
 add to star schema in stop sign box
Dimension
All Time
Periods
Year
5
Quarter
60
Month
60
Category
All
Locations
Country
20
Area
80
Region
400
District
2,000
Store
200,000
All Products
Classification
8
Group
40
Product
200
All Age
Groups
Age Group
8
All Econ.
Classes
Class
10
All
Genders
Gender
3
Customer
Product
Store
Dr. Nawaz Khan, School of Computing Science
E-mail: [email protected]
BIS4227 - Online Database Systems
29
Data Warehousing
Design issues - star schema
 Forming Star Schema:
Tim
e
Lecture 9
Locatio
n
Product
Details
Produc
t
Sales
Analysis
Gende
r
Econ.
Class
Store
details
Ag
e
Dr. Nawaz Khan, School of Computing Science
E-mail: [email protected]
Customer
details
30
Data Warehouse Example: Operational Data
Lecture 9
Dr. Nawaz Khan, School of Computing Science
E-mail: [email protected]
31
Data Warehouse Example: Star Schema
Lecture 9
Dr. Nawaz Khan, School of Computing Science
E-mail: [email protected]
32
Data Warehouse Example: Summary Report
Lecture 9
Dr. Nawaz Khan, School of Computing Science
E-mail: [email protected]
33
Data Warehousing
Relation with DM
Lecture 9
 They have the same purpose - decision support
 DW assembles, formats, and organises historical data to
answer user query as it is - depends on content of DW
 DW will not attempt to extract further information neither
will it predict trends and patterns from data
 DM will extract previously unknown and useful information
as well as predict trends and patterns
 DM can be performed on DW and/or traditional DB
 DM: next lecture
Dr. Nawaz Khan, School of Computing Science
E-mail: [email protected]
34