19. Data Warehouse Design Architecture

Download Report

Transcript 19. Data Warehouse Design Architecture

Amirkabir University
Data Warehouse Design Architectures
Morteza Zaker
Supervisor : Prof . Abbdolahzadeh
1
Presentation plan

Introduction





Data Warehouse Architecture
Concepts of dimensional model
History of Data Warehouse
Modeling issues
Conclusions
2
DW and OLAP – general concepts




Data Warehouses – contain historical data for
supporting decision-making process
On-Line Analytical Processing systems - facilitate
manipulation of DW data
DW and OLAP require clear definition of facts,
dimensions, and hierarchies
DW logical level design based on star/snowflake
schema
3
Data Warehouse Architecture
Data Flow Architectures







Single DDS
NDS+DDS
ODS+DDS
System Architecture
Federated Architectures
ETL Architectures
Extract and transform
Loadvia
(ETL)
thenfront-end
bring data
from
various
source system
User Can and
get data
several
tools
and
applications.
system
intoETL
a stage
area.is managed by Control system based on the rules in the metadata.
Metadata is a database that contain information about the data structure data usageThe Data is operate
examined
to
realize characteristic
of data
by data
Some
Application
Multidimensional
format.
So data
in theProfiler.
DDS is Loaded
quality
rules
and
other on
information
about
data.
ETL
Integrate
and
transform
stage’s
data
then
load
it
to
dimensional
data
store
to Multidimensional
database(
MDB ).
Audit
system
is
used
for
understanding
what
happen
during
ETL
process
and then
(DDS).
Data
profiler analyze data to find out that for example how many row does Table has?
Source
system
are OLTP
system
thatdocontain
data which
is loaded
to DW
. data
logs
system
oprenation
into
Metadata
database.
when
loading
data
into
DDS
tolike
check
dataCells
then
bad
And
which
onethat
is various
Null
so on.
Multidimensional
is
a form
of,(DQETL)
database
dataand
is rules
stored
a cub.
of
cube
push
into DQ
dataof
base
for reporting
correcting
.Bad data
Automatically
represent
number
variable
which
isand
called
Dimensions.
Value
of dimensionbeshow
OLTP
:
Capture
and
store
Business
Transaction
online.
corrected
tolerated
if it event
can behappened.
needed.
when and or
where
business
Spreadsheet
Pivot tables
Ad Hoc query
Metadata
Data Profiler
Source
System
correction
ETL
stage
reports
Control
system+Audit
DQ
ETL
DQ
reports
DDS
MDB
analytic
Data Mining
Other BI
Application
reports
6
Single DDS
Application
1.
2.
Control
system+Audit
s1
Source
System
s2
3.
Metadata
ETL+DQ
DDS
2.2.
3.
3.
Advantage of Single DDS is simple to design , because
the data from the stage is loaded straight into the
dimensional data store, without going to any kind of
normalized store. It is good for system which has
just one source or just has one dimension.
Stage
MDB
1.1.
Extract data from several source
system
Push it in stage area.
Stage area
could be a database or files system.
Stage is necessary because of lacking
memory space and so on .
ControlETL
system
+ Audit
Second
package
pickmanage
up dataETL
system
concurrently
.
from Stage and Integrates them.
Log ETL
process
to Metadata
Apply
some
Data Quality
rules file or
database
Puts
consolidated data into a DDS
Metadata contain Data Structure and
data processing within data warehouse
Application
The main ِDisadvantage is that it is more difficult, in
this architecture, to create a second DDS. The DDS
in the single DDS architecture is the master data
store.
NDS + DDS
Control
system+Audit
Metadata
1.
2.
s1
s2
Application
DDS
NDS-ETL
+DQ
NDS
DDS-ETL
DDS
s2
6.
7.
Stage
Application
MDB
1.
2.
3.
4.
5.
NDS is the in front of DDS and NDS is
our master data .Master data contain
all historical nad structral data .
DDS is our Transactional data and just
could contain Single years of data .
Application
Data storage = Stage, NDS & DDS
Core DW Store = Normalized & Dimensional
Format
Data Marts = 1 to N Data Marts in each DDS
ETL Engine = 4 ETL Package
NDS Contain Master table and transaction
Table
Master Table  Dimensions in DDS
Transaction table  Facts in DDS
ODS + DDS
1.
2.
Control
system+Audit
Metadata
s1
s2
ODS-ETL
+DQ
Application
3.
4.
5.
DDS
6.
7.
DDS-ETL
ODS
DDS
s2
Stage
Application
MDB
We have got
Data storage = Stage, ODS & DDS
Core DW Store = Normalized & Dimensional
Format
Data Marts = 1 to N Data Marts in each DDS
ETL Engine = 4 ETL Package
ODS Contain Master table and transaction Table
but it is not Master data store
Master Table  Dimensions in DDS
Transaction table  Facts in DDS
The advantage of this architecture is that The third
normal form is slimmer than the NDS because it
contains only current values.
In this architecture we have a central place to
integrate, maintain, and publish master data.
The normalized relational store is updatable by
the user application.
The main ِDisadvantage is that it is more difficult, in
this architecture, to create a second DDS. The DDS in
the single DDS architecture is the master data store.
Application
ODS is hybrid data store so User can access data from ODS
Federated DW
FDW
Application
FDW
ETL
DW2
Application
FDW
EII
DW3
DW1
DW2
Application
ETL
DW3
DM1
DW1
DM2
DM3
EII(Extract Information Integration)
1.
2.
1.
2.
3.
4.
The FDW ETL needs to match the Updating
time frequency of the source DWs.
The FDW ETL needs to integrate the data
from source DWs based on business rules.
Duplicate records need to be merged.
Subject area in here is very narrow that the
source DWs.
is a method to integrate data by accessing
different source systems online and aggregating
the outputs on the fly before bringing the end
result to the user.
All 3 DWs must be standardized as the same
structure.
1.
Data marts in the same Data warehouse is
nonintegrated data marts.
2.
They can be dimensional, normalized, or
neither
System Architecture
Web Server
Win2000
SQL 2000
Report Server
Web Farm
OLE DB
NDS + DDS
IBM
DB2
Giga Ntework
ODBC
OLE DB
ETL Server
its 2 set
for Failover
and Clustering
Clients
Oracle 11G
4 Processor
16G RAM
Informix
Fiber Network
OLAP Server
SQL Server 2008
SSAS
Storage Area Network(SAN)
20 T rea Byte
HP
ETL Architectures
Source System
ET L Server
Extract
DW Database server
Source System
ET L Server
Load
Extract
T ransform
Source System
Extract
Stage on Disk
ET L Server
T ransform
DW Database server
Load
Load
T ransform
Source System
ET L Server
T ransform
DW Database server
Extract
DW Database server
Load
No Stage and
its in Memory
Main Issues that must be considered
There are two different types of database software
1.
Symmetric multi processing(SMP)
It is a databas system that runs on one or more machines with several identical
processors sharing the same disk storage. The database is physically located in a
single disk storage system. Examples of SMP database systems are SQL Server,
Oracle, DB/2, Informix, and Sybase
2.
Massively parallel processing (MPP)
It is a database system that 20uns on more than one machine where each machine
has its own disk storage. The database is physically located in several disk storage
systems that are interconnected to each other. An Examples of MPP database systems
are Teradata, Neoview, Netezza.
MPP database system is faster and more scalable than an SMP database system. In an
MPP database system, a table is physically located in several nodes, each with its own
storage.
Research challenges (1)

Spatial measure aggregations considering

Their types





Distributive – reuse of aggregates, e.g., spatial union
Algebraic – additional treatments for reusing aggregates, e.g.,
center of n points
Holistic - new calculation with a row data, e.g., equi-partition
Topological relationships between hierarchy levels
Types of hierarchies
14