Data Warehouse

Download Report

Transcript Data Warehouse

Data Warehouse
A place the information system department puts
the data that is turned into information.
Data must be properly prepared, organized, and
presented to the user if it is to be optimally useful.
Data Warehouse
1
Definition of Data warehouse


A subject-oriented,integrated,non volatile
and time-variant collection of data in support
of management decision.
A single, complete and consistent store of
data obtained from variety of sources and
made available to end user in a way they
can understand and use the business
context.
Data Warehouse
2
Data Mart




A Data Mart is a smaller version of a data
warehouse typically contain data related to
a single functional area of the firm or having
limited scope in some other way.
Data mart is based on a set of user
requirement.
Dependent data mart Data warehouse
Independent data mart operational data
Data Warehouse
3
Who uses Data warehouse

Ideal Data warehouse user
Data Warehouse
4
Who don’t need??
Data warehouse aren’t the universal solution to
all the business information need

Anyone use job involve dealing with
individual data record.

Anyone whose job include updating the
organizational database, not just looking at
what’s already in it.

Anyone whose information need are
unstructured that they don’t fit the data
warehouse framework.
Data Warehouse
5
Justifying the data warehouse



Build the business case for data
warehouse (not a dollar or yen)
It must state things like types of data to
be included ,kind of decision to be
made with the aid of data.
Compare decision made before and
after implementing Data warehouse
and how it benefit organization
Data Warehouse
6
Data warehouse architecture
Data Warehouse
7
Architecture cont..




The transaction or other operational database from
which the data warehouse is populated
The process to extract the data from database and
transform it to internal format and structure of DWH
The process to clean and load the data into the
DWH
A process to create the summaries of data which
are expected quite often are stored in the DWH
along with imported data.
Data Warehouse
8
Architecture cont..




The central information repository (meta
data) to tell the user what’s in the DW,where
it come from,in charge of it, query tools
The DW database itself which contain the
detail and summary data
Query tools the end user interface for posing
question to the database
The user for whom the DW exists
Data Warehouse
9
Content of the data warehouse
database


Operational data : the data that come into the
DW from operational database. This are not the
part of DW but are fundamental of it.
Atomic data warehouse data :atomic data
consists of individual data items it is the level at
which data is fed into DW. it correspond to
transaction with addition of time .
 Account bal at the end of april $326.03
Data Warehouse
10
Content of the data warehouse
database


Summary data :summaries calculated
ahead of time and stored in the DW for
recall as needed.
Answers to specific queries: these are
usually created as needed and stored
only until the user who requested them
is finished

At the end of april we had 11276 customer in
zip code -----Data Warehouse
11
Getting data into warehouse








Extraction some data element in any operational
database can reasonably to expected to be useful in
decision making but other are less value.
It is necessary to extract relevant data.
Commercial tool available for extraction has window
interface which specify the following:
Which files or table to be accessed in the source
database
which file extracted
What are those field called in resulting database
What is the target m/c and database format of the
output
On what schedule should the extraction be repeated?
Data Warehouse
12
Transformation



The people who defined an organization
operational database over the year had
many priorities
Those who develop the warehouse from
these database today face
inconsistencies (poor quality of data)
among their data source.
Transformation is dealing with these
inconsistencies
Data Warehouse
13
Transformation




Naming inconsistency
The same data name may refer to different data element in
different database
One set of name must be picked for the DW and used
consistently in it.
The conversion process may encompass the following






Character converted to ASCII
MIXED TEXT CONVERTED TO UPPER CASE
Numeric data converted to consistent data type
Code converted to common form
Measurement converted to common format like timezone
currency measurement etc
Other type od reformatting like dd/mm/yyyy,mm/dd/yy etc
Data Warehouse
14
Cleansing



Information quality is a key consideration in
determining the value of information
The world is full of well documented
instances of data warehouse leading to poor
conclusion due to poor information quality
It is necessary to go through the data
entered into DW and make it as error free as
possible. this process is known as data
cleansing
Data Warehouse
15
Cleansing



Data cleansing must deal with many type of
error missing data incorrect data in one source
Inconsistent and conflicting data when two or
more source are involved
Data cleansing software can often suggest
area to check for poor quality even if it can’t
figure out how to fix them. it does this by
finding high correlations among two or more
data and suggest that exception may be error
Some type of quality check cannot be
automated loke sal $5000 or$50000
Data Warehouse
16
Manual procedure for data
cleansing



Take sample of data and analyze it
manually for quality
If any field or data source show significant
quality problem assess whether or not they
impact the decision.
If it is felt that quality problem will impact
decision made try to determine source of
problem ,and how to correct them .it may be
possible to correct them using a different
and more reliable data source
Data Warehouse
17
To conclude




Be prepare for a lot of tedious work .the
most important tool for solving these
problem is sharp eye
You may spend more time checking for error
than cleaning up error
Inconsistency error most difficult to handle
The complexity of DW increases
geometrically with the no. of source of data
fed into it.
Data Warehouse
18
Loading





Loading implies physical movement of data from
the computer storing source database to that which
will store the data warehouse database
The common channel is communication link
It may be necessary or desirable to close off access
to DW when loading is taking place
Full DW load are rare
It may be possible to keep small data mart based
on old operational data while loading is taking place
switching to new database when ready
Data Warehouse
19
Summarization



Once DW has been loaded it is
possible to create summaries .
Summaries are recreated after every
incremental update as any change in
underlying data may impact them
Query that use prestored summaries
can be answered quickly
Data Warehouse
20
In favor of few or no summary




Calculating summary require time and
resources
Summaries occupy space on mass storage
device
Someone has to fiqure out which summaries
to prestore
Working on them


Summaries can be calculated at times when
computer would be idle or nearly so
Summaries take small fraction the size of DW.
since cost of storing them is small fraction
Data Warehouse
21
Summarization




2D for 12 cell we have 8 summaries 67% additional
space
3D 36 cell we have 45 summaries 125%additional
space
With multidimension data summaries quickly
escalate the total size of raw data
Spare data row make the impact of summaries even
larger b’cos summaries tend to be less sparse than
atomic data
Data Warehouse
22
Planning and defining
summaries



These are one time cost
Cost weighted against user’s time
saved every time a query that uses a
pre stored summary is invoked
If total user time saved outweigh the
analyst time used to defining it use
summary if not probably its not worth it
Data Warehouse
23
Meta data


Human metadata: people always have
some sort of metadata in their heads. meta
data in this case is the analyst knowledge.
However there can be several problem



Natural assumption isn’t always obvious
Two data item having different names in
different files refer to the same business entity
There is no way tp specify validation rules to be
applied to data when entered ,access or security
control.
Data Warehouse
24
Computer based metadata for
people to use

To deal with these issue DW developer
often store this descriptive data in its own
database, which provide a comprehensive
guide to data resource such metadata can
contain



A lexicon of common word used in formal data
names and prominent data name
Data description, data structure and integrity
rules
A glossary of business word, terms and
abbreviations that support use of data resources
Data Warehouse
25
Computer based metadata for
computer to use


If meta data is stored in a well
structured, computer readable form
they can be read by a DBMS and can
smooth the interface between DW and
its user
It is often called data repository the
software which manages it is called
repository software
Data Warehouse
26