Some Thoughts on Data Warehouses

Download Report

Transcript Some Thoughts on Data Warehouses

Some Thoughts on Data Warehousing
A Data Warehouse can be thought of a a special type of
database - generally a very large database (source of data)
It holds more data, which is processed by Analytical and
Statistical software
The data, or much of it, will probably be historical (also
known as Legacy data)
And some of the data probably won’t be of the same quality
and content - or even consistent
Data Warehousing Lec08 / 1
Some Thoughts on Data Warehousing
A more technical definition would be
‘A data warehouse is a subject-orientated, integrated, timevariant and nonvolatile set of data which supports decision
making processes’.
Subject databases are designed around the essential entities
of a business (for example customers) rather than
applications (for example Sales, Insurance..)
The reference to time-variant means that data sets are
organised by time periods (months, quarters..)
Data Warehousing Lec08 / 2
Some Thoughts on Data Warehousing
A data warehouse is a snapshot of an organisation at a given
or particular time.
This snapshot is created by the extraction of data from
existing systems.
This extracted data must be transformed, cleaned and loaded
into the Data Warehouse AND regular snapshots must be
made to maintain the relevance and usefulness of the data
warehoused.
Data Warehousing Lec08 / 3
Some Thoughts on Data Warehousing
To process, develop and present results from what is
normally a large amount of data (many gigabytes or terabtyes
is realistic), there is the need to use ‘data management aids’
These aids consist of
Extraction tools
Analysis tools (such as OLAP, which we’ll meet in more
details later)
Application tools
Some of the workhorses are DSS, EIS, OLAP, SQL, Data Mining
Data Warehousing Lec08 / 4
Some Thoughts on Data Warehousing
There are some interesting terms in Data Warehousing
Dimensions : A spreadsheet is a 2 dimensional display of
data - so are most relational databases.
However Data Warehousing tools can extend this association
to a ‘multi dimension’ or a ‘variable dimension’
Do you remember the nominal dimension
the ordinal dimension
and the continuous dimension in statistics ?
Data Warehousing Lec08 / 5
Some Thoughts on Data Warehousing
Just in case you don’t -A nominal variable is an unordered category such as a
region, suburb, - - An ordinal variable is an ordered category such as an age
group
A continuous variable (normally) has a numeric value such as
income, passenger-kilometres
Data Warehousing Lec08 / 6
Some Thoughts on Data Warehousing
There is a term ‘hypercube’ which is a smart way of
displaying (visualisation - which will appear soon) muiti
dimensional data - or more correctly data in a multidimension form
A hypercube is a combination of several types of dimensions
An example: Identifier dimensions could be product and store
(which are both nominal) and the variable dimensions could
be sales and customers.
You should know the terms ‘dependent’ and ‘independent’
variables
Data Warehousing Lec08 / 7
Some Thoughts on Data Warehousing
Very powerful forms of analysis when both dimensions are
continuous
And just to finish off this brief introduction to terms :
Data mining is the search for relationships and global
patterns which exist in large databases, but are obscured or
hidden in what are normally large (or enormous) amounts of
data.
Data mining is also associated with advanced machine
learning technologies which are used to discover pockets of
knowledge hidden in the mass of data
Data Warehousing Lec08 / 8
Some Thoughts on Data Warehousing
And just to finish off this introduction, and give you some feel
of what ‘large amounts of data’ are:One company in the United States of America - a catalog
publisher, reputedly has collected data on 30 million
households
The company uses Quadstone (data mining), SAS, SPSS,
KnowledgeSeeker.
It runs about 100 marketing campaigns each year
Data Warehousing Lec08 / 9
Some Thoughts on Data Warehousing
There are a number of aspects to consider when developing
a database, and many of these are applicable to a Data
Warehouse.
The major hurdle is the establishment and agreement by
Users of the need to build a Data Warehouse
This can be expanded into these ‘supporter’ groups
1. Understand the users by
– Management information needs
– Area of Business operation
– Responsibilities
– Computer literacy
Data Warehousing Lec08 / 10
Some Thoughts on Data Warehousing
This (almost automatically) suggests a Business
Management hierarchy - where no one person is responsible
for everything
And on this presumption we move onto
2. Determine what decisions people need to make OR would
like to make if only they had sufficient supporting information
3. What decisions they have made and are trying to assess
the impact of those decisions (can you remember a number
of ‘high level’ decisions made in Industry which have been
disastrous - but this knowledge arrived too late to reverse or
recover lost markets ?)
Data Warehousing Lec08 / 11
Some Thoughts on Data Warehousing
3. Who will most likely make informed decisions using the
data warehouse as the source.
It is not accurate to assume that because there is the
availability of a data warehouse, all managers or decision
makers will use it effectively (or at all)
4. Who could be ‘likely’ or potential new users ?
This simple question opens up the Competitive Appraisal or
Enterprise Management Appraisal environment.
How effective and accurate are these Appraisals - how
current are they
Who conducted them - how skilled were they ?
- what level of management skills ?
Data Warehousing Lec08 / 12
Some Thoughts on Data Warehousing
Another way of stating this is ‘Who has serious potential in
the Organisation’
Does this conflict with Privacy and Discrimination principles ?
5. Select the most efficient, effective, processible, and
reliable data from when (?) to the most current period to be
the content of the Data Warehouse
6. Make life enjoyable for the users
– user screens
– software icon or template based
– match these with the skills of the users
Data Warehousing Lec08 / 13
Some Thoughts on Data Warehousing
7. Match the processes with the current and adaptive skills of
the users - people do develop advanced skills
8. Ensure that the data used is accurate, and can be trusted
and consistently recognised across all departments, areas,
states, countries of the Company
9. Monitor the results and particularly the reactions of the
users to the information advantages they are receiving
10. Search, or listen attentively for other sources of data for
the particular Data Warehouse
Data Warehousing Lec08 / 14
Some Thoughts on Data Warehousing
11. Be alert to changes in Management - and the effect this
could or will have on a Data Warehouse (either an
established one, or one being developed. - Senior
Management changes do happen quickly at times especially after a General Meeting where shareholders can
either be Happy or Not Happy)
12. Keep the users, and particularly the ‘Upper End of
Management’ users satisfied
And, make sure there is an energetic, resourceful and
knowledgeable Project Leader (and read the paper on Team
Construction at the website).
Data Warehousing Lec08 / 15
Some Thoughts on Data Warehousing
There are some impediments to the design, development and
use of a Data Warehouse
1. Not surprisingly, not every organisation is geared to, or is
working towards total centralisation
2. The development of Data Warehouse invariably requires
the integration of a number of technologies which are not
compatible at many levels. This is due, in part, to the
ongoing developments and upgrades which are inherent in,
and to, any user of Information Technology
Data Warehousing Lec08 / 16
Some Thoughts on Data Warehousing
3. The ‘we want it now’ syndrome - which if allowed to rush
the design, development and implementation invariably leads
to a poor design, low quality but rapid development, and a
Warehouse which doesn’t match expectations
4. The effect of changes both during the development and at
the operational level. Conditions do change, and
requirements also change and the overall effect is ongoing
planned and directed maintenance
5. Data Warehouses process large amounts of data and use,
in most cases, high level analysis and reporting tools. This is
not a good environment for rapid or immediate response
Data Warehousing Lec08 / 17
Some Thoughts on Data Warehousing
6. Another dimension which has occurred in the past 2 to 3
years is the emphasis on ‘customer information’ which
depends on normal IT systems to capture this data - this is a
developing skill and early data will not be as complete nor as
revealing as ‘newer’ systems
7. The tracking, storing and prediction of customer behaviour
may require different analysis tools.
8. Current databases provide ‘On Line Analytical Processing’
capabilities for on line real time applications - the major effort
is to associate this with legacy data in a meaningful and
seamless manner - and maintain high processing rates
Data Warehousing Lec08 / 18
Some Thoughts on Data Warehousing
So now that the groundwork has been laid, let’s move onto
some more ‘ponderables’ of what are regarded as ‘Design
Constraints’ - another name for non-avoidable requirements .
1.
Visualisation
– easily and quickly digested (understood)
– simple but informative
– recognisable information (names …..)
– intuitive
Data Warehousing Lec08 / 19
Some Thoughts on Data Warehousing
So what do these contribute to a Data Warehouse ?
That is not the right question - It should be, what should
design provide to make the results of a Data Warehousing
process
– understandable
– easy to reprocess
– easy to explore
– easy to partition or to hand over to other tools
Data Warehousing Lec08 / 20
Some Thoughts on Data Warehousing
If you look at many of the ‘advanced systems’ available you
would be probably convinced that one of the main design
objectives is to make the ‘system’ as complex as possible
That’s not good
Many designs are complex and intricate - great skill is
needed to source, process, analyse, locate, redimension
Features are not solutions - features are an aid to arrive at a
solution
Data Warehousing Lec08 / 21
Some Thoughts on Data Warehousing
Comprehension
Smart or overfull screens have two effects
– they divert the attention of serious users
– they confuse people
– they make a process difficult to operate
Screen displays should have more free space
Data Warehousing Lec08 / 22
Some Thoughts on Data Warehousing
Screen displays should provide less choice - not all possible
choices (or menu items) on one screen. (Many web sites
suffer from the overfull problem).
‘Smart’ dashboards do not appeal to many users
Multiple mouse clicks on the same screen mean a complex
screen.
Data Warehousing Lec08 / 23
Some Thoughts on Data Warehousing
Delivery Speed:
Very much associated with ‘We must have it now!’ syndrome
Acceptable delay tolerance from the end user is <= 0
Designers cannot to expect users to accept a long delay
even if
– the results are complicated
– the amount of data to be processed is large
– the process algorithms are complex and/or recursive
Data Warehousing Lec08 / 24
Some Thoughts on Data Warehousing
Cost of ‘Implementation’ - from an idea to production
These costs include:
Labour (staff) costs during the design stage
Delay costs during the design stage
and this is before any useful results are produced - possibly a
good case for a Return on Investment or Rate of Return
calculation ?
The are other incidental costs such as Business staff and
management time
Data Warehousing Lec08 / 25
Some Thoughts on Data Warehousing
Costs are also connected with the size of the Data
Warehouse
A 10 table design may not incur much complexity.
A 100 table is almost certain to
A 1000 table is guaranteed to (and may not even be
successful).
Data Warehousing Lec08 / 26
Some Thoughts on Data Warehousing
Technology Costs - Hardware and Software
These items should be scaled to the ‘size’ of the known
requirements
And they should be able to be extended beyond the first
implementation - if it is successful (nothing generates more
demands than a successful Data Warehouse - probably due
to a ‘wait and see’ attitude)
Hardware should be periodically discarded and replaced
with newer and more powerful versions
Data Warehousing Lec08 / 27
Some Thoughts on Data Warehousing
Software is the key to fast development, user-friendly
systems, and fast delivery of data to the users for their
queries of the Data Warehouse applications.
A Data Warehouse works on a 2 phase basis
1. An ‘Extract, Transform and Load’ process (also known as
ETL)
2. Front office user queries and reports
Much of the design work is focussed on the processes which
will provide relevant and accurate data from the sources so
that queries and reports can be produced
Data Warehousing Lec08 / 28
Some Thoughts on Data Warehousing
Daily Management and Administrative Costs
This is NOT a NEW phenomenon which occurred with the
emergence of Data Warehouses
It is not avoidable but it is manageable and can be controlled
Items which are found here are
Routine loading of data into Fact and Dimension tables
Production of standard reports
Data Warehousing Lec08 / 29
Some Thoughts on Data Warehousing
Unexpected but Necessary Amendments to either Data or
Processing
Included here are such annoying aspects such as
Late arriving facts
Late arriving dimensions
Corrections to existing data
Other Amendments include
New dimensions
New Dimension Attributes
New facts
The details or summary structure of the data
source
Data Warehousing Lec08 / 30
Some Thoughts on Data Warehousing
Prevention of Inappropriate or Irrelevant Results
This can be avoided and depends of the ‘correct target’ being
selected
Which in turn means that the extensive and continuous
business requirements gathering at the beginning and
throughout the life of the warehouse
Which presupposes that either a data warehouse application
may outlive its purpose
- Or it may be replaced by a newer, different model
Data Warehousing Lec08 / 31
Some Thoughts on Data Warehousing
Many years ago a chap called Hamlet (in one of William
Shakespeare’s play (of the same name)) was given these
words
‘To be, or not to be
That is the question’ (and lots more of course).
The current paraphrase of that is
‘To centralise or not to centralise
That is the question’
The complete quotation is at the end of these overheads
Data Warehousing Lec08 / 32
Some Thoughts on Data Warehousing
There are solid arguments to ‘Centralise’ and there are solid
arguments ‘Not to Centralise’.
Centralisation of data (as in a Centralised Data Warehouse)
invariably assumes
‘perfect information, control and communication’
An alternative is to create data marts (specialised smaller
centres of data) and to tie, or associate, these with an
architecture which allows them to work together effectively
(much like a distributed database).
Data Warehousing Lec08 / 33
Some Thoughts on Data Warehousing
As with many developments, there are a number of
‘inescapables’ (much like fees and taxes)
Unavoidable realities are a recognition that the ideal model is
an ideal, not a practicality nor reality
A realistic model is referred to a ‘descriptive’ - the ideal model
is classed as ‘normative’.
So with that little bit of Business philosophy, let’s have a quick
look at reality in designing and implementing a data
Warehouse
Data Warehousing Lec08 / 34
Some Thoughts on Data Warehousing
The opportunities for building a ‘perfect design’ from a
centralised position are very limited (in calculus this would be
represented by sn  0 )
The major elements of skills, understanding, foresight,
wisdom, sufficient, time are not normally available in
abundance
We construct, and live with, designs in separate areas
(departments) and we learn the requirements of the many
users slowly and sometimes painfully. We also need time and
patience to understand and recognise the impact of data
Data Warehousing Lec08 / 35
Some Thoughts on Data Warehousing
We develop things incrementally
It’s not surprising then that the weight of experience indicates
that a Data Warehouse is basically a decentralised system
and may have no ‘core matter’ at all
Don’t get discouraged - our national road systems were not
constructed overnight
Neither were our water, power, gas systems nor our
Universities - and they are subject to continuous change - so
why should there be the unnatural expectation for a Data
Warehouse ?
Data Warehousing Lec08 / 36
Some Thoughts on Data Warehousing
Another level of complexity is that of many, and incompatible,
technologies
It is normal for organisations to have multiple ‘business’
systems which were developed at different time periods and
on different computers and with different applications
software platforms
To integrate these requires much skill at the communications
and applications levels. XML is the current ‘hope’ for clear
and multiple different data forms.
Curl is another new object oriented language which has
appeared - and there will be others …..
Data Warehousing Lec08 / 37
Some Thoughts on Data Warehousing
Most users expect what is known as ‘rapid deployment’ which
means the time from inception to delivery of output
Oracle Corporation (and others) suggest that a minimum 6
week period from acceptance and recognition of the need for
a Data Warehouse is industry acceptable
What would probably happen if the ‘wait period’ was longer ?
If the deployment period is say 12 months, what are the risks
of - budget reallocation ?
- loss of organisation competitive edge ?
- no more work for the Data Warehouse contractor ?
Data Warehousing Lec08 / 38
Some Thoughts on Data Warehousing
Continuing Change
One aspect of Business is predictable - change
This makes long range assumptions both dangerous
(particularly if they are not updated) and unreliable
Timeliness of Data
The Data Warehouse should ideally reflect both atomic data
(the most persistent, lowest level of data)
and real time - which is an expectation that latest data will be
less than 60 minutes old, or even 0 minutes old.
There is a term ‘lockstep’ which refers to the ‘in step’
condition of operational systems and Data Warehouse
systems.
Data Warehousing Lec08 / 39
Some Thoughts on Data Warehousing
The Optimists probably will look for ‘data ahead of its time’,
that is before an event happens
We currently have ‘futures’ and ‘futures trading’ so is ‘future
data’ all that unrealistic ?
Seamless Integration/Connection
The final requirement is for the Data Warehouse to be able to
connect the real-time data without prejudice (seamless) with
the static historical data (which hopefully will not alter).
This will give the appearance of a total time base
Data Warehousing Lec08 / 40
Some Thoughts on Data Warehousing
The next few overheads will try to explain some current
techniques and software which are employed to both
minimise the time factor in producing outputs, and also
improve or maintain the quality and scope of data
There will also be some thoughts about a ‘new technique’
known as ‘Executive Dashboards’
Data Warehousing Lec08 / 41
Some Thoughts on Data Warehousing
A New Term - OLTP databases
OLAP is the acronym for ‘On Line Transaction Processing’
which links in nicely with the previous overheads regarding
the ‘we want it now’ catch cry from Business Managers and
Customers (we’ve overlooked these people have we ?)
So now we have OLTP databases and Data Warehouses
Big deal ?
What’s the difference or differences, if any ?
Is it just another expensive IT tool ?
Data Warehousing Lec08 / 42
Some Thoughts on Data Warehousing
OLTP databases run business applications developed around
– Siebel
– SAP
– PeopleSoft (and others) ? Oracle takeover
They are developed for multiple users, many of whom are
remote from the processing centre (as if that is something
unusual with our current capabilities with communications)
These users require (sometimes ‘demand’)
– very rapid response times
– very high levels of accuracy of data
(why ?)
Data Warehousing Lec08 / 43
Some Thoughts on Data Warehousing
In these conditions, it is reasonable to suggest that there
could be thousands of concurrent users (you remember
concurrency ?) many, or perhaps all, of whom want to press
the Enter key (or Submit a process with the more menu
function driven Client devices)
and expect
– an immediate response
– an accurate response
Data Warehousing Lec08 / 44
Some Thoughts on Data Warehousing
Let us digress a bit and ponder on what is going on at the
database level
There will be a series of processing events which must occur
(quickly, but in a predetermined sequence)
1. Business programming logic and SQL statements (like the
SQL you have met) must be executed - this is not avoidable
2. Response data from the appropriate database or
databases must be gathered - this will involve at least one
CPU, some (or much) sorting, and also the unavoidable
Input/Output (I/O) time.
Data Warehousing Lec08 / 45
Some Thoughts on Data Warehousing
When all the detail events have been completed and the
business function is performed, the production unit of work is
committed, or rolled back. (you have met these conditions
earlier)
Just to make things a bit more entertaining, some OLTP
database applications employ batch processing and others
will service decision-support queries (with the associated
extensive browsing and highly complex calculations required
in many cases)
An just to query the slogan ‘Life’s Good’ (or LG), OLTP
databases are expected to run thousands of transactions per
minute.
Data Warehousing Lec08 / 46
Some Thoughts on Data Warehousing
OLTP database applications also perform
– Insert
– Update
– Delete
functions - and you have had some contact with transactions
which affect more than 1 row in a table, and more than 1
table in a database (known as contention management and
locking technology)
So you will appreciate that OLTP is a high work level
environment - so to expect fast output adds to the transaction
load
Data Warehousing Lec08 / 47
Some Thoughts on Data Warehousing
Let’s look at Data Warehouse Databases
They assist in Business decision making processes - and of
course some (or many) of these are complex
Normally, Data Warehouse databases do not run transactions
at high rates
However, they do respond to complex business questions
relevant to the available data
These ‘queries’ are delivered to the database via complex
SQL or by user friendly query tools
Data Warehousing Lec08 / 48
Some Thoughts on Data Warehousing
Users don’t normally expect instantaneous response
Data Warehouse response times tend to be measured in
minutes - although some queries will take much longer
There are a number of interesting aspects of a Data
Warehouse database
– they tend to be ‘read only’
– they are not available 24 x 7 as there needs to be a
data loading time
– this data load is done in batch mode in ‘off hours’
– ‘parallelism’ is used to improve response time
Data Warehousing Lec08 / 49
Some Thoughts on Data Warehousing
You have probably heard of the need to ‘tune’ software to
improve performance (aka response time)
The same consideration applies to Data Warehouses
Performance tuning improvements can provide about 20%
improvement by configuring the database manager, the
database itself , and tablespace settings.
Further improvements can be achieved by carefully modifying
the query access SQL code
Data Warehousing Lec08 / 50
Some Thoughts on Data Warehousing
‘Tuning SQL’ invariably requires that
– physical design alterations involving optimum
indexes
– indexes are applied to minimise sort (output) times
– clustering is used to minimise sort times
– automatic Summary Tables are developed
– stored procedures are used or implemented
– triggers are used
Data Warehousing Lec08 / 51
Some Thoughts on Data Warehousing
The term ‘multidimensional data’ is commonly used in Data
Warehouse applications
There are 2 basic models
These are
– The Star model
– The Hypercube
The Star model is similar to a relational database table in
construction
The ‘attributes’ are known as ‘facts’
Data Warehousing Lec08 / 52
Some Thoughts on Data Warehousing
A Star Structure
Product
Model
Type
Colour
Channel
Product
Region
Time
Channel
Revenue
Expenses
Units
Region
Nation
District
Dealer
Time
Week
Year
Data Warehousing Lec08 / 53
Some Thoughts on Data Warehousing
Don’t be mislead
The selection of the ‘components’ and ‘sub-components’
takes considerable time and rationalisation
And agreement
Data Warehousing Lec08 / 54
Some Thoughts on Data Warehousing
A Hypercube is a representation of a multidimensional set of
data
These could be the dimensions of such a construction
Dimension
Example
Brand
Haymes
Store
Caulfield East
Customer Segment
Home renovation
Product Group
Paints
Period
September
Variable
Number Sold
Data Warehousing Lec08 / 55
Some Thoughts on Data Warehousing
The spatial representation of data in a cube can be difficult
2000
1999
Red
ALL
ALL
ALL
Holden
Toyota
Data Warehousing Lec08 / 56
Data Warehousing
Levels of Abstraction
Data
Application Process Technology
Strategic Security
Confidentiality, Integrity, Availability
e.g. Information Protection Policy, Privacy
Policy
Tactical Security
Confidentiality, Integrity, Availability
e.g. Risk Assessment, Incidence Response
Operational Security
Confidentiality, Integrity, Availability
e.g. Password Protection, Biometrics,Firewalls
Quality
Security
Data Warehousing Lec08 / 57
Some Thoughts on Data Warehousing
An expert in Data Warehouse design and implementation
(Ralph Kimball) gives this advice on design a successful Data
Warehouse
1. Separate the systems
– logically
– physically
– administratively
Data Warehousing Lec08 / 58
Some Thoughts on Data Warehousing
2. The Data Warehouse project should be considered as 4
distinct and different systems
– Production (source) transaction processing systems
– Data Warehouse staging area systems
– Data Warehouse presentation systems, including
client/server and Web-based query tools and report
writers
– Optional high-end analytic tools supporting data
mining, forecasting, scoring or allocating
Data Warehousing Lec08 / 59
Some Thoughts on Data Warehousing
The data warehouse manager should NOT be responsible for
the source systems which are capturing and processing
transactions
The data warehouse manager should not therefore be
involved with legal and financial functions
The data warehouse manager should not be involved with
rollback and recovery functions of executing systems
Data Warehousing Lec08 / 60
Some Thoughts on Data Warehousing
So now that we have an idea of what a Data Warehouse
manager should not get involved with, let’s see what
systems a Data Warehouse manager SHOULD get involved
with :
1. The Data Staging area:
Production data from many sources is collected, cleaned,
combined and eventually delivered to the Data
Warehouse presentation systems
An important feature here is that final data warehouse clients
should not ever have access to the staging area
Data Warehousing Lec08 / 61
Some Thoughts on Data Warehousing
This should ensure
– no guarantee for up-time service levels for queries or
reports
– no requirement to enforce client-server security
– no need to build query performance enhances such as
indexes or aggregations
– that logical and physical conflicts between cleaning and
querying just won’t occur
– no guarantee of consistency across separate
asynchronous data sources
Data Warehousing Lec08 / 62
Some Thoughts on Data Warehousing
An explanation :
The major structures in the data staging area are
– flat files
– the entity/relationship schema
These are sourced from the production systems (they are
either directly extracted or are derived).
About 90 to 95% of the processing in the staging area is
sorting or sequential processing
Data Warehousing Lec08 / 63
Some Thoughts on Data Warehousing
2. The Presentation System
This must be built to support and enhance the query and
reporting activities
This system needs to be
– simple (where have you heard that before ?)
– fast
– and most importantly, will produce the ‘right data’ to
satisfy the analysis needs of the end users and
stakeholders
Data Warehousing Lec08 / 64
Some Thoughts on Data Warehousing
As a bonus, those requirements which were excluded from
the staging area can now be addressed
The high level data structures in the in the presentation level
are
– the relational star schema
– and the on-line analytical processing data cube
Data Warehousing Lec08 / 65
Some Thoughts on Data Warehousing
3. An operational layer of specific high-end analytic tools
These tools will (normally) use large amounts of data
from a Data Warehouse, and generally in batches
Not surprisingly perhaps, the skills of analysis (data
mining, forecasting, scoring and allocating tools) use
highly specialised formulae and algorithms which are
best handled and applied by experts - generally not the
Data Warehouse designer
Data Warehousing Lec08 / 66
Some Thoughts on Data Warehousing
Data mining for instance often needs to assess large
numbers of ‘observations’ - this can be thousands or
hundreds of thousands of observations
And this can be an iterative process - re-use
And high data rates are the order of the day
A safe solution is to hand the set of observations, (just once)
to the data-mining team (who would be professional datamining experts) and to let them apply their skills
Data Warehousing Lec08 / 67
Some Thoughts on Data Warehousing
The current tools or trends in presentation are relational star
schemas and OLAP data cubes
They have proven to be excellent devices for presentation
and also satisfy one of the design criteria which is ‘the ability
to present results in an understandable manner’.
Supporting these tools (or these methods of data
presentation) are some smart software which allows queries
to be developed based on powerful algorithms.
The latest standard for SQL for instance has a ‘cube’
command in its repertoire
Data Warehousing Lec08 / 68
Some Thoughts on Data Warehousing
Star schemas are directed at very large data sets with
millions or billions of measurements or in lay terms many
millions of customers in a customer or product entity.
OLAP cubes, as you have gathered by their construction, are
appropriate for smaller data sets where analytic tools can
perform complex data comparisons and calculations
And of course, there are hybrid star schema - OLAP systems
which combine the best of both worlds and allow for very
large data sets in star schema format to be smooth drill-down
targets from smaller OLAP data cubes, all within a single
interface.
Data Warehousing Lec08 / 69
Some Thoughts on Data Warehousing
The next 2 overheads, which are quite complex and are
probably difficult to read, hopefully will give some impression
of the muscle in Oracle 9i applied to the design, development
and availability of databases and data warehouses
Data Warehousing Lec08 / 70
Oracle 9i
9i Application Server
Building
Infrastructure
Business Logic
Presentation Logic
Users
BC4J -
Oracle JSP
Component Based Development
Business Rules with BC4J
Oracle 9iAS
Forms Service
HTTP Server
Database
Model Based Development
Database Schemas
Users
Oracle 9iAS
Reports Services
Database Form-based
Applications and Reports
Administration
Data Warehousing Lec08 / 71
Another Meaningful Diagram
BI Development Tools
Other Operational Data
External Data
Designing and
Building the Warehouse
Creating Reports
Preparing to Mine data
Building BI Applications
Building Predefined Queries
Data Load
Developer Suite
Warehouse
Builder
E-Business Suite
Data
9i with OLAP
Services
Electronic
Storefront
Data
Querying
Mining
Reporting
Analysing
Legacy Data
Application
Server
BI Application Deployment
Data Warehousing Lec08 / 72
Some Thoughts on Data Warehousing
And to finish, I’m going to introduce a ‘state of the art’ device
It is known as an Executive Dashboard
It is a subset of portal technology (you have used your
Monash Portal ??) to provide
– specific
– performance based information
to Upper and Senior Management
It can be described as a subset of a Data Warehouse
presentation
Data Warehousing Lec08 / 73
Some Thoughts on Data Warehousing
It is a one-screen display (cockpit is the term used) of all
critical measurements for managing an organisation which
offers
– actionable information
– business performance measurements
– up to date information on status
– up to date information on forecasts against
benchmarks
They are the current facility of the visualisation of Critical
or Key Performance Indicators
Data Warehousing Lec08 / 74
Some Thoughts on Data Warehousing
Finally :
Just in case you are wondering what the full text of Hamlet’s
‘to be or not to be’ soliloquy is
The next 3 overheads contain his thoughts on ? ? ? ?
Data Warehousing Lec08 / 75
Shakespeare’s Hamlet, Prince of
Denmark
To be, or not to be - that is the question;
Whether ‘tis nobler in the mind to suffer
The slings and arrows of outrageous fortune
Or to take arms against a sea of troubles,
And by opposing them ? To die, to sleep No more; and by a sleep to say we end
The heart-ache and the thousand natural shocks
That flesh is heir to. ‘Tis a consummation
Devoutl’y to be wished. To die, to sleep;
To sleep, perchance to dream. Ay, there’s the rub;
For in that sleep of death what dreams may come
When we have shuffled off this mortal coil,
Data Warehousing Lec08 / 76
Shakespeare’s Hamlet, Prince of
Denmark (Act 3)
Must give us pause. There’s the respect
That makes us calamity of so long life;
For who would bear the whips and scorns of time,
Th’ oppressor’s wrong, the proud man’s contumely,
The pangs of despis’d love, the law’s delay
The insolence of office, and the spurns
That patient merit of th’ unworthy takes,
When he himself might his quietus make
With a bare bodkin ? Who would there fardels bear,
To grunt and sweat under a dreary life,
Data Warehousing Lec08 / 77
Shakespeare’s Hamlet, Prince of
Denmark (Act 3)
But that the dread of something after death The undiscover’d country, from whose bourn
No traveller returns - puzzles the will,
And makes us rather bear those ills we have
Than fly to others that we know not of ?
Thus conscience doth make cowards of us all;
And the native hue of resolution
Is sicklied o’er with the pale cast of thought,
And enterprises of great pitch and moment,
With this regard, their currents turn awry
And lose the name of action.
Data Warehousing Lec08 / 78