Lecture Title - Dipartimento di Informatica

Salerno - Data Warehousing
Introduction to Data Warehousing
Janet Delve
Data Warehouse Introduction
Slide 1
Overview
• Review of Relational Databases and
Normalisation
• Introduction to Data Warehousing
(Byte article)
Books
• Inmon, W. H., Building the Data Warehouse, Wiley, 2002
• Kimball, R., Ross, M., The Data Warehouse Toolkit, Wiley,
2002
• Barquin, R.C., Edelstein, H.A., Building, Using and
Managing the Data Warehouse, Prentice Hall, 1997
• Connolly, T. and Begg, C., Database Systems - A Practical
Approach to Design, Implementation and Management,
Addison Wesley
The Byte article
The Byte feature is about data mining (DM), also called
knowledge discovery (KD), and consists of three
articles:
1. The Data Gold Rush, which looks at the uses of
DM: ‘finding the nugget of gold in the mountain
of data slag.’
DM has an enormous range of applications:
customer purchasing, analysing legal decisions,
astronomy, discovering patterns in health care.
2. A Data Miner’s Tools, which covers three types of software:
query-and-reporting tools;
multidimensional analysis tools;
intelligent agents.
The Byte article
3. Data Mining Dynamite, which looks at the processes
needed to support DM.
Data needs to be cleansed of unnecessary fields and
stored in a convenient form.
This relies on data warehouses (DWs) and parallel computers.
There are short-term gains for businesses whose
‘advertising will target customers with new
precision.’
Long-term gains: new discoveries?
The Data Gold Rush
• Databases now store vast amounts of data –
credit card purchases, point-of-sale (POS)
transactions, detailed pictures of galaxies.
• Need to turn data into information to guide
marketing strategy etc. Wal-Mart uploads 20
million POS transactions every day.
• DM describes past trends and predicts future
trends.
• DM process begins with the business problem.
• The DM analyst supports the business analyst and needs to
identify their data sources and experience.
• DM process diagram – p.84.
• Spotlight: POS example (p. 84). Many products,
a wide geographical area, 125 weeks.
• At AT&T Labs, knowledge representation tools
describe database contents, thus producing
meta-data.
• DM tools search for patterns, using either top-down or
bottom-up methods;
• People use DM to increase profitability.
• Major corporations involved in DM Research
and Development – IBM, Microsoft, General
Motors etc.
• Products used for DM range from
*OLAP (On-Line Analytical Processing) tools such
as Essbase and
*DSS (Decision Support Systems) agents to
*DM tools including some AI techniques and
*advanced DM tools.
• OLAP tools – DM or just ‘fancy query tools’?
• Specific mining tools for e.g. finance, health.
Sales and Marketing Solutions packs have 70%
of work done for client who tailors remaining
30%.
• p. 86: health applications; p. 88: SKICAT,
credit-card fraud, lending decisions, stocks.
The article asked whether one day we would be
able to mine the internet. The answer is yes:
Google mega-searches, webhouses full
of clickstream data, and Amazon.com's personalised
information.
A Data Miner’s Tools
• DM reveals new relationships and patterns;
• Humans are good at detecting anomalies; DM tools
are good at detecting patterns;
• Intelligent Agents.
These are set up by experts, then need little
maintenance. They can work on text. They are
good for discovering unsuspected relationships.
• Multidimensional Analysis (MDA) tools.
These represent data as n-dimensional matrices
called hypercubes (OLAP). Good for iterative,
interactive, hands-on exploration of data;
• Query-and-reporting tools. These need close
direction to frame queries. They work on a database
structure and are best at asking specific questions to
verify hypotheses. They slow the operational system down,
hence the need for a DW.
• Delta Airlines – frequent-flier program.
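The hypercube idea behind MDA tools can be illustrated with a minimal Python sketch. The sales figures, the three dimension names and the roll_up helper are all hypothetical, chosen only to show how a measure is summed over the dimensions the analyst is not currently looking at; real OLAP products work on far larger cubes and are not implemented this way.

```python
from collections import defaultdict

# Hypothetical POS facts: (product, region, week, units_sold).
SALES = [
    ("pens", "north", 1, 10),
    ("pens", "south", 1, 7),
    ("batteries", "north", 1, 4),
    ("pens", "north", 2, 12),
    ("batteries", "south", 2, 6),
]

# The three dimensions of this small cube, in column order.
DIMENSIONS = ("product", "region", "week")


def roll_up(facts, keep):
    """Sum units_sold over every dimension that is not named in `keep`."""
    positions = [DIMENSIONS.index(name) for name in keep]
    totals = defaultdict(int)
    for row in facts:
        key = tuple(row[p] for p in positions)
        totals[key] += row[-1]
    return dict(totals)


# Two views of the same cube: by product, then by product and region.
by_product = roll_up(SALES, ["product"])
by_product_region = roll_up(SALES, ["product", "region"])
print(by_product)
print(by_product_region)
```

Moving from one view to the other without rewriting a query is the kind of iterative, hands-on exploration the slide attributes to MDA tools.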
Data Mining Dynamite
• Data must be ‘cleansed’: one bank had
nominal information stored in 13 different
formats in its various databases. Errors due to
duplicate data, different formats etc. need to be
eliminated. See the figure at the top of p. 98
(telephone company).
• Once cleaned, data is transported to DW, but
thought needs to be given to how data is
represented.
• DW is a server-based replication of a
mainframe’s data. It regularly receives updated
info from the mainframe.
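As a rough illustration of the cleansing step on this slide, the Python sketch below maps several spellings of the same customer name to one format and drops the resulting duplicates before the data would be loaded into a DW. The record layout, the account field and both function names are assumptions made for the example; they are not taken from the Byte article.

```python
import re


def normalise_name(raw):
    """Map variants such as 'SMITH, John' or 'john  smith' to 'John Smith'."""
    raw = raw.strip()
    if "," in raw:
        # 'Last, First' becomes 'First Last'.
        last, first = [part.strip() for part in raw.split(",", 1)]
        raw = f"{first} {last}"
    raw = re.sub(r"\s+", " ", raw)  # collapse repeated whitespace
    return raw.title()


def cleanse(records):
    """Normalise names and keep one record per (name, account) pair."""
    seen = set()
    clean = []
    for rec in records:
        name = normalise_name(rec["name"])
        key = (name, rec["account"])
        if key in seen:
            continue  # duplicate once the format differences are removed
        seen.add(key)
        clean.append({"name": name, "account": rec["account"]})
    return clean


# Hypothetical input: the same customer held in three different formats.
raw_records = [
    {"name": "SMITH, John", "account": "A1"},
    {"name": "john  smith", "account": "A1"},
    {"name": "Smith,John", "account": "A1"},
    {"name": "Rossi, Maria", "account": "B2"},
]
cleaned = cleanse(raw_records)
print(cleaned)
```

Real cleansing rules are usually far messier than this; the point is only that format differences have to be removed before duplicates can be detected.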
Data Mining Dynamite
• The database on the DW then handles queries
from the client machine independently of the
mainframe.
• DW contains integrated and summary
information.
• DWs built by specialists – often expensive but
worth it. 1,000 ROI.
• One-Query Theorem – there may be one query
that will revolutionize your business. Hopefully,
DM can find this for you, and be self-financing.
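The point that a DW holds summary information and answers queries on its own server can be sketched with Python's built-in sqlite3 module. Here an in-memory SQLite database stands in for the warehouse; the table names, columns and figures are hypothetical and serve only to show a summary table being derived once and then queried without going back to the source system.

```python
import sqlite3

# An in-memory database stands in for the warehouse server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (store TEXT, product TEXT, week INTEGER, amount REAL)"
)

# Hypothetical detail rows, as they might arrive from the mainframe.
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("north", "pens", 1, 10.0),
        ("north", "batteries", 1, 5.0),
        ("south", "pens", 1, 8.0),
        ("north", "pens", 2, 12.0),
    ],
)

# Summary information: total sales per store per week, stored in the DW.
conn.execute(
    """
    CREATE TABLE weekly_summary AS
    SELECT store, week, SUM(amount) AS total
    FROM sales
    GROUP BY store, week
    """
)

# Client queries now read the summary table, not the detail rows.
rows = conn.execute(
    "SELECT store, week, total FROM weekly_summary ORDER BY store, week"
).fetchall()
print(rows)
```

In practice the summary tables would be refreshed each time the warehouse receives updated data from the mainframe.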
Data Mining Dynamite
• Meta-data describes contents of DW.
• A DW may be set up for a particular area, but
once it is up and running, other users often
appear wanting to mine different areas. E.g. a UK
store chain used its DW to analyse customer
purchasing, but the chain’s accounting department
also used it and discovered that losses from theft of
pens and batteries were substantial (p. 99).
Data Mining Dynamite
• Storage – now extremely important – some
companies lock away their storage devices.
• Storage technologies such as RAID (redundant
arrays of independent disks) are becoming
increasingly parallel.
• A quicker way to read data off disk is needed;
otherwise storage becomes the bottleneck for DM
applications.
• Do all companies have a DW?