Diapositiva 1
Download
Report
Transcript Diapositiva 1
Data Wharehousing OLAP
Data Mining
S. Costantini
Università degli Studi di L’Aquila
[email protected]
Ringraziamenti
(Acknowledgment)
• Part of this material is taken from: Database
Systems: The Complete Book, by Hector
Garcia-Molina, Jeff Ullman, and Jennifer
Widom, edited by Prentice-Hall.
• URL: http://www-db.stanford.edu/~ullman/dscb.html
S. Costantini / Data Wharehousing
2
Observation
• Traditional database systems are tuned
to many, small, simple queries.
• Some new applications use fewer, more
time-consuming, complex queries.
• New architectures have been developed
to handle complex “analytic” queries
efficiently.
S. Costantini / Data Wharehousing
3
The Data Warehouse
• The most common form of data
integration.
– Copy sources into a single DB (warehouse)
and try to keep it up-to-date.
– Usual method: periodic reconstruction of
the warehouse, perhaps overnight.
– Frequently essential for analytic queries.
S. Costantini / Data Wharehousing
4
OLTP
• Most database operations involve OnLine Transaction Processing (OTLP).
– Short, simple, frequent queries and/or
modifications, each involving a small number
of tuples.
– Examples: Answering queries from a Web
interface, sales at cash registers, selling
airline tickets.
S. Costantini / Data Wharehousing
5
OLAP
• Of increasing importance are On-Line
Application Processing (OLAP) queries.
– Few, but complex queries --- may run for
hours.
– Queries do not depend on having an
absolutely up-to-date database.
S. Costantini / Data Wharehousing
6
OLAP Examples
1. Amazon analyzes purchases by its
customers to come up with an individual
screen with products of likely interest
to the customer.
2. Analysts at Wal-Mart look for items
with increasing sales in some region.
S. Costantini / Data Wharehousing
7
Data Warehouses
• Doing OLTP and OLAP in the same database
system is often impractical
– Different performance requirements
– Analysis queries require data from many sources
• Solution: Build a “data warehouse”
–
–
–
–
Copy data from various OLTP systems
Optimize data organization, system tuning for OLAP
Transactions aren’t slowed by big analysis queries
Periodically refresh the data in the warehouse
S. Costantini / Data Wharehousing
8
Common Architecture
• Relational Databases handle OLTP.
• Local databases copied to a central
warehouse overnight.
• Analysts use the warehouse for OLAP.
S. Costantini / Data Wharehousing
9
Definition of data warehousing
A data warehouse is a subject-oriented,
integrated, time-variant and nonvolatile collection of data in support of
management’s decision making process.
S. Costantini / Data Wharehousing
10
Loading the Data Warehouse
Data is periodically
extracted
Data is cleansed and
transformed
Data Staging Area
Source Systems
(OLTP)
Users query the data
warehouse
Data Warehouse
S. Costantini / Data Wharehousing
11
Data Mining
•
•
Data mining is a popular term for
queries that summarize big data sets in
useful ways.
Examples:
1. Clustering all Web pages by topic.
2. Finding characteristics of fraudulent
credit-card use.
S. Costantini / Data Wharehousing
12
Data Warehouse
Enterprise
“Database”
Customers
Orders
Transactions
Etc…
Vendors
Data Miners:
• “Farmers” – they know
• “Explorers” - unpredictable
Etc…
Copied,
organized
summarized
Data
Warehouse
Data Mining
S. Costantini / Data Wharehousing
13
Market-Basket Data
• An important form of mining from
relational data involves market baskets =
sets of “items” that are purchased
together as a customer leaves a store.
• Summary of basket data is frequent
itemsets = sets of items that often
appear together in baskets.
S. Costantini / Data Wharehousing
14
Data Mining Flavors
• Directed – Attempts to explain or
categorize some particular target field
such as income or response.
• Undirected – Attempts to find patterns or
similarities among groups of records
without the use of a particular target field
or collection of predefined classes.
S. Costantini / Data Wharehousing
15
Data Mining Examples in
Enterprises
• Government
– Track down criminals (Police also)
– Treasury Dept – suspicious int’l funds
transfer
• Phone companies
• Supermarkets & Superstores
• Mail-Order, On-Line Order
S. Costantini / Data Wharehousing
16
Data Mining Examples in
Enterprises
•
•
•
•
Financial Institutions
Insurance Companies
Web sites
Many others…
S. Costantini / Data Wharehousing
17