What is Data Warehouse

Download Report

Transcript What is Data Warehouse

CPIT 440
Data Mining and Warehouse
Lab3
www.company.com
CPIT 440
Data Mining and Warehouse
Lab3: Outlines
• Introduction to Data Warehouse
– What is Data Warehouse ?
– Difference between Data Warehouse and Database
• Introduction to OLAP operations
– Introduction to cubes
– Cube structure
– OLAP Operations
• Exercises
www.company.com
CPIT 440
Data Mining and Warehouse
Data Warehouse
• What is Data Warehouse ?
– A data warehouse is a repository of an organization's
stored data that is designed for query and analysis
rather than for transaction processing to facilitate
reporting and analysis.
– It usually contains historical data derived from
transaction data, but it can include data from other
sources.
– It separates analysis workload from transaction
workload and enables an organization to consolidate
data from several sources.
www.company.com
CPIT 440
Data Mining and Warehouse
Data Warehouse
www.company.com
CPIT 440
Data Mining and Warehouse
Difference between Data
Warehouse and Database
• A question we often asks out in the field is:
I already have a database, so why do I need a
data warehouse ? What is the difference
between a database vs. a data warehouse?
Database
Data Warehouse
Designed to handle
transactions
It is structured to make
analytics fast and easy.
It isn’t designed to handle and
do analytics well.
It exists as a layer on top of
another database or
databases, and takes the data
from all these databases and
creates a layer optimized for
and dedicated to analytics.
www.company.com
CPIT 440
Data Mining and Warehouse
Introduction to OLAP Operations
• Introduction to cubes:
– A cube is a set of data that is usually constructed from
a subset of a data warehouse and is organized and
summarized into a multidimensional structure defined
by a set of dimensions and measures.
– Cubes are the main objects in online analytic
processing (OLAP),
– It is a technology that provides fast access to data in a
data warehouse.
www.company.com
CPIT 440
Data Mining and Warehouse
Introduction to OLAP Operations
www.company.com
CPIT 440
Data Mining and Warehouse
Introduction to OLAP Operations
• Cube Structure:
– Every cube has a schema, which is the set of joined
tables in the data warehouse from which the cube
draws its source data.
– The central table in the schema is the fact table, the
source of the cube's measures.
– The other tables are dimension tables, the sources of
the cube's dimensions.
www.company.com
CPIT 440
Data Mining and Warehouse
Introduction to OLAP Operations
• Cube Structure
– A cube's structure is defined by its measures and
dimensions.
– They are derived from tables in the cube's data source.
– The set of tables from which a cube's measures and
dimensions are derived is called the cube's schema.
– Every cube schema consists of a single fact table and
one or more dimension tables.
– The cube's measures are derived from columns in the
fact table.
– The cube's dimensions are derived from columns in the
dimension tables.
www.company.com
CPIT 440
Data Mining and Warehouse
Introduction to OLAP Operations
• Cube Structure
– Star schema: A fact table in the middle connected to a
set of dimension tables
– Snowflake schema: A refinement of star schema
where some dimensional hierarchy is normalized into a
set of smaller dimension tables, forming a shape
similar to snowflake
– Fact constellations: Multiple fact tables share
dimension tables, viewed as a collection of stars,
therefore called galaxy schema or fact constellation
www.company.com
CPIT 440
Data Mining and Warehouse
Introduction to OLAP Operations
• OLAP Operations:
– Roll up: summarize data / dimension reduction
– Roll down: reverse of roll-up
• Make detailed data, or introducing new dimensions
– Slice and dice
– Pivot (rotate)
•
www.company.com
CPIT 440
Data Mining and Warehouse
Roll up and Roll down
www.company.com
CPIT 440
Data Mining and Warehouse
Slice and Dice
www.company.com
CPIT 440
Data Mining and Warehouse
Pivot (Rotate)
www.company.com
CPIT 440
Data Mining and Warehouse
Exercise 1
• Suppose that a data warehouse consists of the
three dimensions: time, doctor, and patient, and
the two measures count and charge, where
charge is the fee that a doctor charges a patient
for a visit.
www.company.com
CPIT 440
Data Mining and Warehouse
Exercise 1
(a) Enumerate three classes of schemas that are
popularly used for modeling data warehouses.
Three classes of schemas popularly used for modeling data
warehouses are
• The star schema,
• The snowflake schema
• The fact constellations schema.
www.company.com
CPIT 440
Data Mining and Warehouse
Exercise 1
(b) Draw a schema diagram for the above data
warehouse using one of the schema classes
listed in part (a).
www.company.com
CPIT 440
Data Mining and Warehouse
Exercise 1
(c) Starting with the base cuboid [day; doctor;
patient], what specific OLAP operations should
be performed in order to list the total fee
collected by each doctor in 2004?
The operations to be performed are:
• Roll-up on time from day to year.
• Slice for time=2004.
• Roll-up on patient from individual patient to all.
www.company.com
CPIT 440
Data Mining and Warehouse
Exercise 2
• Suppose that a data warehouse for Big
University consists of the following four
dimensions: student, course, semester, and
instructor, and two measures count and avg.
grade.
• When at the lowest conceptual level (e.g.,for a
given student, course, semester, and instructor
combination), the avg. grade measure stores the
actual course grade of the student.
• At higher conceptual levels, avg. grade stores
the average grade for the given combination.
www.company.com
CPIT 440
Data Mining and Warehouse
Exercise 2
(a) Draw a snowflake schema diagram for the data
warehouse.
www.company.com
CPIT 440
Data Mining and Warehouse
Exercise 2
(b) Starting with the base cuboid [student; course;
semester; instructor], what specific OLAP
operations should perform in order to list the
average grade of CS courses for each Big
University student.
The specific OLAP operations to be performed are:
• Roll-up on course from course id to department.
• Roll-up on student from student id to university.
• Dice on course, student with department=\CS" and
university = \Big University".
• Drill-down on student from university to student
name.
www.company.com
CPIT 440
Data Mining and Warehouse
Exercise 3
• Suppose that a data warehouse consists of the
four dimensions; date, spectator, location, and
game, and the two measures, count and charge,
where charge is the fee that a spectator pays
when watching a game on a given date.
• Spectators may be students, adults, or seniors,
with each category having its own charge rate.
www.company.com
CPIT 440
Data Mining and Warehouse
Exercise 3
(a) Draw a star schema diagram for the data
warehouse.
www.company.com
CPIT 440
Data Mining and Warehouse
Exercise 3
(b) Starting with the base cuboid [date; spectator;
location; game], what specific OLAP operations
should perform in order to list the total charge
paid by student spectators at GM Place in 2004?
The specific OLAP operations to be performed are:
• Roll-up on date from date id to year.
• Roll-up on spectator from spectator id to status.
• Roll-up on location from location id to location name.
• Roll-up on game from game id to all.
• Dice with status=\students", location name=\GM
Place", and year=2004.
www.company.com