Title Goes Here - Binus Repository


Course : M0264/Manajemen Basis Data
Year   : 2008
Manajemen Basis Data
Session 8
Objectives
• Introduction to Data Warehousing
• Introduction to OLAP
• Introduction to Data Mining
Bina Nusantara
Introduction to Data Warehousing
• To begin a data warehouse project, we need to find
answers to questions such as:
– Which user requirements are most important, and which data
should be considered first?
– Should the project be scaled down into something more
manageable?
– Should the infrastructure for a scaled-down project be capable of
ultimately delivering a full-scale, enterprise-wide data
warehouse?
Introduction to Data Warehousing
• For many enterprises, the way to avoid the complexities
associated with designing a data warehouse is to start
by building one or more data marts.
• Data marts allow designers to build something that is far
simpler and more achievable for a specific group of users.
Introduction to Data Warehousing
• Few designers are willing to commit to an enterprise-wide design
that must meet all user requirements at one time.
• Despite the interim solution of building data marts, the goal
remains the same: the ultimate creation of a data
warehouse that supports the requirements of the
enterprise.
Introduction to Data Warehousing
• The requirements collection and analysis stage of a data warehouse
project involves interviewing appropriate members of staff (such as
marketing, finance, and sales users) to identify a
prioritized set of requirements that the data warehouse
must meet.
• At the same time, interviews are conducted with the staff
responsible for operational systems to identify which data sources
can provide clean, valid, and consistent data that will remain
supported over the next few years.
Introduction to Data Warehousing
• Interviews provide the necessary information for the top-down
view (user requirements) and the bottom-up view
(which data sources are available) of the data
warehouse.
• The database component of a data warehouse is
described using a technique called dimensionality
modeling.
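Dimensionality modeling usually produces a star schema: a central fact table holding numeric measures, with foreign keys to surrounding dimension tables that describe those measures. The sketch below assumes a hypothetical retail schema (dim_date, dim_product, dim_branch, fact_sales are illustrative names, not from the source) and uses Python's built-in sqlite3 module to show the typical "join facts to dimensions, then aggregate" query.

```python
import sqlite3

# Hypothetical retail star schema: one fact table surrounded by dimensions.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE dim_date    (date_id INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE dim_branch  (branch_id INTEGER PRIMARY KEY, city TEXT);
-- Fact table: a foreign key to each dimension plus numeric measures.
CREATE TABLE fact_sales (
    date_id    INTEGER REFERENCES dim_date(date_id),
    product_id INTEGER REFERENCES dim_product(product_id),
    branch_id  INTEGER REFERENCES dim_branch(branch_id),
    quantity   INTEGER,
    amount     REAL
);
""")
cur.executemany("INSERT INTO dim_date VALUES (?, ?, ?)", [(1, 2008, 1), (2, 2008, 2)])
cur.executemany("INSERT INTO dim_product VALUES (?, ?, ?)",
                [(10, "Laptop", "Computers"), (11, "Mouse", "Accessories")])
cur.executemany("INSERT INTO dim_branch VALUES (?, ?)", [(100, "Jakarta"), (101, "Bandung")])
cur.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?, ?)", [
    (1, 10, 100, 2, 2000.0),
    (1, 11, 100, 5, 50.0),
    (2, 10, 101, 1, 1000.0),
    (2, 11, 101, 3, 30.0),
])

# Typical dimensional query: join the fact table to its dimensions,
# then aggregate a measure by dimension attributes.
rows = cur.execute("""
SELECT d.month, p.category, SUM(f.amount)
FROM fact_sales f
JOIN dim_date d    ON f.date_id = d.date_id
JOIN dim_product p ON f.product_id = p.product_id
GROUP BY d.month, p.category
ORDER BY d.month, p.category
""").fetchall()
for row in rows:
    print(row)
```

Keeping descriptive attributes in small dimension tables and only keys plus measures in the (much larger) fact table is what makes this shape easy to query and aggregate.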
Introduction to Data Warehousing
Database Design Methodology for Data Warehouses
• The Nine-Step Methodology includes the following steps:
– Choosing the process
– Choosing the grain
– Identifying and conforming the dimensions
– Choosing the facts
– Storing pre-calculations in the fact table
– Rounding out the dimension tables
– Choosing the duration of the database
– Tracking slowly changing dimensions
– Deciding the query priorities and the query modes
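The step "Tracking slowly changing dimensions" is easiest to see with an example. The slides do not say which technique to use; this sketch assumes the commonly used "Type 2" approach, where a change closes the current dimension row and appends a new version with its own surrogate key, so old facts keep pointing at the old version. The CustomerRow class and apply_type2_change function are hypothetical names.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class CustomerRow:
    surrogate_key: int        # warehouse-assigned key referenced by fact rows
    customer_id: str          # natural key from the operational system
    city: str                 # the slowly changing attribute
    valid_from: date
    valid_to: Optional[date]  # None marks the current version


def apply_type2_change(dim: List[CustomerRow], customer_id: str,
                       new_city: str, change_date: date) -> None:
    """Type 2 handling (assumed technique): close the current row, add a new one."""
    # Assumes exactly one current row exists for this customer.
    current = next(r for r in dim
                   if r.customer_id == customer_id and r.valid_to is None)
    if current.city == new_city:
        return  # no change, keep the existing version
    current.valid_to = change_date
    next_key = max(r.surrogate_key for r in dim) + 1
    dim.append(CustomerRow(next_key, customer_id, new_city, change_date, None))


dim_customer = [CustomerRow(1, "C001", "Jakarta", date(2007, 1, 1), None)]
apply_type2_change(dim_customer, "C001", "Surabaya", date(2008, 3, 1))
for row in dim_customer:
    print(row)
```

Other approaches (for example overwriting the attribute in place) are simpler but lose history; which one fits depends on the query priorities chosen in the final step.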
Introduction to OLAP
• The dynamic synthesis, analysis, and consolidation of
large volumes of multi-dimensional data (Codd, 1993).
• Describes a technology that uses a multi-dimensional
view of aggregate data to provide quick access to
strategic information for the purposes of advanced analysis.
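"Consolidation" of multi-dimensional data largely means aggregating detailed facts up along dimensions (often called roll-up). A minimal sketch, assuming hypothetical sales facts keyed by year, quarter, and city; roll_up is an illustrative helper, not an OLAP product API.

```python
from collections import defaultdict

# Hypothetical base facts: (year, quarter, city, sales amount).
facts = [
    (2008, "Q1", "Jakarta", 120.0),
    (2008, "Q1", "Bandung", 80.0),
    (2008, "Q2", "Jakarta", 150.0),
    (2008, "Q2", "Bandung", 90.0),
]


def roll_up(rows, keep):
    """Sum the measure over every dimension not listed in `keep`.

    `keep` holds positional indexes of the dimensions to retain; the
    measure is assumed to be the last element of each row.
    """
    totals = defaultdict(float)
    for row in rows:
        key = tuple(row[i] for i in keep)
        totals[key] += row[-1]
    return dict(totals)


print(roll_up(facts, keep=(0, 1)))  # totals per year and quarter
print(roll_up(facts, keep=(2,)))    # totals per city
print(roll_up(facts, keep=()))      # grand total
```

An OLAP server typically precomputes many of these aggregates so that such views can be returned quickly rather than recomputed from base facts on every request.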
Introduction to OLAP
• Enables users to gain a deeper understanding and
knowledge about various aspects of their corporate data
through fast, consistent, interactive access to a wide
variety of possible views of the data.
• Allows users to view corporate data in such a way that it
is a better model of the true dimensionality of the
enterprise.
Introduction to OLAP
• OLAP can easily answer ‘who?’ and ‘what?’ questions;
however, its ability to answer ‘what if?’ and ‘why?’
questions is what distinguishes OLAP from general-purpose
query tools.
• Types of analysis range from basic navigation and
browsing (slicing and dicing), to calculations, to more
complex analyses such as time series and complex
modeling.
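Slicing and dicing can be illustrated on a tiny in-memory cube. In this sketch (cube contents, DIMS, slice_cube, and dice_cube are all hypothetical), a slice fixes one dimension to a single value and drops it, while a dice keeps only the cells whose coordinates fall inside a chosen subset of values on one or more dimensions.

```python
# Hypothetical three-dimensional cube: (quarter, product, city) -> sales.
cube = {
    ("Q1", "Laptop", "Jakarta"): 200.0,
    ("Q1", "Mouse", "Jakarta"): 50.0,
    ("Q1", "Laptop", "Bandung"): 100.0,
    ("Q2", "Laptop", "Jakarta"): 300.0,
    ("Q2", "Mouse", "Bandung"): 30.0,
}
DIMS = ("quarter", "product", "city")


def slice_cube(cube, dim, value):
    """Slice: fix one dimension to a single value and drop that dimension."""
    i = DIMS.index(dim)
    return {k[:i] + k[i + 1:]: v for k, v in cube.items() if k[i] == value}


def dice_cube(cube, **allowed):
    """Dice: keep cells whose coordinates lie in the allowed set per dimension."""
    return {
        k: v for k, v in cube.items()
        if all(k[DIMS.index(d)] in vals for d, vals in allowed.items())
    }


q1 = slice_cube(cube, "quarter", "Q1")
laptops = dice_cube(cube, product={"Laptop"}, city={"Jakarta", "Bandung"})
print(q1)
print(laptops)
```

The result of a slice is itself a smaller cube, so further navigation (another slice, a roll-up) can be applied to it directly.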
Introduction to Data Mining
• The process of extracting valid, previously unknown,
comprehensible, and actionable information from large
databases and using it to make crucial business
decisions (Simoudis, 1996).
• Involves the analysis of data and the use of software techniques
for finding hidden and unexpected patterns and
relationships in sets of data.
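One simple way software can surface relationships in data is association-rule counting over transactions. This is a swapped-in illustration, not a technique named on this slide: support is the fraction of transactions containing both items, and confidence of A -> B is support(A, B) / support(A). The transactions, thresholds, and pair_rules helper are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical market-basket transactions.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk", "eggs"},
    {"bread", "milk", "eggs"},
]


def pair_rules(transactions, min_support=0.4, min_confidence=0.6):
    """Return (lhs, rhs, support, confidence) for item pairs passing both thresholds."""
    n = len(transactions)
    item_counts = Counter(item for t in transactions for item in t)
    pair_counts = Counter(
        pair for t in transactions for pair in combinations(sorted(t), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue  # pair too rare to be worth reporting
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)


rules = pair_rules(transactions)
for rule in rules:
    print(rule)
```

Real tools prune candidate itemsets far more cleverly (for example with the Apriori algorithm), but the support and confidence definitions are the same.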
Introduction to Data Mining
• Reveals information that is hidden and unexpected, as there is
little value in finding patterns and relationships that are
already intuitive.
• Patterns and relationships are identified by examining
the underlying rules and features in the data.
• Tends to work from the data up; the most accurate
results normally require large volumes of data to deliver
reliable conclusions.
Introduction to Data Mining
• Starts by developing an optimal representation of the
structure of sample data, during which time knowledge is
acquired and then extended to larger sets of data.
• Data mining can provide huge paybacks for companies
that have made a significant investment in data
warehousing.
• A relatively new technology, but already used in a
number of industries.
Introduction to Data Mining
Data Mining Operations
• The four main operations are:
– Predictive modeling
– Database segmentation
– Link analysis
– Deviation detection
• There are recognized associations between
applications and the corresponding operations.
– e.g., direct marketing strategies use database segmentation.
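Database segmentation partitions records into groups of similar items, which a direct marketing campaign can then target separately. A common technique for this is k-means clustering; the slides do not prescribe it, so treat this as an assumed illustration. The customer data and the kmeans helper are hypothetical, and the features are left unscaled for brevity.

```python
import math
import random

# Hypothetical customers: (annual spend in millions IDR, purchases per year).
customers = [
    (1.0, 2), (1.2, 3), (0.8, 1),        # low spend, infrequent
    (9.5, 20), (10.0, 22), (11.0, 25),   # high spend, frequent
]


def kmeans(points, k, iterations=20, seed=0):
    """Plain k-means: assign each point to its nearest centroid, then move
    each centroid to the mean of the points assigned to it."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        for c, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
    return centroids, labels


centroids, labels = kmeans(customers, k=2)
for customer, segment in zip(customers, labels):
    print(customer, "-> segment", segment)
```

In practice the features would usually be normalized first, since otherwise the attribute with the largest numeric range dominates the distance calculation.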