Chapter 18 by Ali Parandian & Ashira Khera (3/11)

OLAP
• Stands for On-Line Analytical Processing.
• A category of tools and techniques used mainly for business reporting and analysis.
• Using OLAP, businesses can analyze data in many ways, including budgeting, planning, simulation, data warehouse reporting, and trend analysis.
• Provides a multidimensional view of the data, so a manager can pull data from an OLAP database in broad (summarized) or specific (detailed) terms.
Data Warehouse
• A data warehouse is a repository of information gathered from multiple sources, stored under a unified schema, at a single site.
• A single data model and query language can be used to retrieve data from the data warehouse.
• Decision-support access is kept separate from the organization's operational systems, so data can be retrieved quickly without slowing those systems down.
• Once gathered, data is stored for a long time, which provides access to historical data.
Components of a data warehouse:
• Data sources (operational systems and flat files)
• Staging area (where data from the sources goes before the warehouse)
• Warehouse (metadata, summary data, and raw data)
• Users (analysis, reporting, and mining)
Data Warehouse Schema (Star Schema)
Fact table: Item_id, Store_id, Customer_id, date, Number, price
Dimension (descriptor) tables, each linked to the fact table by its key:
• Item: Item_id, Itemname, Color, size
• Store: Store_id, City, State, country
• Customer: Customer_id, Name, Street, State, zip
• Date: date, Month, Quarter, year
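A star schema is easiest to see in a query that joins the fact table to its dimensions. The following is a minimal sketch using Python's built-in sqlite3 module. The table names (item, store, sales) and every row of data are assumptions made for illustration; only the column names come from the schema. The customer and date dimensions are left out to keep it short.

```python
import sqlite3

# Table names (item, store, sales) and all rows are hypothetical;
# only the column names come from the star schema slide.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE item  (item_id INTEGER PRIMARY KEY, itemname TEXT, color TEXT, size TEXT);
CREATE TABLE store (store_id INTEGER PRIMARY KEY, city TEXT, state TEXT, country TEXT);
CREATE TABLE sales (item_id INTEGER REFERENCES item(item_id),
                    store_id INTEGER REFERENCES store(store_id),
                    customer_id INTEGER, date TEXT, number INTEGER, price REAL);
INSERT INTO item  VALUES (1, 'shirt', 'blue', 'M'), (2, 'skirt', 'red', 'S');
INSERT INTO store VALUES (10, 'San Jose', 'CA', 'USA'), (20, 'Austin', 'TX', 'USA');
INSERT INTO sales VALUES (1, 10, 100, '2011-03-01', 2, 20.0),
                         (2, 10, 101, '2011-03-02', 1, 35.0),
                         (1, 20, 102, '2011-03-02', 3, 20.0);
""")

# Join the fact table to two dimension tables and aggregate one measure.
units_by_state_and_item = conn.execute("""
    SELECT st.state, it.itemname, SUM(s.number)
    FROM sales AS s
    JOIN item  AS it ON it.item_id  = s.item_id
    JOIN store AS st ON st.store_id = s.store_id
    GROUP BY st.state, it.itemname
    ORDER BY st.state, it.itemname
""").fetchall()
```

The fact table holds only keys and measures, so every descriptive attribute (state, item name) is reached through a join on a dimension key.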
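Extended Aggregation
Cube Example
-- WITH CUBE is SQL Server syntax; standard SQL writes GROUP BY CUBE(Type, Store).
-- Returns totals per (Type, Store), per Type, per Store, and a grand total.
SELECT Type, Store, SUM(Number) AS Number
FROM Pets
GROUP BY Type, Store
WITH CUBE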
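To show what a cube query computes, here is a rough Python emulation of grouping by every subset of the dimensions. The rows of the Pets table and the helper name cube are assumptions; a real database would return NULL where this sketch uses None.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical rows of the Pets table: (Type, Store, Number).
pets = [
    ("Dog", "Miami", 12),
    ("Cat", "Miami", 18),
    ("Dog", "Tulsa", 4),
    ("Cat", "Tulsa", 9),
]

def cube(rows, n_dims):
    """Sum the last field of each row for every subset of the first n_dims columns.

    Dimensions that are aggregated away are reported as None,
    playing the role of NULL in the SQL result.
    """
    totals = defaultdict(int)
    for kept in range(n_dims + 1):
        for dims in combinations(range(n_dims), kept):
            for row in rows:
                key = tuple(row[i] if i in dims else None for i in range(n_dims))
                totals[key] += row[-1]
    return dict(totals)

pet_cube = cube(pets, 2)
```

With two dimensions there are four groupings: (Type, Store), (Type), (Store), and the empty grouping, which holds the grand total.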
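ROLLUP Example
-- Returns subtotals per (Time, Region, Department), per (Time, Region), per Time, and a grand total.
SELECT Time, Region, Department, SUM(Profit) AS Profit
FROM sales
GROUP BY ROLLUP(Time, Region, Department)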
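The rollup query can be emulated in the same spirit. This sketch assumes a few made-up rows for the sales table and a hypothetical helper named rollup. It drops grouping columns from right to left, which is what distinguishes a rollup from a cube.

```python
from collections import defaultdict

# Hypothetical rows of the sales table: (Time, Region, Department, Profit).
sales = [
    (2010, "East", "Toys", 100),
    (2010, "East", "Books", 50),
    (2010, "West", "Toys", 70),
    (2011, "East", "Toys", 120),
]

def rollup(rows, n_dims):
    """Sum the last field over each prefix of the first n_dims columns.

    ROLLUP(a, b, c) groups by (a, b, c), (a, b), (a) and (), so columns are
    dropped from right to left; dropped columns are reported as None.
    """
    totals = defaultdict(int)
    for kept in range(n_dims, -1, -1):
        for row in rows:
            key = row[:kept] + (None,) * (n_dims - kept)
            totals[key] += row[-1]
    return dict(totals)

sales_rollup = rollup(sales, 3)
```

Because only prefixes are grouped, there is a subtotal per Time and per (Time, Region), but none per Region or per Department alone.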
CUBE and ROLLUP in a nutshell
• ROLLUP enables a SELECT statement to calculate multiple levels of subtotals across a specified group of dimensions, dropping dimensions from right to left. It also calculates a grand total.
• CUBE enables a SELECT statement to calculate subtotals for all possible combinations of a group of dimensions. It also calculates a grand total.
For n dimensions, ROLLUP produces n + 1 groupings and CUBE produces 2^n.
Data Mining
• The term data mining refers loosely to the process of semiautomatically analyzing large databases to find useful patterns.
• Data mining attempts to discover rules and patterns from data.
Difference between data mining and AI (machine learning):
• Data mining deals with large volumes of data stored primarily on disk; that is, it deals with knowledge discovery in databases.
Data Mining (continued)
Data mining consists of five major elements:
• Extract, transform, and load transaction data onto the data warehouse system.
• Store and manage the data in a multidimensional database system.
• Provide data access to business analysts and information technology professionals.
• Analyze the data with application software.
• Present the data in a useful format, such as a graph or table.
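The elements above can be pictured as a small pipeline. The sketch below is only an illustration under stated assumptions: the CSV layout, column names, and function names are all hypothetical, an in-memory dictionary stands in for the multidimensional database, and the data-access element is omitted.

```python
import csv
import io
from collections import defaultdict

# Hypothetical transaction export; the column names are assumptions.
RAW = """store,item,units,price
Miami,shirt,2,20.00
Miami,skirt,1,35.00
Tulsa,shirt,3,20.00
"""

def extract(text):
    """Extract: read raw rows from a CSV source."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: cast text fields to numbers and derive revenue."""
    return [
        {"store": r["store"], "item": r["item"],
         "revenue": int(r["units"]) * float(r["price"])}
        for r in rows
    ]

def load(rows):
    """Load and store: key each measure by its (store, item) dimensions."""
    cube = defaultdict(float)
    for r in rows:
        cube[(r["store"], r["item"])] += r["revenue"]
    return cube

def analyze(cube):
    """Analyze: total revenue per store."""
    per_store = defaultdict(float)
    for (store, _item), revenue in cube.items():
        per_store[store] += revenue
    return dict(per_store)

def present(per_store):
    """Present: format the result as a small text table."""
    lines = [f"{store:<8}{revenue:>8.2f}" for store, revenue in sorted(per_store.items())]
    return "\n".join(lines)

revenue_by_store = analyze(load(transform(extract(RAW))))
report = present(revenue_by_store)
```

Keeping each element in its own function mirrors the separation in the list: the analysis step never touches the raw source format.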
Applications of Data Mining
• Prediction
Example: When a person applies for a credit card, the credit card company predicts the credit risk from known attributes such as age, income, and credit history.
• Association
Example: A customer who purchases a book online tends to buy related merchandise at the same time.
Associations, clusters, classes, and sequential patterns are examples of descriptive patterns.
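An association such as the book example is commonly quantified by support and confidence. The sketch below computes both over a handful of made-up baskets; the item names and the helper functions are assumptions, not part of the original slides.

```python
# Hypothetical online-bookstore baskets; each set is one purchase.
baskets = [
    {"database book", "sql book"},
    {"database book", "sql book", "bookmark"},
    {"database book", "novel"},
    {"novel", "bookmark"},
    {"database book", "sql book", "novel"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Of the transactions containing antecedent, the fraction that also contain consequent."""
    return support(antecedent | consequent, transactions) / support(antecedent, transactions)

rule_support = support({"database book", "sql book"}, baskets)
rule_confidence = confidence({"database book"}, {"sql book"}, baskets)
```

In this toy data the pair appears in 3 of 5 baskets, and 3 of the 4 baskets with a database book also contain an SQL book, so the rule "database book implies sql book" has support 3/5 and confidence 3/4.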
Weaknesses of Data Mining
• Data dredging: scanning the data for any relationship and then, once one is found, coming up with an interesting explanation for it. For example, if we test 100 random patterns, we expect about one of them to look "interesting" at the 0.01 significance level purely by chance.
• Preprocessing and postprocessing of the data are extremely time consuming.
• There is no cross-industry standard practice for how classification functions deal with ties in the data.
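The data dredging figure in the list above follows from simple arithmetic. This sketch assumes the 100 tests are independent and that none of the patterns is real; the function names are hypothetical.

```python
def expected_false_positives(n_tests, alpha):
    """Expected number of patterns that pass the test by chance alone."""
    return n_tests * alpha

def prob_at_least_one(n_tests, alpha):
    """Probability that at least one of n independent tests passes by chance."""
    return 1 - (1 - alpha) ** n_tests

expected = expected_false_positives(100, 0.01)
p_any = prob_at_least_one(100, 0.01)
```

Under these assumptions the expected count is 100 × 0.01 = 1, and the chance of seeing at least one spurious "interesting" pattern is roughly 63%.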
Other Types of Mining
Data visualization:
• A data visualization system helps users examine large volumes of data and detect patterns visually, using maps, charts, and other graphical representations.
• Data visualization systems do not detect patterns automatically; they provide system support for users to detect patterns.
Decision Trees
• In operations research, specifically in decision analysis, a decision tree (or tree diagram) is a decision support tool that uses a graph or model of decisions and their possible consequences, including chance event outcomes, resource costs, and utility.
• A decision tree is used to identify the strategy most likely to reach a goal.
• In data mining and machine learning, a decision tree is a predictive model.
• A classification tree is one example of a decision tree.
Example
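As a stand-in illustration of a classification tree, the sketch below hand-builds a tiny tree for credit risk and walks it for one applicant. The attributes echo the credit card example; the thresholds, labels, and tuple layout are assumptions, and nothing here is learned from data.

```python
# A hand-built classification tree for credit risk. The thresholds and
# labels are hypothetical and are not learned from any data set.
# Internal node: (attribute, threshold, subtree_if_below, subtree_if_at_or_above)
# Leaf: a class label string.
TREE = (
    "income", 30000,
    ("credit_history_years", 2, "high risk", "medium risk"),
    ("age", 25, "medium risk", "low risk"),
)

def classify(node, applicant):
    """Walk from the root to a leaf, testing one attribute per internal node."""
    while not isinstance(node, str):
        attribute, threshold, below, at_or_above = node
        node = below if applicant[attribute] < threshold else at_or_above
    return node

applicant = {"age": 40, "income": 52000, "credit_history_years": 10}
risk = classify(TREE, applicant)
```

Each prediction corresponds to one root-to-leaf path, so the conditions that led to a label can be read off directly.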
Advantages of Decision Trees
• They are simple to understand and interpret.
• They have value even with little hard data.
• They use a white-box model, so each result can be explained by the conditions along its path.
• They can be combined with other decision techniques.