Lecture 3 - Rabieramadan.org

Data Warehouse Fundamentals
Rabie A. Ramadan, PhD
What did you do in your assignment?

For an airline company, how can strategic information increase
the number of frequent flyers? Discuss, giving specific details.

You are a Senior Analyst in the IT department of a company
manufacturing automobile parts. The marketing heads are
complaining about the poor response by IT in providing strategic
information. Draft a proposal to them explaining the reasons for
the problems and why a data warehouse would be the only
viable solution.
What did you do in the project?

Egypt Election System
• Governorates’ database system
• Summarization system
• Data warehouse server
• Web page with query-based system
• Multiple databases on multiple servers
• Metadata
http://www.inf.unibz.it/dis/teaching/DWDM/index.html
Definitions & Motivations



Why Data Mining?
Explosive Growth of Data: from terabytes to petabytes
Data Collections and Data Availability
• Crawlers, database systems, Web, etc.
Sources
• Business: Web, e-commerce, transactions, etc.
• Science: Remote sensing, bioinformatics, etc.
• Society and everyone: news, YouTube, etc.
Why Data Mining?

Problem: We are drowning in data, but starving for knowledge!

Solution: Use data mining tools for automated analysis of massive data sets
What is Data Mining?

Data mining (knowledge discovery from data)
• Extraction of interesting (non-trivial, implicit,
previously unknown, and potentially useful) patterns
or knowledge from huge amounts of data
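As a rough illustration of "extracting patterns" from data, the sketch below counts which pairs of items occur together often in a handful of shopping baskets. The transactions, the `frequent_pairs` helper, and the `min_support` threshold are all hypothetical and are not taken from the lecture; real data mining systems use far more scalable algorithms.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy transactions (not from the lecture): each set is one basket.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"beer", "bread"},
    {"butter", "milk"},
    {"bread", "butter", "milk"},
]

def frequent_pairs(baskets, min_support):
    """Return item pairs that appear together in at least min_support baskets."""
    counts = Counter()
    for basket in baskets:
        # Sort so each pair is always counted under one canonical ordering.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}

pairs = frequent_pairs(transactions, min_support=3)
print(pairs)
```

The pairs that pass the threshold are the kind of implicit, previously unknown pattern the definition refers to: nobody stored "bread and milk are bought together" in the data explicitly.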
What is Data Mining?

Alternative names
• Knowledge discovery (mining) in databases (KDD),
• knowledge extraction,
• data/pattern analysis,
• data archeology,
• data dredging,
• information harvesting,
• business intelligence,
• etc.
Knowledge Discovery (KDD) Process
Typical Architecture of a Data Mining System
Confluence of Multiple Disciplines
Why Confluence of Multiple Disciplines?

Tremendous amount of data
• Scalable algorithms to handle terabytes of data (e.g., Flickr
had 5 billion images in September 2010
[http://blog.flickr.net/en/2010/09/19/5000000000/])

High dimensionality of data
• Data can have tens of thousands of features (e.g., DNA
microarrays)
Different Views of Data Mining

Data view
• Kinds of data to be mined

Knowledge view
• Kinds of knowledge to be discovered

Method view
• Kinds of techniques utilized

Application view
• Kinds of applications

Data to Be Mined

In principle, data mining should be applicable
to any data repository

We will have examples about:
• Relational databases
• Data warehouses
• Transactional databases
• Advanced database systems
Relational Databases
Data Warehouses
Transactional Databases
Advanced Database Systems (1)
Advanced Database Systems (2)
Knowledge to be Discovered
Characterization and Discrimination
Characterization and Discrimination (1)
Class Activity
• Differentiate between data mining and data warehousing.
Data warehousing is merely extracting data from different sources, cleaning the
data, and storing it in the warehouse, whereas data mining aims to
examine or explore that data using queries.

• What are the different problems that data mining can solve?
Data mining can be used in a variety of fields and industries, such as marketing,
advertising of goods, products, and services, AI, and government intelligence.

• How do data mining and data warehousing work together?
Data warehousing supports analysis of business needs by storing
data in a meaningful form. Using data mining, one can forecast the
business needs, with the data warehouse acting as the data source for this forecasting.
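The division of labor described in these answers can be sketched in a few lines: a warehousing step that extracts, cleans, and stores data from several sources, followed by a mining step that reads from the warehouse to produce a forecast. Everything here is an assumption for illustration: the two source lists, the `load_warehouse` and `forecast_next` helpers, and the deliberately naive trend extrapolation, which stands in for a real forecasting technique.

```python
# Hypothetical raw records from two source systems (illustrative data only).
store_a = [("2024-01", "100"), ("2024-02", "120"), ("2024-02", None), ("2024-03", "130")]
store_b = [("2024-01", "80"), ("2024-02", "90"), ("2024-03", "110")]

def load_warehouse(*sources):
    """Warehousing step: extract rows, drop incomplete ones, store monthly totals."""
    warehouse = {}
    for source in sources:
        for month, amount in source:
            if amount is None:  # cleaning: skip records with a missing value
                continue
            warehouse[month] = warehouse.get(month, 0) + int(amount)
    return warehouse

def forecast_next(warehouse):
    """Mining step (very naive): extend the average month-to-month change."""
    totals = [warehouse[month] for month in sorted(warehouse)]
    changes = [later - earlier for earlier, later in zip(totals, totals[1:])]
    return totals[-1] + sum(changes) / len(changes)

warehouse = load_warehouse(store_a, store_b)
forecast = forecast_next(warehouse)
print(warehouse, forecast)
```

Note that the forecasting function never touches the raw sources. It works only on the cleaned, integrated totals, which is the sense in which the warehouse acts as the source for mining.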
Frequent Patterns, Associations, Correlations
Classification and Prediction
Cluster Analysis
Outlier Analysis
Evolution Analysis
Techniques Utilized
Applications Adapted
Major Challenges in Data Mining
Summary