Data Mining
CS 157B Section 2
Keng Teng Lao
Overview
• Definition of Data Mining
• Application of Data Mining
Data Mining
• Refers to the mining or
discovery of new information in
terms of patterns or rules from
vast amounts of data.
• To be useful, data mining must
be carried out efficiently on
large files and databases.
KDD
• Knowledge Discovery in
Databases
• [Figure: the KDD process. Databases → Data Cleaning and Data Integration → Data Warehouse → Selection → Task-relevant Data → Data Mining → Pattern Evaluation]
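One plausible reading of the KDD stages named on this slide (cleaning, integration, selection, mining, pattern evaluation) can be sketched as a chain of small functions. This is a toy illustration only: the function names, the record layout, and the sample data are all hypothetical, and the "mining" step is just a frequency count standing in for a real algorithm.

```python
from collections import Counter

def integrate(*sources):
    """Data integration: combine records from several databases."""
    return [record for source in sources for record in source]

def clean(records):
    """Data cleaning: drop records that have a missing value."""
    return [r for r in records if all(v is not None for v in r.values())]

def select(records, attributes):
    """Selection: keep only the task-relevant attributes."""
    return [{a: r[a] for a in attributes} for r in records]

def mine(records, attribute):
    """Data mining (stand-in): count how often each value occurs."""
    return Counter(r[attribute] for r in records)

def evaluate(patterns, min_count):
    """Pattern evaluation: keep only patterns seen at least min_count times."""
    return {value: n for value, n in patterns.items() if n >= min_count}

# Hypothetical source databases.
store_a = [{"customer": "c1", "item": "milk"}, {"customer": "c2", "item": None}]
store_b = [{"customer": "c3", "item": "milk"}, {"customer": "c4", "item": "jam"}]

warehouse = clean(integrate(store_a, store_b))   # cleaned, integrated data
task_relevant = select(warehouse, ["item"])      # task-relevant data
patterns = mine(task_relevant, "item")           # raw patterns
interesting = evaluate(patterns, min_count=2)    # evaluated patterns
print(interesting)
```

Each function corresponds to one box of the process, so the data flows through the stages in the same order as the slide.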
Data Mining Vs. Data
Warehousing
• The goal of a data warehouse is
to support decision making with
data.
• Data Mining can be used in
conjunction with a data
warehouse to help with certain
types of decisions.
Goals of Data Mining and
Knowledge Discovery
• Prediction – Data mining can
show how certain attributes
within the data will behave in
the future.
• Identification – Data patterns
can be used to identify the
existence of an item, an event,
or an activity.
Cont.
• Classification – Data mining can
partition the data so that different
classes or categories can be
identified based on combinations of
parameters.
• Optimization – One eventual goal of
data mining may be to optimize the
use of limited resources such as
time, space… and to maximize output
variables such as sales or profits
under a given set of constraints.
Types of Knowledge
Discovered During Data
Mining
• Association rules
• Classification hierarchies
• Sequential patterns
• Patterns within time series
• Clustering
Classification
hierarchies
• Process of learning a model
that describes different classes
of data.
• Decision Tree
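To make "learning a model that describes different classes" concrete, here is a minimal sketch of a one-level decision tree (a decision stump) learned from labeled records. The attribute names, labels, and data are hypothetical, and the learner simply tries every threshold and keeps the split with the fewest misclassifications; real decision tree algorithms grow deeper trees and use measures such as information gain.

```python
def majority(labels):
    """Most common label in a non-empty list."""
    return max(set(labels), key=labels.count)

def learn_stump(rows, labels):
    """Pick the (attribute, threshold) split with the fewest training errors."""
    best = None
    for attr in rows[0]:
        for threshold in sorted({r[attr] for r in rows}):
            left = [l for r, l in zip(rows, labels) if r[attr] <= threshold]
            right = [l for r, l in zip(rows, labels) if r[attr] > threshold]
            if not left or not right:
                continue  # a split must separate the data into two groups
            left_label, right_label = majority(left), majority(right)
            errors = (sum(l != left_label for l in left)
                      + sum(l != right_label for l in right))
            if best is None or errors < best[0]:
                best = (errors, attr, threshold, left_label, right_label)
    if best is None:
        raise ValueError("no attribute can split the data")
    return best[1:]

def classify(stump, row):
    """Apply the learned rule to a new record."""
    attr, threshold, left_label, right_label = stump
    return left_label if row[attr] <= threshold else right_label

# Hypothetical training data: client records and their known classes.
rows = [
    {"income": 20, "age": 25},
    {"income": 35, "age": 40},
    {"income": 60, "age": 30},
    {"income": 80, "age": 50},
]
labels = ["high risk", "high risk", "low risk", "low risk"]

stump = learn_stump(rows, labels)
print(stump)
print(classify(stump, {"income": 30, "age": 60}))
```

On this sample the learned rule splits on income, since that is the only attribute that separates the two classes without error.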
Sequential Patterns
• The discovery of sequential
patterns is based on the
concept of a sequence of
itemsets.
• The goal is to find all
subsequences of the given set
of sequences that have a
user-defined minimum support.
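The idea of support for a sequence of itemsets can be sketched in Python. This is a minimal illustration, not an efficient mining algorithm: it checks a hand-picked list of candidate patterns rather than generating all subsequences. The function names and the sample data are hypothetical, and support is assumed here to mean the fraction of input sequences that contain the pattern.

```python
def is_subsequence(pattern, sequence):
    """True if the itemsets of pattern appear, in order, inside itemsets of sequence."""
    pos = 0
    for itemset in sequence:
        # Greedily match the next pattern itemset against the earliest superset.
        if pos < len(pattern) and pattern[pos] <= itemset:
            pos += 1
    return pos == len(pattern)

def support(pattern, sequences):
    """Fraction of sequences that contain pattern as a subsequence."""
    return sum(is_subsequence(pattern, s) for s in sequences) / len(sequences)

def frequent_patterns(candidates, sequences, min_support):
    """Keep the candidates whose support meets the user-defined minimum."""
    return [p for p in candidates if support(p, sequences) >= min_support]

# Hypothetical data: each customer is a sequence of itemsets (one per visit).
sequences = [
    [{"milk"}, {"bread", "butter"}, {"jam"}],
    [{"milk"}, {"bread"}],
    [{"bread", "butter"}, {"jam"}],
    [{"milk"}, {"jam"}],
]
candidates = [
    [{"milk"}, {"bread"}],
    [{"bread"}, {"jam"}],
    [{"butter"}, {"milk"}],
]

print(frequent_patterns(candidates, sequences, min_support=0.5))
```

Order matters: butter followed by milk never occurs in that order, so it has zero support even though both items appear in the data.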
Patterns within Time
Series
• Time series are sequences of
events.
• Each event may be a fixed type
of transaction.
• The closing price of a stock or a
fund is an event that occurs
every weekday for each stock
and fund.
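As a small illustration of looking for a pattern in such a series, the sketch below finds the longest run of consecutive rising closing prices. The prices are made up, the function name is hypothetical, and "rising run" is just one simple pattern chosen for the example.

```python
def longest_rising_run(closes):
    """Length of the longest run of consecutive day-over-day increases."""
    best = current = 0
    for prev, curr in zip(closes, closes[1:]):
        # Extend the run on an increase, otherwise start over.
        current = current + 1 if curr > prev else 0
        best = max(best, current)
    return best

# Hypothetical weekday closing prices for one stock.
closes = [10.0, 10.5, 10.2, 10.4, 10.9, 11.3, 11.1]
print(longest_rising_run(closes))
```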
Application of Data Mining
• Marketing – Applications include
analysis of consumer behavior
based on buying patterns
• Finance – Applications include
analysis of creditworthiness of
clients, segmentation of
accounts receivable…
Cont.
• Manufacturing – Applications
involve optimization of
resources like machines,
manpower, and materials
• Health Care – Applications
include discovering patterns in
radiological images, analyzing
side effects of drugs…
Real Life Application
• The LA Police Department's
counterterrorism unit is
using a new data-analysis
system designed to identify and
connect related pieces of
intelligence to help officers deter
and respond to terrorist
attacks.
Reference
• Elmasri, Ramez. Fundamentals of
Database Systems. Pearson.
Singapore. 2004.
• LAPD turns to data analysis to fight
terrorism.
<http://www.computerworld.com/action/article.do?command=viewArticleBasic&articleId=107670>