DATA MINING IN
APPLIED WORLD
Submitted by:
Mr. AMIT V. MALWADE
FINAL YEAR IT
Guided by:
Prof. Mr. NITIN R. CHOPDE
Dept of IT,
CBSCEM AMRAVATI.
WHAT IS DATA?
Data are a set of values of one or more variables.
A variable is something that can take different values.
Values can be numbers or names, depending on the
variable.
• Numeric, e.g. weight
• Counting, e.g. number of injuries
• Ordinal, e.g. competitive level (values are
numbers/names)
• Nominal, e.g. gender (values are names)
What is a data warehouse?
A data warehouse is a subject-oriented, integrated, time-variant,
and non-volatile collection of data in support
of management’s decision-making process.
A data warehouse combines data management and data
analysis.
A data webhouse is a distributed data warehouse that is
implemented over the web with no central data
repository.
Goal: to integrate enterprise-wide corporate data
into a single repository from which users can easily run
queries.
Key Features Of Data Warehousing:
Subject-oriented
Integrated
Time-variant
Non-volatile
Data warehouse models
Enterprise warehouse:
Collects all of the information about subjects spanning the
entire organization.
Data mart:
Usually implemented on low-cost departmental
servers that are UNIX- or Windows NT-based.
Virtual warehouse:
i) It is a set of views over operational databases.
ii) It is easy to build but requires excess capacity on
operational database servers.
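As a minimal sketch of the "set of views" idea, the Python example below uses the standard-library sqlite3 module with an in-memory database. The table name orders, the view name sales_by_region, and the sample rows are hypothetical and chosen only for illustration.

```python
import sqlite3

# In-memory database standing in for an operational (transactional) database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders (region, amount) VALUES (?, ?)",
    [("north", 120.0), ("south", 80.0), ("north", 40.0)],
)

# The "virtual warehouse" is only a view: no data is copied, so every query
# against it is computed on the operational server at query time.
conn.execute(
    "CREATE VIEW sales_by_region AS "
    "SELECT region, SUM(amount) AS total FROM orders GROUP BY region"
)

totals = conn.execute(
    "SELECT region, total FROM sales_by_region ORDER BY region"
).fetchall()
print(totals)
```

Because the view is evaluated on every query, the analytical load falls on the operational server, which is why this model needs excess capacity there.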
What is data mining?
Data mining is the process of extracting patterns from
data. Data mining is becoming an increasingly important tool to
transform data into information. It is commonly used in a
wide range of profiling practices, such as marketing,
surveillance, fraud detection and scientific discovery.
Architecture of data mining
The figure shows a simple architecture of data
mining. It consists of the following steps:
Data cleaning
Data integration
Data selection
Data transformation
Data mining
Pattern evaluation
Knowledge discovery
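The steps above can be sketched as a chain of small Python functions. This is only an illustration under stated assumptions: the records, the field names, the "average price per item" mining step, and the threshold are all hypothetical, and real systems use far richer techniques at each stage.

```python
# Hypothetical raw records from two sources; None marks a missing value.
source_a = [{"customer": "c1", "item": "tent", "price": "120"},
            {"customer": "c2", "item": "stove", "price": None}]
source_b = [{"customer": "c1", "item": "boots", "price": "90"},
            {"customer": "c3", "item": "tent", "price": "110"}]

def clean(records):
    """Data cleaning: drop records with missing values."""
    return [r for r in records if all(v is not None for v in r.values())]

def integrate(*sources):
    """Data integration: combine several sources into one collection."""
    return [r for src in sources for r in src]

def select(records, fields):
    """Data selection: keep only the fields relevant to the task."""
    return [{f: r[f] for f in fields} for r in records]

def transform(records):
    """Data transformation: convert price strings to numbers."""
    return [{**r, "price": float(r["price"])} for r in records]

def mine(records):
    """Data mining: here, simply the average price per item."""
    sums = {}
    for r in records:
        total, count = sums.get(r["item"], (0.0, 0))
        sums[r["item"]] = (total + r["price"], count + 1)
    return {item: total / count for item, (total, count) in sums.items()}

def evaluate(patterns, threshold):
    """Pattern evaluation: keep patterns at or above an interest threshold."""
    return {k: v for k, v in patterns.items() if v >= threshold}

data = integrate(clean(source_a), clean(source_b))
data = transform(select(data, ["item", "price"]))
knowledge = evaluate(mine(data), threshold=100.0)
print(knowledge)
```

The value left in knowledge is what the final step, knowledge discovery, would present to the user.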
How does data mining work?
Classes: Stored data is used to locate data in predetermined groups.
For example, a restaurant chain could mine customer purchase data to
determine when customers visit and what they typically order.
Clusters: Data items are grouped according to logical relationships
or consumer preferences. For example, data can be mined to identify
market segments or consumer affinities.
Associations: Data can be mined to identify associations. The beer-diaper
example is a classic case of association mining.
Sequential patterns: Data is mined to anticipate behaviour
patterns and trends. For example, an outdoor equipment retailer could
predict the likelihood of a backpack being purchased based on a
consumer's purchase of sleeping bags and hiking shoes.
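To make the association idea concrete, the Python sketch below computes the standard support and confidence measures for a rule such as "diapers implies beer". The slides do not define these measures, and the transactions are hypothetical sample data.

```python
# Hypothetical market-basket transactions.
transactions = [
    {"beer", "diapers", "chips"},
    {"beer", "diapers"},
    {"diapers", "milk"},
    {"beer", "chips"},
]

def support(itemset, baskets):
    """Fraction of baskets that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= b for b in baskets) / len(baskets)

def confidence(antecedent, consequent, baskets):
    """Estimated P(consequent | antecedent) over the baskets."""
    both = set(antecedent) | set(consequent)
    return support(both, baskets) / support(antecedent, baskets)

rule_support = support({"diapers", "beer"}, transactions)
rule_confidence = confidence({"diapers"}, {"beer"}, transactions)
print(rule_support, rule_confidence)
```

A rule is usually considered interesting only when both values exceed thresholds chosen by the analyst.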
What kind of Data can be mined?
Flat files: Flat files are simple data files in text or binary
format with a structure known by the data mining algorithm to
be applied.
Relational Databases: A relational database consists of a set
of tables containing either values of entity attributes, or values
of attributes from entity relationships. Tables have columns and
rows, where columns represent attributes and rows represent
tuples.
Multimedia Databases: Multimedia databases include
video, images, audio and text media. They can be stored on
extended object-relational or object-oriented databases, or
simply on a file system.
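As a small illustration of the first two kinds of data, the Python sketch below reads a flat file whose structure is known in advance and turns it into columns (attributes) and rows (tuples), as in a relational table. The file contents and column names are hypothetical.

```python
import csv
import io

# Hypothetical flat file: plain text whose structure (a header row followed by
# comma-separated fields) must be known to the program that reads it.
flat_file = io.StringIO("name,sport,weight_kg\nAsha,rowing,61.5\nRavi,judo,73.0\n")

reader = csv.reader(flat_file)
columns = next(reader)                 # column names = attributes
rows = [tuple(r) for r in reader]      # each row = one tuple

print(columns)
print(rows)
```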
Data mining technologies
OLAP: Data warehouse systems serve users or knowledge
workers in the role of data analysis and decision-making. Such
systems can organize and present data in various formats in order
to accommodate the diverse needs of the different users. These
systems are called on-line analytical processing (OLAP) systems.
OLTP: The job of earlier on-line operational systems was to
perform transaction and query processing. They are therefore
called on-line transaction processing (OLTP) systems.
Difference between OLTP and OLAP
Users and system orientation
Data contents
Database design
View
Access patterns
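The difference in access patterns can be sketched with Python's standard-library sqlite3 module. The slides do not spell out the contrast, so the example assumes the usual one: OLTP issues short transactions that write individual current records, while OLAP issues read-only queries that summarize many records. The sales table and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (id INTEGER PRIMARY KEY, year INTEGER, product TEXT, qty INTEGER)"
)
conn.executemany(
    "INSERT INTO sales (year, product, qty) VALUES (?, ?, ?)",
    [(2009, "tent", 3), (2009, "boots", 5), (2010, "tent", 4)],
)

# OLTP-style access: a short transaction that writes one current record.
with conn:  # commits on success, rolls back on error
    conn.execute("UPDATE sales SET qty = qty + 1 WHERE id = ?", (1,))

# OLAP-style access: a read-only query that summarizes many records.
summary = conn.execute(
    "SELECT year, SUM(qty) FROM sales GROUP BY year ORDER BY year"
).fetchall()
print(summary)
```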
Features of OLAP
Multidimensional views of data:
i) It provides the foundation for analytical processing through
flexible access to information.
ii) It must be able to analyze data across any dimensions at any
level of aggregation, with equal functionality and ease.
Calculation-intensive capabilities:
i) The real test of an OLAP application is its ability to perform
complex calculations; it must be able to do more than simple
aggregation.
ii) Analytical processing systems are judged on their ability to
create information from data.
Time Intelligence: True OLAP systems should understand
the sequential nature of time.
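Analyzing data "across any dimensions at any level of aggregation" can be sketched as a roll-up over a small set of fact records. This is a minimal Python illustration, not how an OLAP engine is implemented; the dimensions (year, region, product), the measure qty, and the function name roll_up are hypothetical.

```python
from collections import defaultdict

# Hypothetical fact records with three dimensions and one measure (qty).
facts = [
    {"year": 2009, "region": "north", "product": "tent", "qty": 3},
    {"year": 2009, "region": "south", "product": "tent", "qty": 2},
    {"year": 2010, "region": "north", "product": "boots", "qty": 5},
    {"year": 2010, "region": "north", "product": "tent", "qty": 4},
]

def roll_up(records, dimensions, measure="qty"):
    """Sum the measure for every combination of the chosen dimensions."""
    totals = defaultdict(int)
    for r in records:
        key = tuple(r[d] for d in dimensions)
        totals[key] += r[measure]
    return dict(totals)

by_year = roll_up(facts, ["year"])
by_region_product = roll_up(facts, ["region", "product"])
grand_total = roll_up(facts, [])   # no dimensions: the highest aggregation level
print(by_year, by_region_product, grand_total)
```

The same function serves every combination of dimensions, which is the "equal functionality and ease" the slide asks for.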
Advantages of Data mining
Marketing/Retailing
Banking/Crediting
Law enforcement
Researchers
Disadvantages of Data mining
Security issues
Misuse of information
Conclusion
Data mining is often used as a synonym for knowledge discovery. There is
much work to be done in the area of knowledge discovery and data
mining, and its future depends on developing tools and techniques
that yield useful knowledge without causing undue threats to
individuals’ privacy.
THANK YOU………….