Transcript PPT

DATA MINING IN
APPLIED WORLD
Submitted by:
Mr. AMIT V. MALWADE
FINAL YEAR IT
Guided by:
Prof. Mr. NITIN R. CHOPDE
Dept of IT,
CBSCEM AMRAVATI.
WHAT IS DATA?
Data are a collection of values of one or more variables.
A variable is something that can take different values.
Values can be numbers or names, depending on the
variable.
• Numeric, e.g. weight
• Counting, e.g. number of injuries
• Ordinal, e.g. competitive level (values are
numbers/names)
• Nominal, e.g. gender (values are names)
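The four kinds of variable listed above can be sketched in Python. This is only an illustration: the athlete record, the field names and the level names are hypothetical and are not taken from the slides.

```python
from enum import Enum, IntEnum


class Gender(Enum):
    """Nominal variable: the values are names with no inherent order."""
    FEMALE = "female"
    MALE = "male"


class CompetitiveLevel(IntEnum):
    """Ordinal variable: the values are names that also carry a rank."""
    RECREATIONAL = 1
    CLUB = 2
    ELITE = 3


# Hypothetical record for one athlete.
athlete = {
    "weight_kg": 72.5,               # numeric: any value on a continuous scale
    "injuries": 2,                   # counting: whole numbers only
    "level": CompetitiveLevel.CLUB,  # ordinal
    "gender": Gender.FEMALE,         # nominal
}

# Ordinal values can be compared by rank; nominal values only by equality.
outranked_by_elite = CompetitiveLevel.ELITE > athlete["level"]
same_gender = athlete["gender"] == Gender.FEMALE
print(outranked_by_elite, same_gender)
```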
What is a data warehouse?
A data warehouse is a subject-oriented, integrated, time-variant, and non-volatile collection of data in support
of management’s decision-making process.
Data warehousing covers both data management and data
analysis.
A data webhouse is a distributed data warehouse that is
implemented over the web with no central data
repository.
Goal: to integrate enterprise-wide corporate data
into a single repository from which users can easily run
queries.
Key Features Of Data Warehousing:
• Subject-oriented
• Integrated
• Time-variant
• Non-volatile
Data warehouse models
• Enterprise warehouse:
Collects all of the information about subjects spanning the
entire organization.
• Data mart:
Usually implemented on low-cost departmental
servers that are UNIX- or Windows NT-based.
• Virtual warehouse:
i) It is a set of views over operational databases.
ii) It is easy to build but requires excess capacity on
operational database servers.
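The idea of a virtual warehouse as a set of views can be sketched with Python's built-in sqlite3 module. This is a minimal illustration only: the orders table, its columns and the view name are hypothetical, and an in-memory database stands in for an operational server.

```python
import sqlite3

# Hypothetical operational database held in memory.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        region   TEXT,
        amount   REAL
    );
    INSERT INTO orders (region, amount) VALUES
        ('East', 120.0), ('East', 80.0), ('West', 200.0);

    -- The virtual warehouse is only a view: no data is copied,
    -- so each query against it runs on the operational server.
    CREATE VIEW sales_by_region AS
        SELECT region,
               SUM(amount) AS total_amount,
               COUNT(*)    AS order_count
        FROM orders
        GROUP BY region;
""")

rows = conn.execute(
    "SELECT region, total_amount, order_count "
    "FROM sales_by_region ORDER BY region"
).fetchall()
print(rows)
```

Because the view is evaluated on demand, it is easy to build, but every analytical query adds load to the operational database, which is why excess capacity is needed there.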
What is data mining?
Data mining is the process of extracting patterns from
data. It is becoming an increasingly important tool for
transforming data into information. It is commonly used in a
wide range of profiling practices, such as marketing,
surveillance, fraud detection and scientific discovery.
Architecture of data mining
The figure above shows a simple architecture of data
mining. It consists of the following steps:
Data cleaning
Data integration
Data selection
Data transformation
Data mining
Pattern evaluation
Knowledge discovery
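The steps listed above can be sketched as a chain of small Python functions. This is a toy illustration under stated assumptions: the two store data sets, the function names and the thresholds are hypothetical, and the mining step is a simple stand-in (customers who spend above the mean), not a real mining algorithm.

```python
from statistics import mean

# Hypothetical raw records from two sources; None marks a missing value.
store_a = [{"customer": "c1", "amount": 20.0},
           {"customer": "c2", "amount": None}]
store_b = [{"customer": "c1", "amount": 35.0},
           {"customer": "c3", "amount": 50.0}]


def clean(records):
    """Data cleaning: drop records with a missing amount."""
    return [r for r in records if r["amount"] is not None]


def integrate(*sources):
    """Data integration: combine several sources into one list."""
    return [r for source in sources for r in source]


def select(records, min_amount):
    """Data selection: keep only records relevant to the analysis."""
    return [r for r in records if r["amount"] >= min_amount]


def transform(records):
    """Data transformation: consolidate amounts per customer."""
    totals = {}
    for r in records:
        totals[r["customer"]] = totals.get(r["customer"], 0.0) + r["amount"]
    return totals


def mine(totals):
    """Data mining (toy stand-in): customers spending above the mean."""
    threshold = mean(totals.values())
    return {c: t for c, t in totals.items() if t > threshold}


def evaluate(patterns, min_total):
    """Pattern evaluation: keep only patterns judged interesting."""
    return {c: t for c, t in patterns.items() if t >= min_total}


integrated = integrate(clean(store_a), clean(store_b))
selected = select(integrated, min_amount=10.0)
totals = transform(selected)
patterns = mine(totals)
knowledge = evaluate(patterns, min_total=50.0)
print(knowledge)  # knowledge discovery: present the result to the user
```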
How does data mining work?

Classes: Stored data is used to locate data in predetermined groups.
For example, a restaurant chain could mine customer purchase data to
determine when customers visit and what they typically order.
Clusters: Data items are grouped according to logical relationships
or consumer preferences. For example, data can be mined to identify
market segments or consumer affinities.
Associations: Data can be mined to identify associations. The
beer-diaper example is a classic case of association mining.
Sequential patterns: Data is mined to anticipate behaviour
patterns and trends. For example, an outdoor equipment retailer could
predict the likelihood of a backpack being purchased based on a
consumer's purchase of sleeping bags and hiking shoes.
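Association mining, as in the beer-diaper example above, is usually described with two measures: support (how often items occur together) and confidence (how often the rule holds when its left-hand side occurs). The following is a minimal sketch; the baskets are made-up data and the function names are hypothetical.

```python
# Hypothetical shopping baskets; each set is one customer transaction.
baskets = [
    {"beer", "diapers", "chips"},
    {"beer", "diapers"},
    {"diapers", "milk"},
    {"beer", "chips"},
    {"milk", "bread"},
]


def support(itemset, baskets):
    """Fraction of baskets that contain every item in itemset."""
    hits = sum(1 for basket in baskets if itemset <= basket)
    return hits / len(baskets)


def confidence(antecedent, consequent, baskets):
    """Support of the combined itemset divided by support of the antecedent."""
    return support(antecedent | consequent, baskets) / support(antecedent, baskets)


pair_support = support({"beer", "diapers"}, baskets)
rule_confidence = confidence({"diapers"}, {"beer"}, baskets)
print(f"support(beer, diapers) = {pair_support:.2f}")
print(f"confidence(diapers -> beer) = {rule_confidence:.2f}")
```

In this toy data, two of the five baskets contain both items, and two of the three baskets with diapers also contain beer.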
What kind of Data can be mined?
Flat files: Flat files are simple data files in text or binary
format with a structure known by the data mining algorithm to
be applied.
Relational Databases: A relational database consists of a set
of tables containing either values of entity attributes, or values
of attributes from entity relationships. Tables have columns and
rows, where columns represent attributes and rows represent
tuples.
Multimedia Databases: Multimedia databases include
video, images, audio and text media. They can be stored on
extended object-relational or object-oriented databases, or
simply on a file system.
Data mining technologies
OLAP: Data warehouse systems serve users or knowledge
workers in the role of data analysis and decision-making. Such
systems can organize and present data in various formats in order
to accommodate the diverse needs of different users. These
systems are called on-line analytical processing (OLAP) systems.
OLTP: The job of earlier on-line operational systems was to
perform transaction and query processing, so they are
termed on-line transaction processing (OLTP) systems.
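Differences between OLTP and OLAP
• Users and system orientation: OLTP is customer-oriented; OLAP is market-oriented and serves analysts and managers.
• Data contents: OLTP manages current, detailed data; OLAP manages historical, summarized data.
• Database design: OLTP usually follows an entity-relationship model; OLAP usually follows a star or snowflake model.
• View: OLTP focuses on current data within one enterprise or department; OLAP often spans many sources and time periods.
• Access patterns: OLTP consists mainly of short, atomic transactions; OLAP consists mostly of read-only, complex queries.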
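The difference in access patterns can be sketched with Python's built-in sqlite3 module. This is an illustration only: the sales table and its values are hypothetical, and a single in-memory database plays both roles, whereas real OLTP and OLAP systems are normally separate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales ("
    "sale_id INTEGER PRIMARY KEY, product TEXT, year INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO sales (product, year, amount) VALUES (?, ?, ?)",
    [("tent", 2009, 100.0), ("tent", 2010, 150.0), ("boots", 2010, 60.0)],
)

# OLTP-style access: a short transaction that touches a single record.
with conn:  # commits on success, rolls back on error
    conn.execute("UPDATE sales SET amount = ? WHERE sale_id = ?", (110.0, 1))

# OLAP-style access: a read-only query that summarizes many records.
summary = conn.execute(
    "SELECT year, SUM(amount) FROM sales GROUP BY year ORDER BY year"
).fetchall()
print(summary)
```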
Features of OLAP
• Multidimensional views of data:
i) It provides the foundation for analytical processing through
flexible access to information.
ii) It must be able to analyze data across any dimension at any
level of aggregation, with equal functionality and ease.
• Calculation-intensive capabilities:
i) The real test of an OLAP application is its ability to perform
complex calculations; it must be able to do more than simple
aggregation.
ii) Analytical processing systems are judged on their ability to
create information from data.
• Time intelligence: True OLAP systems should understand
the sequential nature of time.
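Analyzing data across any dimension at any level of aggregation can be sketched in plain Python. This is a toy example: the fact records, the dimension names and the roll_up function are hypothetical, and real OLAP servers use precomputed cubes rather than a loop over rows.

```python
from collections import defaultdict

# Hypothetical fact records: (region, product, quarter, sales).
facts = [
    ("East", "tent", "Q1", 100),
    ("East", "boots", "Q1", 40),
    ("West", "tent", "Q1", 70),
    ("West", "tent", "Q2", 90),
]
DIMENSIONS = ("region", "product", "quarter")


def roll_up(facts, by):
    """Sum sales grouped by any subset of the dimensions."""
    positions = [DIMENSIONS.index(name) for name in by]
    totals = defaultdict(int)
    for row in facts:
        key = tuple(row[i] for i in positions)
        totals[key] += row[-1]  # the last field is the sales measure
    return dict(totals)


by_region = roll_up(facts, ["region"])
by_product_quarter = roll_up(facts, ["product", "quarter"])
grand_total = roll_up(facts, [])  # highest level of aggregation
print(by_region)
print(by_product_quarter)
print(grand_total)
```

The same function serves every combination of dimensions, which is the "equal functionality and ease" requirement in miniature.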
Advantages of Data mining
Marketing/Retailing
Banking/Crediting
Law enforcement
Researchers
Disadvantages of Data mining
Security issues
Misuse of information
Conclusion
Data mining is often treated as a synonym for knowledge discovery. There is
much work to be done in the area of knowledge discovery and data
mining, and its future depends on developing tools and techniques
that yield useful knowledge without causing undue threats to
individuals’ privacy.
THANK YOU