Data mining - Binus Repository
Chapter 4
Data and Knowledge Management
Copyright 2007 John
Wiley & Sons, Inc.
Chapter Outline
4.1 Managing Data
4.2 The Database Approach
4.3 Database Management Systems
4.4 Data Warehousing
4.5 Knowledge Management
Learning Objectives
Recognize the importance of data, issues involved
in managing data and their lifecycle.
Describe the sources of data and explain how data
are collected.
Explain the advantages of the database approach.
Explain the operation of data warehousing and its
role in decision support.
Learning Objectives (Continued)
Understand the capabilities and benefits of
data mining.
Describe data visualization.
Explain geographic information systems and
virtual reality as decision support tools.
Define knowledge and describe the different
types of knowledge.
4.1 Managing Data
Difficulties of Managing Data.
Amount of data increases exponentially.
Data are scattered and collected by many
individuals using various methods and devices.
Data come from many sources including internal
sources, personal sources and external sources.
Data security, quality and integrity are critical.
Managing Data (Continued)
Clickstream data. Data that visitors and
customers produce when they visit a Website.
An ever-increasing amount of data needs to
be considered in making organizational
decisions.
Data Life Cycle
Data Hierarchy
Bit (a binary digit): a circuit that is either on
or off.
Byte: group of 8 bits, represents a single
character.
Field: name, number, or characters that
describe an aspect of a business object or
activity.
Data Hierarchy (Continued)
Record: collection of related data
fields.
File (or table): collection of related
records.
Database: a collection of integrated and
related files.
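The levels of the data hierarchy can be sketched in a few lines of Python. This is only an illustration: the student and course values, and the names `record`, `student_file` and `database`, are made up for the example and do not come from the slides.

```python
# Illustrative sketch of the data hierarchy; all names and values are made up.

# Bit and byte: in ASCII, one character is stored as one byte (8 bits).
char = "A"
byte_value = char.encode("ascii")[0]   # integer value of the single byte
bits = format(byte_value, "08b")       # the 8 bits written as 0s and 1s

# Field: one named value describing an aspect of a business object.
# Record: a collection of related fields.
record = {"student_id": "S001", "name": "John Smith", "major": "Finance"}

# File (or table): a collection of related records.
student_file = [
    record,
    {"student_id": "S002", "name": "Jane Doe", "major": "Marketing"},
]

# Database: a collection of integrated, related files.
database = {
    "students": student_file,
    "courses": [{"course_id": "C101", "title": "Intro to MIS"}],
}

print(bits)                       # bit pattern of the character "A"
print(len(database["students"]))  # number of student records
```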
See Fig. 4.2.
4.2 The Database Approach
A database management system (DBMS) provides all users
with access to all the data.
DBMSs minimize the following problems:
Data redundancy: the same data stored in many places.
Data isolation: applications cannot access data
associated with other applications.
Data inconsistency: various copies of the data do not
agree.
Database Approach (Continued)
DBMSs maximize the following:
Data security.
Data integrity: data meet certain constraints, e.g. no
alphabetic characters in a zip code field.
Data independence: applications and data are
independent of one another, all applications are
able to access the same data.
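The zip code constraint mentioned under data integrity can be sketched with Python's built-in `sqlite3` module. The `customer` table, its columns and the five-digit rule are assumptions made for this example. The point is only that the DBMS, not each application, rejects data that violate the constraint.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table: the CHECK constraint accepts only zip codes made of
# exactly five digits, so alphabetic characters are rejected by the DBMS.
conn.execute(
    """
    CREATE TABLE customer (
        name TEXT NOT NULL,
        zip  TEXT NOT NULL
             CHECK (length(zip) = 5 AND zip NOT GLOB '*[^0-9]*')
    )
    """
)

# A valid row is stored.
conn.execute("INSERT INTO customer VALUES (?, ?)", ("Ann", "10001"))

# A row with a letter in the zip code violates the constraint.
try:
    conn.execute("INSERT INTO customer VALUES (?, ?)", ("Bob", "1000A"))
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM customer").fetchone()[0]
print(rejected, row_count)
```

Because the rule lives in the database, every application that writes to the table is held to the same constraint.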
See Fig. 4.3.
Designing the Database
Data model. Diagram that represents the entities in
the database and their relationships.
Entity is a person, place, thing or event.
Attribute is a characteristic or quality of a particular
entity.
Primary key is a field that uniquely identifies a
record.
Secondary keys are fields that have identifying
information but may not identify a record with complete accuracy.
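The difference between a primary key and a secondary key can be sketched with `sqlite3`. The `student` table, its rows and the index name are hypothetical. Here `student_id` serves as the primary key and `major` as a secondary key: it helps locate records but does not single out one record.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# student_id is the primary key; major is indexed as a secondary key.
conn.execute(
    "CREATE TABLE student (student_id TEXT PRIMARY KEY, name TEXT, major TEXT)"
)
conn.execute("CREATE INDEX idx_student_major ON student (major)")
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [
        ("S001", "John Smith", "Finance"),
        ("S002", "Jane Doe", "Finance"),
        ("S003", "Raj Patel", "Marketing"),
    ],
)

# A primary key lookup returns at most one record.
by_primary = conn.execute(
    "SELECT name FROM student WHERE student_id = ?", ("S002",)
).fetchall()

# A secondary key lookup may return several records.
by_secondary = conn.execute(
    "SELECT name FROM student WHERE major = ? ORDER BY student_id", ("Finance",)
).fetchall()

# The DBMS refuses a second record with an existing primary key value.
try:
    conn.execute("INSERT INTO student VALUES (?, ?, ?)", ("S001", "Dup", "Law"))
    duplicate_allowed = True
except sqlite3.IntegrityError:
    duplicate_allowed = False

print(by_primary, by_secondary, duplicate_allowed)
```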
Entity-Relationship Modeling
Database designers plan the database design in a
process called entity-relationship (ER) modeling.
ER diagrams consist of entities, attributes and
relationships.
An entity class is a group of entities of a given
type, e.g. STUDENT.
An instance is the representation of a particular entity,
e.g. STUDENT(John Smith, 123-45-6789, …).
Identifiers are attributes unique to that entity
instance, e.g. StudentIDNumber.
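The ER vocabulary above maps loosely onto Python dataclasses. This is an analogy rather than an ER diagram. The STUDENT instance reuses the values from the slide, while the `Course` and `Enrollment` classes and the course data are hypothetical additions used to show a relationship.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Student:
    """Entity class STUDENT; student_id plays the role of the identifier."""
    student_id: str
    name: str


@dataclass(frozen=True)
class Course:
    """Hypothetical second entity class, added to show a relationship."""
    course_id: str
    title: str


@dataclass(frozen=True)
class Enrollment:
    """Relationship between a STUDENT instance and a COURSE instance."""
    student_id: str
    course_id: str


# Instances: representations of particular entities.
john = Student(student_id="123-45-6789", name="John Smith")
mis = Course(course_id="C101", title="Intro to MIS")
enrollments = [Enrollment(john.student_id, mis.course_id)]

# Follow the relationship from a student to the titles of their courses.
courses_by_id = {mis.course_id: mis}
johns_courses = [
    courses_by_id[e.course_id].title
    for e in enrollments
    if e.student_id == john.student_id
]
print(johns_courses)
```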
See Fig. 4.4.
4.3 Database Management Systems
A database management system (DBMS) is a set of
programs that provide users with tools to add,
delete, access and analyze data stored in one
location.
In online transaction processing (OLTP),
transactions are processed as soon as they occur.
The relational database model is based on the concept
of two-dimensional tables.
Popular examples of relational databases are
Microsoft Access and Oracle.
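A minimal sketch of the relational model, using SQLite through Python's `sqlite3` module instead of Access or Oracle. The `customer` and `orders` tables and their rows are made up. Each table is a two-dimensional grid of rows and columns, and a join relates the two through a shared column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Two hypothetical two-dimensional tables related by customer_id.
conn.executescript(
    """
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT
    );
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customer (customer_id),
        amount      REAL
    );
    """
)

# Record the rows in one transaction, committed when the block exits.
with conn:
    conn.execute("INSERT INTO customer VALUES (1, 'Ann')")
    conn.execute("INSERT INTO customer VALUES (2, 'Bob')")
    conn.execute("INSERT INTO orders VALUES (10, 1, 25.0)")
    conn.execute("INSERT INTO orders VALUES (11, 1, 15.0)")
    conn.execute("INSERT INTO orders VALUES (12, 2, 40.0)")

# Join the tables on the shared column and total the orders per customer.
totals = conn.execute(
    """
    SELECT c.name, SUM(o.amount)
    FROM customer AS c
    JOIN orders AS o ON o.customer_id = c.customer_id
    GROUP BY c.customer_id, c.name
    ORDER BY c.name
    """
).fetchall()
print(totals)
```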
See Figs. 4.5, 4.6, 4.7 and 4.8.
4.4 Data Warehousing
A data warehouse is a repository of historical
data organized by subject to support decision
makers in the organization. It includes:
Online analytical processing (OLAP), which involves
the analysis of accumulated data by end users;
A multidimensional data structure, which allows
data to be represented in a three-dimensional
matrix (or data cube).
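A three-dimensional data cube can be sketched as a Python dictionary keyed by (product, region, year). The dimensions, the sales figures and the helper names `slice_cube` and `roll_up` are assumptions made for this illustration, not part of any particular OLAP product.

```python
from collections import defaultdict

# Hypothetical sales cube: (product, region, year) -> units sold.
cube = {
    ("Nuts", "East", 2005): 50,
    ("Nuts", "West", 2005): 60,
    ("Bolts", "East", 2005): 40,
    ("Nuts", "East", 2006): 70,
    ("Bolts", "West", 2006): 30,
}


def slice_cube(cube, product=None, region=None, year=None):
    """Keep only the cells that match every dimension value that is given."""
    wanted = (product, region, year)
    return {
        key: value
        for key, value in cube.items()
        if all(w is None or w == k for w, k in zip(wanted, key))
    }


def roll_up(cube, axis):
    """Total the cells along one dimension (0=product, 1=region, 2=year)."""
    totals = defaultdict(int)
    for key, value in cube.items():
        totals[key[axis]] += value
    return dict(totals)


east_2005 = slice_cube(cube, region="East", year=2005)
sales_by_product = roll_up(cube, axis=0)
sales_by_year = roll_up(cube, axis=2)
print(east_2005, sales_by_product, sales_by_year)
```

Slicing fixes one or more dimensions, and rolling up summarizes along one of them. Both are typical of the end-user analysis that OLAP supports.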
See Figs. 4.9, 4.10, 4.11 and 4.12.
Benefits of Data Warehousing
End users can access data quickly and easily
via Web browsers because the data are located in
one place.
End users can conduct extensive analysis
with data in ways that may not have been
possible before.
End users have a consolidated view of
organizational data.
Data Marts & Data Mining
Data mart is a small data warehouse,
designed for the end-user needs in a strategic
business unit (SBU) or a department.
Data mining involves searching for valuable
business information in a large database, data
warehouse, or data mart.
It is used to predict trends and behaviors and to
identify previously unknown patterns.
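One simple way to surface a previously unknown pattern is to count which items are bought together. The sketch below is a basic co-occurrence count, chosen here only for illustration. The slides do not name a specific mining technique, and the transactions and the `min_support` threshold are made up.

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase transactions (one set of items per basket).
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "eggs"},
]

# Count how many baskets contain each pair of items.
pair_counts = Counter()
for basket in transactions:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

# Keep the pairs that occur in at least half of all baskets.
min_support = 0.5
frequent_pairs = {
    pair: count / len(transactions)
    for pair, count in pair_counts.items()
    if count / len(transactions) >= min_support
}
print(frequent_pairs)
```

On a real data warehouse the same idea runs over far more transactions, and the resulting pairs would feed decisions such as product placement or promotions.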
Data Mining Applications
Retailing and sales. Predict sales, prevent theft and fraud,
determine correct inventory levels and distribution
schedules.
Banking. Forecast levels of bad loans and fraudulent credit card
use, predict credit card spending by new customers, etc.
Manufacturing and production. Predict machinery
failures, find key factors to help optimize manufacturing
capacity.
Insurance. Forecast claim amounts and medical coverage costs,
predict which customers will buy new insurance policies.
Data Mining Applications (Continued)
Police work. Track crime patterns, locations,
criminal behavior; identify attributes to assist in
solving criminal cases.
Health care. Correlate demographics of patients
with critical illnesses, develop better insight to
identify and treat symptoms and their causes.
Marketing. Classify customer demographics to
predict how customers will respond to mailing or
buy a particular product.
4.5 Knowledge Management
Knowledge management (KM) is a process that
helps organizations manipulate important
knowledge that is part of the organization’s
memory, usually in an unstructured format.
Knowledge is information that is contextual,
relevant and actionable; information in action.
Intellectual capital (or intellectual assets) is
another term often used for knowledge.
Knowledge Management (Continued)
Explicit knowledge deals with more objective, rational and
technical knowledge.
Tacit knowledge is the cumulative store of subjective or
experiential learning.
Knowledge management systems (KMSs) use modern
information technologies (the Internet, intranets, extranets, data
warehouses) to systematize, enhance and expedite intrafirm
and interfirm knowledge management.
Best practices are the most effective and efficient ways of
doing things, readily available to a wide range of employees.
See Fig. 4.13.
Knowledge Management System Cycle
Create knowledge. Determine new ways of doing things.
Capture knowledge. Identify new knowledge as valuable.
Refine knowledge. Make it actionable.
Store knowledge. Store it in a reasonable format.
Manage knowledge. Verify that it is relevant and accurate.
Disseminate knowledge. Make it available.
THE END OF SESSION 8