Transcript Chapter 9
Using Management Information Systems
David Kroenke
Business Intelligence and Knowledge Management
Chapter 9
© 2007 Prentice Hall, Inc.
Learning Objectives
Understand the need for business intelligence
systems.
Know the characteristics of reporting systems.
Know the purpose and role of data warehouses
and data marts.
Understand fundamental data-mining techniques.
Know the purpose, features, and functions of
knowledge management systems.
The Need for Business Intelligence Systems
According to a study done at the University of
California at Berkeley, a total of 403 petabytes of
new data were created.
403 petabytes is roughly the amount of all printed
material ever written.
The printed collection of the Library of Congress is 0.01
petabytes.
400 petabytes equals 40,000 copies of the print
collection of the Library of Congress.
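The comparison above is simple division. A minimal sketch of the arithmetic, using only the figures quoted on this slide (the variable names are illustrative):

```python
# Figures quoted on the slide, in petabytes.
library_of_congress_pb = 0.01   # print collection of the Library of Congress
new_data_pb = 400               # rounded down from 403 petabytes

# Number of Library of Congress print collections that fit in the new data.
copies = round(new_data_pb / library_of_congress_pb)
print(copies)
```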
The Need for Business Intelligence Systems
(Continued)
The generation of all these data has much to do
with Moore’s Law.
The capacity of storage devices increases as their
costs decrease.
Today, storage capacity is nearly unlimited.
We are drowning in data and starving for
information.
Figure 9-1 How big is an Exabyte?
Source: Used with permission of Peter Lyman and Hal R. Varian, University of California at Berkeley.
Figure 9-2 Hard-Disk Storage Capacity
Source: Used with permission of Peter Lyman and Hal R. Varian, University of California at Berkeley.
Business Intelligence Tools
Tools for searching business data in an attempt to
find patterns are called business intelligence (BI)
tools.
Reporting tools are programs that read data from
a variety of sources, process that data, produce
formatted reports, and deliver those reports to the
users who need them.
Business Intelligence Tools
The processing of data is simple:
Data are sorted and grouped.
Simple totals and averages are calculated.
Reporting tools are used primarily for assessment.
They are used to address questions like:
What has happened in the past?
What is the current situation?
How does the current situation compare to the past?
Business Intelligence Tools (Continued)
Data-mining tools process data using statistical
techniques, many of which are sophisticated and
mathematically complex.
Data mining involves searching for patterns and
relationships among data.
In most cases, data-mining tools are used to make
predictions.
For example, we can use one form of analysis to
compute the probability that a customer will default
on a loan.
Business Intelligence Tools (Continued)
Another way to distinguish reporting tools from
data-mining tools is:
Reporting tools use simple operations like sorting,
grouping, and summing.
Data-mining tools use sophisticated techniques.
Business Intelligence Systems
An information system is a collection of hardware,
software, data, procedures, and people.
The purpose of a business intelligence (BI)
system is to provide the right information, to the
right user, at the right time.
BI systems help users accomplish their goals and
objectives by producing insights that lead to
actions.
Business Intelligence Systems (Continued)
A reporting tool can generate a report that shows a
customer has canceled an important order.
A reporting system, however, alerts that
customer’s salesperson with this unwanted news,
and does so in time for the salesperson to try to
alter the customer’s decision.
A data-mining tool can create an equation that
computes the probability that a customer will
default on a loan.
Business Intelligence Systems (Continued)
A data-mining system uses that equation to enable
banking personnel to assess new loan
applications.
Reporting Systems
The purpose of a reporting system is to create
meaningful information from disparate data
sources and to deliver that information to the
proper user on a timely basis.
Reporting systems generate information from data
as a result of four operations:
Filtering data
Sorting data
Grouping data
Making simple calculations on the data
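The four operations listed above can be sketched in a few lines. The trade records and field names below are hypothetical (they are not the data in Figure 9-3), and the sketch is only one way to express the idea:

```python
from collections import defaultdict

# Hypothetical trade records (not the data in Figure 9-3).
trades = [
    {"symbol": "AAA", "type": "buy",  "shares": 100, "price": 10.0},
    {"symbol": "BBB", "type": "sell", "shares": 50,  "price": 20.0},
    {"symbol": "AAA", "type": "buy",  "shares": 200, "price": 11.0},
    {"symbol": "BBB", "type": "buy",  "shares": 10,  "price": 19.0},
]

# 1. Filter: keep only the buy trades.
buys = [t for t in trades if t["type"] == "buy"]

# 2. Sort: order the buys by symbol, then by price.
buys.sort(key=lambda t: (t["symbol"], t["price"]))

# 3. Group: collect the buys for each symbol.
by_symbol = defaultdict(list)
for t in buys:
    by_symbol[t["symbol"]].append(t)

# 4. Simple calculations: total shares and average price per symbol.
report = {
    symbol: {
        "total_shares": sum(t["shares"] for t in rows),
        "average_price": sum(t["price"] for t in rows) / len(rows),
    }
    for symbol, rows in by_symbol.items()
}

for symbol, line in report.items():
    print(symbol, line["total_shares"], line["average_price"])
```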
Figure 9-3 Trade Data for NDX.X (NASDAQ 100)
Figure 9-4 Report Based on Trade Data in Figure 9-3
Components of Reporting Systems
A reporting system maintains a database of
reporting metadata.
The metadata describes the reports, users,
groups, roles, events, and other entities involved
in the reporting activity.
The reporting system uses the metadata to
prepare and deliver reports to the proper users on
a timely basis.
Figure 9-5 Components of a Reporting System
Figure 9-6 Summary of Report Characteristics
Report Type
In terms of a report type, reports can be static or
dynamic.
Static reports are prepared once from the
underlying data, and they do not change.
Example: a report of the past year’s sales.
Dynamic reports: the reporting system reads the
most current data and generates the report using
that fresh data.
Examples: a report on today’s sales and a report on
current stock prices.
Report Type (Continued)
Query reports are prepared in response to data
entered by users.
Online analytical processing (OLAP) reports allow
the user to dynamically change the report
grouping structures.
Report Media
Reports are delivered via many different report
media or channels.
Some reports are printed on paper, and others are
created in a format like PDF whereby they can be
printed or viewed electronically.
Other reports are delivered to computer screens.
Companies sometimes place reports on internal
corporate Web sites for employees to access.
Report Media (Continued)
Another report medium is a digital dashboard,
which is an electronic display customized for a
particular user.
Vendors like Yahoo! and MSN provide common
examples.
Users of these services can define content they want, say,
a local weather forecast, a list of stock prices, or a
list of news sources.
The vendor constructs the display customized for each
user.
Report Media (Continued)
Other dashboards are particular to an organization.
The organization might have a dashboard that shows
up-to-the-minute production and sales activities.
Alerts are another form of report.
Users can declare that they wish to receive notifications of
events, say, via email or on their cell phones.
Reports can be published via a Web service.
The Web service produces the report in response to
requests from the service-consuming application.
Figure 9-7 Digital Dashboard Example
Report Mode
The report mode can be either push or pull.
Organizations send a push report to users
according to a preset schedule.
Users receive the report without any activity on their
part.
Users must request a pull report.
To obtain a pull report, a user goes to a Web portal or
digital dashboard and clicks a link or button to cause the
reporting system to produce and deliver the report.
Functions of Reporting Systems
Three functions of reporting systems are:
Authoring
Management
Delivery
Report authoring involves connecting to data
sources, creating the reporting structure, and
formatting the report.
Figure 9-8 Connecting to a Report Data Source
Using VisualStudio.Net
Source: Microsoft product screen shot reprinted with permission from Microsoft Corporation.
Figure 9-9 Formatting a Report Using
VisualStudio.Net
Source: Microsoft product screen shot reprinted with permission from Microsoft Corporation.
Report Management
The purpose of report management is to define
who receives what reports, when, and by what
means.
Most report-management systems allow the report
administrator to define user accounts and user
groups and to assign particular users to particular
groups.
Reports that have been created using the
report-authoring system are assigned groups and users.
Report Management (Continued)
Assigning reports to groups saves the
administrator work.
When a report is created, changed, or removed, the
administrator need only change the report assignments
to the group.
All of the users in the group will inherit the changes.
Metadata also indicates what channel is to be
used and whether the report is to be pushed or
pulled.
If the report is to be pushed, the administrator declares
whether the report is to be generated on a regular
schedule or as an alert.
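As a rough illustration of how group-based assignment saves the administrator work, the sketch below stores reporting metadata in plain dictionaries. The user names, group names, report names, and the `reports_for` helper are all hypothetical; a real reporting system keeps this metadata in its own database.

```python
# Hypothetical reporting metadata: users belong to groups, and reports
# are assigned to groups together with a channel and a mode.
group_members = {
    "regional_managers": ["avery", "blake"],
    "executives": ["casey"],
}
report_assignments = {
    "regional_managers": [
        {"report": "Regional RFM scores", "channel": "email", "mode": "push"},
    ],
    "executives": [
        {"report": "All-customer RFM scores", "channel": "portal", "mode": "pull"},
    ],
}

def reports_for(user):
    """Return every report the user inherits through group membership."""
    return [
        assignment
        for group, members in group_members.items()
        if user in members
        for assignment in report_assignments.get(group, [])
    ]

# Changing the group's assignments once changes what every member receives.
report_assignments["regional_managers"].append(
    {"report": "Daily sales alert", "channel": "email", "mode": "push"}
)
print([a["report"] for a in reports_for("blake")])
```

The administrator edits one group entry, and both members of that group pick up the new report without any per-user change.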
Report Delivery
The report-delivery function of a reporting system
pushes reports or allows them to be pulled
according to report-management metadata.
Reports can be delivered via an email server, Web
site, XML Web services, or by other
program-specific means.
The report-delivery system uses the operating
system and other program security components to
ensure that only authorized users receive
authorized reports.
Report Delivery (Continued)
The report-delivery system also ensures that push
reports are produced at appropriate times.
For query reports, the report-delivery system
serves as an intermediary between the user and
the report generator.
It receives user query data, such as item numbers in an
inventory query, passes the query data to the report
generator, receives the resulting report, and delivers the
report to the user.
RFM Analysis
RFM analysis is a way of analyzing and ranking
customers according to their purchasing patterns.
It is a simple technique that considers how
recently (R) a customer has ordered, how
frequently (F) a customer orders, and how much
money (M) the customer spends per order.
To produce an RFM score, the program first sorts
customer purchase records by the date of their
most recent (R) purchase.
RFM Analysis (Continued)
In a common form of this analysis, the program
then divides the customers into five groups and
gives customers in each group a score of 1 to 5.
The top 20% of the customers having the most recent
orders are given an R score of 1 (highest).
The program then re-sorts the customers on the
basis of how frequently they order.
The top 20% of the customers who order most
frequently are given an F score of 1 (highest).
Finally the program sorts the customers again
according to the amount spent on their orders.
The 20% who have ordered the most expensive items
are given an M score of 1 (highest).
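The scoring procedure above can be sketched as three sorts, each followed by a split into five equal groups. The customer data and the `quintile_scores` helper are hypothetical, and the sketch assumes no ties and a customer count divisible by five:

```python
def quintile_scores(values, best_is_high):
    """Map each customer to a score of 1 (top 20%) through 5 (bottom 20%)."""
    ranked = sorted(values, key=values.get, reverse=best_is_high)
    n = len(ranked)
    return {cust: (position * 5) // n + 1 for position, cust in enumerate(ranked)}

# Hypothetical customers: days since the last order, orders per year,
# and average amount spent per order.
customers = {
    "C1": {"days_since_order": 3,   "orders_per_year": 40, "avg_order": 90.0},
    "C2": {"days_since_order": 200, "orders_per_year": 2,  "avg_order": 900.0},
    "C3": {"days_since_order": 30,  "orders_per_year": 12, "avg_order": 300.0},
    "C4": {"days_since_order": 90,  "orders_per_year": 25, "avg_order": 50.0},
    "C5": {"days_since_order": 400, "orders_per_year": 1,  "avg_order": 20.0},
}

# Recency: fewer days since the last order is better, so sort ascending.
r = quintile_scores({c: v["days_since_order"] for c, v in customers.items()}, best_is_high=False)
# Frequency and money: larger values are better, so sort descending.
f = quintile_scores({c: v["orders_per_year"] for c, v in customers.items()}, best_is_high=True)
m = quintile_scores({c: v["avg_order"] for c, v in customers.items()}, best_is_high=True)

rfm = {c: (r[c], f[c], m[c]) for c in customers}
for cust, score in rfm.items():
    print(cust, score)
```

In this made-up data, C1 orders recently and often but spends a middling amount per order, while C2 orders rarely but spends the most per order.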
RFM Analysis (Continued)
A reporting system can generate the RFM data
and deliver it in many ways:
A report of RFM scores for all customers can be pushed
to the vice president of sales.
Reports with scores for particular regions can be
pushed to regional sales managers.
Reports of scores for particular accounts can be pushed
to the account salespeople.
All of this reporting can be automated.
Figure 9-10 Example of RFM Score Data
Online Analytical Processing
Online analytical processing (OLAP) provides
the ability to sum, count, average, and perform
other simple arithmetic operations on groups of
data.
The remarkable characteristic of OLAP reports is
that they are dynamic.
The viewer of the report can change the report’s
format; hence the term online.
Online Analytical Processing
An OLAP report has measures and dimensions.
A measure is the data item of interest.
It is the item that is to be summed or averaged
or otherwise processed in the OLAP report.
A dimension is a characteristic of a
measure.
Purchase date, customer type, customer
location, and sales region are all examples of
dimensions.
Online Analytical Processing (Continued)
With an OLAP report, it is possible to drill down
into the data.
This term means to further divide the data into more
detail.
Special-purpose products called OLAP servers
have been developed to perform OLAP analysis.
An OLAP server reads data from an operational
database, performs preliminary calculations, and
stores the results of those operations in an OLAP
database.
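To make measures, dimensions, and drilling down concrete, here is a small sketch in which the measure is summed for any chosen set of dimensions. The rows, field names, and the `summarize` helper are hypothetical and far simpler than what an OLAP server does:

```python
from collections import defaultdict

# Hypothetical sales rows. "sales" is the measure; the other fields are dimensions.
rows = [
    {"product_family": "Drink", "store_type": "Supermarket",   "state": "CA", "sales": 100.0},
    {"product_family": "Drink", "store_type": "Supermarket",   "state": "WA", "sales": 60.0},
    {"product_family": "Food",  "store_type": "Supermarket",   "state": "CA", "sales": 400.0},
    {"product_family": "Food",  "store_type": "Small Grocery", "state": "WA", "sales": 40.0},
]

def summarize(rows, dimensions, measure="sales"):
    """Sum the measure for each combination of the chosen dimensions."""
    totals = defaultdict(float)
    for row in rows:
        key = tuple(row[d] for d in dimensions)
        totals[key] += row[measure]
    return dict(totals)

# One dimension: total sales per product family.
by_family = summarize(rows, ["product_family"])

# Drill down: divide the same totals further by adding a dimension.
by_family_and_state = summarize(rows, ["product_family", "state"])

print(by_family)
print(by_family_and_state)
```

Changing the list of dimensions is the code analogue of the viewer rearranging the report.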
Figure 9-11 OLAP Product Family by Store Type
Figure 9-12 OLAP Product Family and Store
Location by Store Type
Figure 9-13 OLAP Family and Store Location by
Store Type
Figure 9-14 Role of OLAP Server and OLAP
Database
Data Warehouses and Data Marts
Basic reports and simple OLAP analyses can be
made directly from operational data.
For the most part, such reports display the current
state of the business; and if there are a few
missing values or small inconsistencies with the
data, no one is too concerned.
Operational data are unsuited to more
sophisticated analyses, particularly, data-mining
analyses that require high-quality input for
accurate and useful results.
Data Warehouses and Data Marts (Continued)
Many organizations choose to extract operational
data into facilities called data warehouses and
data marts, both of which are facilities that
prepare, store, and manage data specifically for
data mining and other analyses.
Programs read operational data and extract,
clean, and prepare that data for BI processing.
The prepared data are stored in a data-warehouse
database using data-warehouse DBMS, which can
be different from the organization’s operational
DBMS.
Data Warehouses and Data Marts (Continued)
Data warehouses include data that are purchased
from outside sources.
Metadata concerning the data, its source, its
format, its assumptions and constraints, and other
facts about the data is kept in a data-warehouse
metadata database.
The data-warehouse DBMS extracts and provides
data to business intelligence tools such as
data-mining programs.
Figure 9-15 Components of a Data Warehouse
Figure 9-16 Consumer Data Available for
Purchase from Data Vendors
Problems with Operational Data
Most operational and purchased data have
problems that inhibit their usefulness for business
intelligence.
Problematic data are termed dirty data.
Examples are values of B for customer gender and of
213 for customer age.
Purchased data often contain missing elements.
Most data vendors state the percentage of missing
values for each attribute in the data they sell.
An organization buys such data because for some uses,
some data is better than no data at all.
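A cleaning program might flag dirty and missing values with checks like the ones below. This is only a sketch: the set of valid gender codes, the age limit, and the record layout are assumptions made for illustration, and real rules depend on the organization's data.

```python
# Hypothetical validity rules; real rules depend on the organization's data.
VALID_GENDERS = {"F", "M"}
MAX_PLAUSIBLE_AGE = 120

def find_problems(record):
    """Return a list of data-quality problems found in one customer record."""
    problems = []
    for field in ("gender", "age"):
        if record.get(field) is None:
            problems.append(f"missing {field}")
    gender = record.get("gender")
    if gender is not None and gender not in VALID_GENDERS:
        problems.append(f"dirty gender: {gender!r}")
    age = record.get("age")
    if age is not None and not 0 <= age <= MAX_PLAUSIBLE_AGE:
        problems.append(f"dirty age: {age}")
    return problems

records = [
    {"id": 1, "gender": "F", "age": 34},
    {"id": 2, "gender": "B", "age": 213},   # the dirty values from the slide
    {"id": 3, "gender": None, "age": 51},   # a missing element
]
for record in records:
    print(record["id"], find_problems(record))
```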
Problems with Operational Data (Continued)
Inconsistent data are particularly common for data
that have been gathered over time.
When an area code changes, for example, the phone
number for a given customer before the change will not
match the customer’s number after the change.
Some data inconsistencies occur from the nature
of the business activity.
Nonintegrated data can cause problems when
data comes from different management
information systems.
Problems with Operational Data (Continued)
Data can be too fine or too coarse.
It is possible to capture the customer’s clicking behavior
in what is termed clickstream data that includes
everything a customer does at a Web site.
If data are too fine or too coarse for the analysis, that
condition is sometimes expressed by saying the data have the
wrong granularity.
Because of a phenomenon called the curse of
dimensionality, the more attributes there are, the
easier it is to build a model that fits the sample
data but that is worthless as a predictor.
Figure 9-17 Problems of Using Transaction Data
for Analysis and Data Mining
Data Warehouses Versus Data Marts
The data warehouse takes data from the data
manufacturers (operational systems and
purchased data), cleans and processes the data,
and locates the data on the shelves, so to speak,
of the data warehouse.
A data mart is a data collection, smaller than the
data warehouse, that addresses a particular
component or functional area of the business.
Data Warehouse Versus Data Marts (Continued)
The data warehouse is like the distributor in the
supply chain and the data mart is like the retail
store in the supply chain.
Users in the data mart obtain data that pertain to a
particular business function from the data
warehouse.
It is expensive to create, staff, and operate data
warehouses and data marts.
Figure 9-18 Data Mart Examples
Data Mining
Data mining is the application of statistical
techniques to find patterns and relationships
among data and to classify and predict.
Data mining represents a convergence of
disciplines.
Data-mining techniques emerged from statistics
and mathematics and from artificial intelligence
and machine-learning fields in computer science.
Figure 9-19 Convergence Disciplines for Data Mining
Unsupervised Data Mining
With unsupervised data mining, analysts do not
create a model or hypothesis before running the
analysis.
Instead, they apply the data-mining technique to
the data and observe the results.
Analysts create hypotheses after the analysis to
explain the patterns found.
Unsupervised Data Mining (Continued)
One common unsupervised technique is cluster
analysis.
A common use for cluster analysis is to find groups of
similar customers from customer order and
demographic data.
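The slide does not name a particular clustering algorithm. As one illustration, the sketch below uses basic k-means, a common choice, on hypothetical customer points of (age, average order amount). The starting centroids are picked by hand, and the features are left unscaled to keep the example short.

```python
import math

def k_means(points, centroids, iterations=10):
    """Basic k-means: assign points to the nearest centroid, then re-average."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster (keep it if the cluster is empty).
        centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster)) if cluster else c
            for cluster, c in zip(clusters, centroids)
        ]
    return centroids, clusters

# Hypothetical customers: (age, average order amount in dollars).
points = [(22, 30), (25, 40), (24, 35), (55, 300), (60, 320), (58, 310)]

# Hand-picked starting centroids: the first and last customers.
centroids, clusters = k_means(points, [points[0], points[-1]])
for centroid, cluster in zip(centroids, clusters):
    print(centroid, cluster)
```

The analyst would then look at the two groups and form a hypothesis about them, for example that younger customers place smaller orders.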
Supervised Data Mining
With supervised data mining, data miners
develop a model prior to the analysis and apply
statistical techniques to data to estimate
parameters of the model.
One such analysis, which measures the impact of
a set of variables on another variable, is called a
regression analysis.
Neural networks are another popular supervised
data-mining technique used to predict values and
make classifications such as “good prospect” or
“poor prospect” customers.
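As a minimal sketch of supervised data mining, the code below fits a straight-line regression model by ordinary least squares. The model form (one predictor, a line) is chosen before the analysis, and the data are used only to estimate its two parameters. The data and the `fit_line` helper are hypothetical; a real regression would usually involve several variables and a statistics package.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical data: advertising spend (in $1,000s) and units sold.
# The points lie exactly on units = 10 + 2 * spend to keep the result easy to check.
ad_spend = [1, 2, 3, 4, 5]
units_sold = [12, 14, 16, 18, 20]

intercept, slope = fit_line(ad_spend, units_sold)
print(f"units_sold = {intercept:.1f} + {slope:.1f} * ad_spend")
```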
Market-Basket Analysis
A market-basket analysis is a data-mining
technique for determining sales patterns.
A market-basket analysis shows the products that
customers tend to buy together.
In market-basket terminology, support is the
probability that two items will be purchased
together.
You can expect market-basket analysis to become
a standard CRM analysis during your career.
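Support, along with the related measures confidence and lift that appear in the key terms, can be computed directly from transaction counts. The sketch below uses the standard definitions: confidence is the probability of buying one item given that the other was bought, and lift is confidence divided by the base probability of the second item. The transactions are hypothetical and are not the data in Figure 9-20.

```python
# Hypothetical transactions (not the data in Figure 9-20).
transactions = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread"},
    {"milk"},
    {"butter", "milk"},
]

def support(*items):
    """Fraction of transactions that contain all of the given items."""
    wanted = set(items)
    return sum(wanted <= basket for basket in transactions) / len(transactions)

def confidence(a, b):
    """Estimated probability of buying b, given that a was bought."""
    return support(a, b) / support(a)

def lift(a, b):
    """How much buying a raises the chance of buying b (1.0 means no effect)."""
    return confidence(a, b) / support(b)

print(f"support(bread, butter)    = {support('bread', 'butter'):.2f}")
print(f"confidence(bread, butter) = {confidence('bread', 'butter'):.2f}")
print(f"lift(bread, butter)       = {lift('bread', 'butter'):.2f}")
```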
Figure 9-20 Market-Basket Example
Decision Trees
A decision tree is a hierarchical arrangement of
criteria that predict a classification or a value.
Decision tree analyses are an unsupervised
data-mining technique.
The analyst sets up the computer program and
provides the data to analyze, and the decision tree
program produces the tree.
Figure 9-21 Grades of Students from Past MIS Class
(Hypothetical Data)
A Decision Tree for Loan Evaluation
A common business application of decision trees
is to classify loans by likelihood of default.
Organizations analyze data from past loans to
produce a decision tree that can be converted to
loan-decision rules.
A financial institution could use such a tree to assess
the default risk on a new loan.
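Once a tree has been produced, each path from the root to a leaf reads as an if-then rule. The sketch below shows what such loan-decision rules might look like in code. The attributes and thresholds are invented for illustration and are not taken from Figure 9-22 or from any real lender.

```python
def classify_loan(percent_past_due, credit_score):
    """Apply hypothetical if-then rules of the kind a decision tree yields."""
    # Rule 1: a large share of past-due payments dominates everything else.
    if percent_past_due > 0.5:
        return "high risk"
    # Rule 2: otherwise, a weak credit score moves the loan to medium risk.
    if credit_score < 600:
        return "medium risk"
    # Rule 3: everything else is treated as low risk.
    return "low risk"

# Assess a few hypothetical new loan applications.
print(classify_loan(percent_past_due=0.7, credit_score=720))
print(classify_loan(percent_past_due=0.1, credit_score=550))
print(classify_loan(percent_past_due=0.1, credit_score=720))
```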
Figure 9-22 Credit Score Decision Tree
Source: Used with permission of Insightful Corporation. Copyright © 1999-2005 Insightful Corporation. All Rights Reserved.
Knowledge Management
Knowledge management systems concern the
sharing of knowledge that is already known to
exist, either in libraries of documents, in the heads
of employees, or in other known sources.
Knowledge management (KM) is the process of
creating value from intellectual capital and sharing
that knowledge with employees, managers,
suppliers, customers, and others who need that
capital.
Knowledge Management (Continued)
Knowledge management is a process that is
supported by the five components of an
information system.
Its emphasis is on people, their knowledge, and
effective means for sharing that knowledge with others.
The benefits of KM concern the application of
knowledge to enable employees and others to
leverage organizational knowledge to work
smarter.
KM preserves organizational memory by capturing
and storing the lessons learned and best practices
of key employees.
Content Management Systems
Content management systems are information
systems that track organizational documents, Web
pages, graphics, and related materials.
Such systems differ from operational document
systems in that they do not directly support
business operations.
KM content management systems are concerned
with the creation, management, and delivery of
documents that exist for the purpose of imparting
knowledge.
Content Management Systems (Continued)
Typical users of content management systems are
companies that sell complicated products and
want to share their knowledge of those products
with employees and customers.
The basic functions of content management
systems are the same as for report management
systems: author, manage, and deliver.
The only requirement that content managers place
on document authoring is that the document has
been created in a standardized format.
Content Management Problems
Content management functions are, however,
exceedingly complicated.
Most content databases are huge; some have
thousands of individual documents, pages, and
graphics.
Content Management Problems (Continued)
Documents may refer to one another or multiple
documents may refer to the same product or
procedure.
When one of them changes, others must change as well.
Some content management systems keep semantic linkages
among documents so that content dependencies can be
known and used to maintain document consistency.
Content Management Problems (Continued)
Document contents are perishable.
Documents become obsolete and need to be altered,
removed, or replaced.
Multinational companies have to ensure document
language translations.
Figure 9-23 Document Management at Microsoft.com
(as of December 2003)
Source: microsoft.com/backstage/inside.htm (accessed February 2004). © 2003 Microsoft Corporation. All rights reserved.
Figure 9-24 Reporting Services: United States
Source: Used with permission of Tom Rizzo of Microsoft Corporation.
Figure 9-25 Reporting Services: China
Source: Used with permission of Tom Rizzo of Microsoft Corporation.
Content Delivery
Almost all users of content management systems
pull the contents.
Users cannot pull content if they do not know it
exists.
The content must be arranged and indexed, and a facility
for searching the content devised.
Documents that reside behind a corporate firewall,
however, are not publicly accessible and will not be
reachable by Google or other search engines.
Organizations must index their own proprietary
documents and provide their own search capability for
them.
Content Delivery (Continued)
Web browsers and other programs can readily
format content expressed in HTML, PDF, or another
standard format.
XML documents often contain their own formatting
rules that browsers can interpret.
The content management system will have to determine
an appropriate format for content expressed in other
ways.
KM Systems to Facilitate the Sharing of Human
Knowledge
Nothing is more frustrating for a manager to
contemplate than the situation in which one employee
struggles with a problem that another employee knows
how to solve easily.
KM systems are concerned not only with the sharing
of content, but also with the sharing of knowledge
among humans.
How can one person share her knowledge with another?
How can one person learn of another person’s great idea?
KM Systems to Facilitate the Sharing of Human
Knowledge (Continued)
Three forms of technology are used for
knowledge-sharing among humans:
Portals, discussion groups, and email
Collaboration systems
Expert systems
Portals
Employees can share ideas by posting knowledge on a Web
portal whereby managers and employees can pull the
knowledge from the portal.
Figure 9-26 Technology Support of Sharing Human
Knowledge
KM Systems to Facilitate the Sharing of Human
Knowledge (Continued)
Discussion Groups
Discussion groups allow employees or customers to post
questions and queries seeking solutions to problems they
have.
Oracle, IBM, PeopleSoft, and other vendors support product
discussion groups where users can post questions and
where employees, vendors, and other users can answer
them.
Later, the organization can edit and summarize the
questions from such discussion groups into frequently
asked questions (FAQs).
KM Systems to Facilitate the Sharing of Human
Knowledge (Continued)
Discussion groups (continued)
Basic email can also be used for knowledge-sharing,
especially if email lists have been constructed with KM in
mind.
Two human factors inhibit knowledge-sharing.
Employees can be reluctant to exhibit their ignorance.
Competition exists between employees.
A KM application may be ill-suited to a competitive group.
The company may be able to restructure rewards and
incentives to foster sharing of ideas among employees.
KM Systems to Facilitate the Sharing of Human
Knowledge (Continued)
Collaboration Systems
Collaboration systems are information systems that enable
people to work together more effectively.
The Internet can be used as a broadcast medium for
speeches, panel discussions, and other types of meetings.
Web broadcasts, because they are digital, can be readily
saved and replayed at the viewer’s convenience.
Web broadcasts can also be made interactive by combining
them with discussion group bulletin boards that are live
during the broadcast.
Video conferencing is another popular form of IT-supported
meeting.
Video-conferencing equipment is expensive and normally
is located in selected sites in the organization.
KM Systems to Facilitate the Sharing of Human
Knowledge (Continued)
Collaboration Systems (continued)
Net meetings are a means by which individuals can
participate in remote meetings without leaving their desks.
With a speaker and a Web camera, virtual meetings can
be conducted among employees who sit in their own
offices.
Figure 9-27 Net Meeting Graphic
KM Systems to Facilitate the Sharing of Human
Knowledge (Continued)
Expert Systems
Expert systems are created by interviewing experts in a
given business domain and codifying the rules stated by
those experts.
Many expert systems were created in the late 1980s and
1990s, and some of them have been successful.
Expert systems suffer from three major disadvantages.
They are difficult and expensive to develop.
They are difficult to maintain.
They were unable to live up to the high expectations set
by their name.
Summary
Enormous amounts of data are generated each
year.
Business intelligence (BI) tools search these
increasing amounts of data for useful information.
Reporting tools, which tend to be used for assessment,
process data using simple calculations such as
sums and averages.
Summary (Continued)
Data-mining tools, which tend to be used for prediction,
process data using sophisticated statistical and
mathematical techniques.
Reporting systems create meaningful information
from disparate data sources and deliver that
information to the proper user on a timely basis.
RFM and OLAP are two examples of report
applications.
Summary (Continued)
Data warehouses and data marts are facilities that
prepare, store, and manage data for data mining
and other analyses.
Market-basket analysis determines groups of
products that customers tend to purchase
together.
Decision trees are used to construct “If…Then…”
rules for predicting classifications.
Summary (Continued)
Knowledge management is the process of creating
value from intellectual capital and sharing that
knowledge with employees, managers, suppliers,
customers, and others who need that capital.
Human knowledge-sharing systems use portals,
bulletin boards, and email to facilitate knowledge
interchange.
Collaboration systems include net conferencing,
video conferencing, and expert systems.
Key Terms and Concepts
Business intelligence (BI)
systems
Business intelligence (BI) tools
Clickstream data
Cluster analysis
Collaboration systems
Confidence
Content management systems
Curse of dimensionality
Data mart
Data mining
Data-mining tools
Data warehouse
Decision trees
Digital dashboard
Dimension
Dirty data
Discussion groups
Drill down
Dynamic report
Exabyte
Expert systems
Frequently asked
questions (FAQs)
Key Terms and Concepts (Continued)
Granularity
If…then…rules
Knowledge management
(KM)
Lift
Market-basket analysis
Measure
Neural networks
OLAP cube
OLAP server
Online analytical processing
(OLAP)
Petabyte
Portals
Pull report
Push report
Query report
Regression analysis
Report media
Report mode
Report type
Reporting systems
Reporting tools
RFM analysis
Semantic security
Key Terms and Concepts (Continued)
Static report
Supervised data mining
Support
Unsupervised data mining
Security Guide–Semantic Security
Security is a very difficult problem, and it gets
worse every year.
Physical security is hard enough: How do we know
that the person (or program) that signs on as
Megan Cho is really Megan Cho?
We use passwords, but files of passwords can be
stolen.
Suppose Megan works in the HR department, so
she has access to personal and private data of
other employees.
Security Guide–Semantic Security (Continued)
We need to design the reporting system so that
Megan can access all of the data she needs to do
her job, and no more.
A reporting server is an obvious and juicy target
for any would-be intruder.
Someone can break in and change access permissions.
Or, a hacker could pose as someone else to obtain
reports.
Security Guide–Semantic Security (Continued)
Semantic security concerns the unintended
release of a combination of reports or documents
that are independently not protected.
Megan was given just two reports to do her job.
Yet she combined the information in those reports with
publicly available information and was able to deduce
salaries for at least some employees.
These salaries are much more than she is supposed to
know.
This is a semantic security problem.
Problem Solving Guide–Counting and Counting and
Counting
The product managers wanted the data miners to
analyze customer clicks on a Web page to
determine customer preferences for particular
product lines.
The products were competing with one another for
resources.
“Sampling?” asked the product managers in a chorus.
“Sampling? No way. We want all the data. This is
important, and we don’t want a guess.”
Problem Solving Guide–Counting and Counting and
Counting (Continued)
There’s nothing wrong with sampling.
Properly done, the results from a sample are just as
accurate as results from the complete data set.
Studies done from samples are also cheaper and faster.
Sampling is a great way to save time and money.
In truth, skill is required to develop a good sample.
The product managers should have listened to the data
miners’ sampling plan and ensured that the sample
would be appropriate, given the goals of the study.
Understanding this concept will save you and your
organization substantial money!
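A minimal sketch of the comparison, in Python. The click log below is
hypothetical (200,000 clicks, 30 percent of them on product line "A"), and
the sample size and seed are arbitrary choices. The sketch counts every
click, then estimates the same share from a simple random sample of 1
percent of the data, with an approximate margin of error.

```python
import math
import random

# Hypothetical click log: 200,000 clicks, 30% of them on product line "A".
clicks = ["A"] * 60_000 + ["B"] * 140_000

# Counting everything: the exact share of clicks on product line A.
full_share = clicks.count("A") / len(clicks)

# Simple random sample of 2,000 clicks (1% of the data).
# The seed is fixed only so the run is repeatable.
rng = random.Random(42)
sample = rng.sample(clicks, 2_000)
sample_share = sample.count("A") / len(sample)

# Approximate standard error of the sample proportion.
std_error = math.sqrt(sample_share * (1 - sample_share) / len(sample))

print(f"full data share of A: {full_share:.3f}")
print(f"sample share of A:    {sample_share:.3f} (+/- {2 * std_error:.3f})")
```

The sample here is drawn at random from all clicks. A sample drawn only from
one day or one page would not support the same conclusion, which is why
skill is required to develop a good sample.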
Ethics Guide–The Ethics of Classification
Classification is a useful human skill.
Sorting and classifying are necessary, important,
and essential activities.
But those activities can also be dangerous.
Serious ethical issues arise when we classify
people.
What makes someone a good or bad “prospect”?
If we’re talking about classifying customers in order to
prioritize our sales calls, then the ethical issue may not
be too serious.
What about classifying applicants for college?
Opposing Forces–Data Mining in the Real World
I’m not really a contrarian about data mining.
I believe in it.
But data mining in the real world is a lot different from
the way it’s described in textbooks.
One problem is that data are always dirty, with missing
values, values way out of the range of possibility, and
time values that make no sense.
Another problem is that you know the least when
you start the study.
So you work for a few months and learn that if you had
another variable, say the customer’s zip code, or age, or
something else, you could do a much better analysis.
Opposing Forces–Data Mining in the Real World
(Continued)
Overfitting is another problem, a huge one.
With neural networks, you can create a model of any
level of complexity you want, except that none of those
equations will predict new cases with any accuracy at all.
When using neural nets, you have to be very careful not
to overfit the data.
Another problem is seasonality:
Say all your training data are from the summer. Will your
model be valid for the winter?
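A minimal sketch of overfitting, in Python. The speaker's point concerns
neural networks; this sketch swaps in a much simpler stand-in, a model that
memorizes the training data, because it shows the same effect using only the
standard library. The data are synthetic and hypothetical (y = 2x plus
random noise), and the names make_data, simple_model, and memorizer are
invented for this illustration.

```python
import random

rng = random.Random(0)


def make_data(n):
    """Hypothetical data: y = 2x plus random noise."""
    xs = [rng.uniform(0, 1) for _ in range(n)]
    return [(x, 2 * x + rng.gauss(0, 1)) for x in xs]


train = make_data(200)
test = make_data(2000)  # new cases the models have never seen

# Simple model: a least-squares straight line fitted to the training data.
n = len(train)
mean_x = sum(x for x, _ in train) / n
mean_y = sum(y for _, y in train) / n
slope = (sum((x - mean_x) * (y - mean_y) for x, y in train)
         / sum((x - mean_x) ** 2 for x, _ in train))
intercept = mean_y - slope * mean_x


def simple_model(x):
    return intercept + slope * x


def memorizer(x):
    """Overly flexible model: return the y of the nearest training x."""
    return min(train, key=lambda point: abs(point[0] - x))[1]


def mse(model, data):
    """Mean squared prediction error of a model on a data set."""
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)


train_err_simple = mse(simple_model, train)
train_err_memorizer = mse(memorizer, train)
test_err_simple = mse(simple_model, test)
test_err_memorizer = mse(memorizer, test)

print(f"straight line: train {train_err_simple:.2f}, test {test_err_simple:.2f}")
print(f"memorizer:     train {train_err_memorizer:.2f}, test {test_err_memorizer:.2f}")
```

The memorizer reproduces the training data perfectly, noise included, so its
training error says nothing about how it handles new cases. Comparing errors
on held-out data is one common way to notice the problem.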
Opposing Forces–Data Mining in the Real World
(Continued)
When you start a data-mining project, you never
know how it will turn out.
Some projects were bad and a waste of time.
Some were good: they found interesting and
important patterns and information and produced very
accurate predictive models.
It’s not easy, though; you have to be very careful
and lucky.
Reflection Guide–Justifying the Justification?
Project: a computer simulation of World War III at
the Pentagon, 1971-1973
Analysis process
Run the simulation and obtain a set of results.
The military analysts and weapons experts would
examine the results, and if results weren’t quite what
was expected or wanted, the analysts would ask to
change some of the inputs or a portion of the model.
Over time, an accumulated set of results was approved.
The accumulated results were presented to the four-star
generals and other senior Pentagon managers.
Sometimes these senior people would see problems
in the analyses and give instructions to discard
some of the results.
Reflection Guide–Justifying the Justification?
(Continued)
Observation
I do not believe that anyone thought they were deceiving
anyone else.
The top managers didn’t realize that the results they saw
left out a substantial portion of the unfavorable
simulations.
They never knew about the other results.
The analysts who were filtering the outcomes by
throwing out numbers did not think they were being
dishonest.
They simply thought that those results were wrong
or unrealistic.
I do not think they realized they were using the
computer to promulgate their prior ideas about
military needs.
Reflection Guide–Justifying the Justification?
(Continued)
Questions to think about
Why perform the analysis?
What are you going to do with the results?
What is it that you want to know or to decide?
Answer the questions above before you begin the
analysis.
Then, pay attention to the results.
Don’t argue with the data.
If the results don’t conform to your expectations, think
long and hard before changing the model, adjusting the
data, or modifying the answers.