Chapter 11
Data Management:
Warehousing, Analyzing,
Mining & Visualization
1
Learning Objectives
 Recognize the importance of data, their managerial issues,
and their life cycle.
 Describe the sources of data, their collection, and quality
issues.
 Relate data management to multimedia and document
management.
 Explain the operation of data warehousing and its role in
decision support.
2
Learning Objectives (cont.)
 Understand the data access and analysis problem and the data
mining and online analytical processing solutions.
 Describe data presentation methods and explain geographical
information systems, visual simulations, and virtual reality as
decision support tools.
 Discuss the role and provide examples of marketing databases.
 Recognize the role of the Web in data management.
3
Case: Sears & Data Warehouses
Problem:
 Sears was caught by surprise in the 1980s as shoppers defected to
specialty stores and discount mass merchandisers.
Solution:
 Sears constructed a single sales information data warehouse, replacing
18 old databases which were packed with redundant, conflicting &
obsolete data.
 By 2001, Sears had launched the following Web initiatives:
 e-Commerce home improvement center
 B2B supply exchange for the retail industry
 Online Toy catalog and much more
4
Case: Sears & Data Warehouses
Results:
 The ability to monitor sales by item per store enables Sears to create a
sharp local market focus.
 Monitoring Web-based sales data helps Sears plan its marketing and Web
advertising.
 Response time to queries has dropped from days to minutes.
 The data warehouse offers Sears employees a tool for making better
decisions.
 Sears retailing profits have climbed more than 20% annually since the
data warehouse was implemented.
5
Difficulties of Managing Data
 The amount of data increases exponentially.
 Data are scattered throughout organizations and are collected by many
individuals using several methods and devices.
 Only small portions of an organization’s data are relevant for any specific
decision.
 An ever-increasing amount of external data needs to be considered in
making organizational decisions.
 Data are frequently stored in several servers and locations in an
organization.
6
Difficulties of Managing Data (cont.)
 Raw data may be stored in different computing systems, databases,
formats, and human and computer languages.
 Legal requirements relating to data differ among countries and
change frequently.
 Selecting data management tools can be a major problem because of
the huge number of products available.
 Data security, quality, and integrity are critical yet are easily
jeopardized.
7
Data Life Cycle
8
Data Sources & Collection
Internal Data. An organization’s internal data are about people, products,
services, and processes.
Personal Data. IS users or other corporate employees may document their
own expertise by creating personal data.
External Data. There are many sources for external data, ranging from
commercial databases to sensors and satellites.
The Internet & Commercial Database Services. Some external data flow to
an organization through electronic data interchange (EDI), through other
company-to-company channels or the Internet.
9
Data Quality (DQ)
DQ is an extremely important
issue since quality
determines the data’s
usefulness as well as the
quality of the decisions
based on the data.
10
Data Quality Problems
(Strong et al., 1997)
Intrinsic DQ: Accuracy,
objectivity, believability, and
reputation.
Accessibility DQ:
Accessibility and access
security.
Contextual DQ: Relevancy,
value added, timeliness,
completeness, amount of
data.
Representation DQ:
Interpretability, ease of
understanding, concise
representation, consistent
representation.
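Two of the contextual dimensions above, completeness and timeliness, lend themselves to simple automated checks. The following is a minimal sketch, assuming hypothetical customer records in which None marks a missing value; the field names, dates, and the one-year threshold are illustrative and do not come from Strong et al.

```python
from datetime import date

# Hypothetical customer records; None marks a missing value.
records = [
    {"id": 1, "name": "Ann", "city": "Boston", "updated": date(2001, 3, 1)},
    {"id": 2, "name": None, "city": "Boston", "updated": date(1998, 6, 15)},
    {"id": 3, "name": "Raj", "city": None, "updated": date(2001, 1, 20)},
]

def completeness(rows, fields):
    """Share of the requested field values that are present (not None)."""
    total = len(rows) * len(fields)
    present = sum(1 for r in rows for f in fields if r[f] is not None)
    return present / total

def stale(rows, as_of, max_age_days):
    """IDs of records last updated more than max_age_days before as_of."""
    return [r["id"] for r in rows if (as_of - r["updated"]).days > max_age_days]

print(completeness(records, ["name", "city"]))
print(stale(records, date(2001, 6, 1), 365))
```

Checks like these cover only part of the picture; intrinsic dimensions such as believability and reputation usually require human judgment.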
11
Object-Oriented Databases
 The object-oriented database is the most widely used of the
newest methods of data organization, especially for Web
applications.
 An object-oriented database is a part of the object-oriented
paradigm, which also includes object-oriented programming,
operating systems, and modeling.
 Object-oriented databases are sometimes referred to as
multimedia databases and are managed by special multimedia
database management systems.
12
Document Management
Document Management is the automated control of electronic
documents, page images, spreadsheets, word processing
documents, and complex, compound documents through their
entire life cycle within an organization, from initial creation to final
archiving.
Benefits of Document Management :
 Greater control over production, storage, and distribution of documents
 Greater efficiency in the reuse of information
 Control of a document through a workflow process
 Reduction of product cycle times
13
Case: United Services Automobile Association (USAA)
Problem:
 The USAA is a large insurance company in Texas that serves over 2
million officers. In the 1980s, the company experienced extreme
delays in data retrieval and searches.
Solution:
 Using an environment called Automated Insurance Environment,
USAA has been transformed into a completely paperless company.
Results:
 The system reduces the cost of storing documents, improves
customer service, and improves productivity of employees.
 USAA now saves $70,500,000 for the 10,000,000 documents handled
annually.
14
Data Processing
Data processing in organizations can be viewed either as
transactional or analytical.
 Transactional:
 The data in transactions
processing systems (TPS)
are organized mainly in a
hierarchical structure and
are centrally processed.
 Databases and processing
systems are known as
operational systems.
 Analytical:
 Analytical processing
involves analysis of
accumulated data, mainly
by end-users.
 Includes DSS, EIS, Web
applications, and other end-user activities.
15
Delivery Systems
A good data delivery system
should be able to support:
 Easy data access by the
end-users themselves.
 A quick decision-making
process.
 Accurate and effective
decision making.
 Flexible decision making.
16
Data Warehouses
 The purpose of a data warehouse is to establish a data
repository that makes operational data accessible in a form
readily acceptable for analytical processing activities (e.g.,
decision support, EIS).
 Data warehouses include a companion called metadata,
meaning data about data.
Major Benefits of Data Warehouses:
(1) The ability to reach data quickly, as they are located in one
place.
(2) The ability to reach data easily, frequently by end-users
themselves, using Web browsers.
17
Data Warehouses
18
Characteristics of Data Warehouses
1) Organization. Data are organized by detailed subjects.
2) Consistency. Data in different operational databases may be
encoded differently. In the warehouse they will be coded in a
consistent manner.
3) Time variant. The data are kept for 5 to 10 years so they can be
used for trends, forecasting, and comparisons over time.
4) Non-volatile. Once entered into the warehouse, data are not
updated.
5) Relational. The data warehouse uses a relational structure.
6) Client/server. The data warehouse uses the client/server
architecture to give end users easy access to its data.
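The consistency characteristic can be illustrated with a small recoding step applied as rows are loaded into the warehouse. This is a minimal sketch, assuming two hypothetical operational systems ("billing" and "crm") that encode the same attribute differently; the system names, codes, and the to_warehouse helper are illustrative only.

```python
# Hypothetical encodings used by two operational systems for the same attribute.
CODE_MAPS = {
    "billing": {"M": "male", "F": "female"},
    "crm": {"0": "male", "1": "female"},
}

def to_warehouse(source, row):
    """Return a copy of row with 'gender' recoded to the warehouse convention."""
    recoded = dict(row)
    recoded["gender"] = CODE_MAPS[source][row["gender"]]
    return recoded

loaded = [
    to_warehouse("billing", {"customer": 17, "gender": "F"}),
    to_warehouse("crm", {"customer": 42, "gender": "0"}),
]
print(loaded)
```

After loading, both rows use the same vocabulary, so a single query can span data that originated in either system.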
19
Data Warehouse Suitability
Data warehousing is most appropriate for organizations in which some of the
following apply.
 Large amounts of data need to be accessed by end-users.
 The operational data are stored in different systems.
 An information-based approach to management is in
use.
 There is a large, diverse customer base.
 The same data are represented differently in different
systems.
 Data are stored in highly technical formats that are
difficult to decipher.
 Extensive end-user computing is performed.
20
Data Marts
A data mart is a lower-cost, scaled-down version of a data warehouse, an
alternative used by many firms. Data marts are small warehouses designed
for a strategic business unit (SBU) or a department.
Two major types of Data Marts:
1) Replicated (dependent) Data Marts. In such cases one can replicate
functional subsets of the data warehouse in smaller databases.
2) Stand-Alone Data Marts. A company can have one or more independent
data marts without having a data warehouse.
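A replicated data mart can be thought of as a functional subset copied out of the warehouse. The sketch below uses Python's built-in sqlite3 module with an in-memory database; the table names (warehouse_sales, toys_mart) and the rows are hypothetical, and a production mart would normally live in a separate database.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE warehouse_sales (dept TEXT, item TEXT, amount REAL)")
con.executemany(
    "INSERT INTO warehouse_sales VALUES (?, ?, ?)",
    [("toys", "robot", 25.0), ("tools", "drill", 80.0), ("toys", "kite", 10.0)],
)

# Replicate only the toys department's rows into a smaller table (the "mart").
con.execute(
    "CREATE TABLE toys_mart AS "
    "SELECT item, amount FROM warehouse_sales WHERE dept = 'toys'"
)

mart_rows = con.execute("SELECT item, amount FROM toys_mart ORDER BY item").fetchall()
print(mart_rows)
```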
21
Knowledge Discovery in Databases (KDD)
 KDD is the process of extracting useful knowledge from
volumes of data.
 It is the subject of extensive research.
 KDD’s objective is to identify valid, novel, potentially useful,
and ultimately understandable patterns in data.
 KDD is useful because it is supported by three technologies
that are now sufficiently mature:
 Massive data collection
 Powerful multiprocessor computers
 Data mining algorithms
22
Evolution of KDD
Stages in the Evolution of Knowledge Discovery
Evolutionary Stage | Business Question | Enabling Technologies | Characteristics
Data Collection (1960s) | What was my total revenue in the last five years? | Computers, tapes, disks | Retrospective, static data delivery
Data Access (1980s) | What were unit sales in New England last March? | Relational databases (RDBMS), structured query language (SQL) | Retrospective, dynamic data delivery at record level
Data Warehousing & Decision Support (early 1990s) | Drill down to Boston? | Online analytic processing (OLAP), multidimensional databases, data warehouses | Retrospective, dynamic data delivery at multiple levels
Intelligent Data Mining (late 1990s) | What’s likely to happen to Boston unit sales next month? Why? | Advanced algorithms, multiprocessor computers, massive databases | Prospective, proactive information delivery
Source: Courtesy of Accrue Software.
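The data access and drill-down questions in these stages map naturally onto SQL. The following sketch uses Python's built-in sqlite3 module; the sales table, its columns, and all figures are invented for illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (region TEXT, city TEXT, month TEXT, units INTEGER)")
con.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", [
    ("New England", "Boston", "2001-03", 120),
    ("New England", "Hartford", "2001-03", 45),
    ("New England", "Boston", "2001-04", 130),
    ("Midwest", "Chicago", "2001-03", 200),
])

# Data access stage: unit sales in New England in March.
region_total = con.execute(
    "SELECT SUM(units) FROM sales WHERE region = ? AND month = ?",
    ("New England", "2001-03"),
).fetchone()[0]

# Drill-down stage: the same total broken out by city.
by_city = con.execute(
    "SELECT city, SUM(units) FROM sales WHERE region = ? AND month = ? "
    "GROUP BY city ORDER BY city",
    ("New England", "2001-03"),
).fetchall()

print(region_total, by_city)
```

The data mining question ("What is likely to happen next month?") cannot be answered by a query alone; it needs a predictive model built on this kind of history.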
23
Tools & Techniques of KDD
 Ad-hoc queries allow users to request in real time
information from the computer that is not available in the
periodical reports. Such answers are needed to expedite
decision making.
 Online analytical processing (OLAP) refers to such end-user
activities as DSS modeling using spreadsheets and
graphics, which are done online.
 Ready-made Web-based Analysis. Many vendors provide
ready-made analytical tools, mostly in finance, marketing,
and operations.
24
Data Mining
 Data mining derives its name from the similarities
between searching for valuable business
information in a large database, and mining a
mountain for valuable ore.
 Data mining technology can generate new business
opportunities by providing these capabilities:
 Automated prediction of trends and behaviors. Data mining
automates the process of finding predictive information in large
databases.
 Automated discovery of previously unknown patterns. Data mining
tools identify previously hidden patterns in one step.
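As a toy illustration of pattern discovery, the sketch below counts which pairs of items appear together in the same purchase and keeps the pairs that occur at least a minimum number of times. The baskets and the threshold are hypothetical, and commercial data mining tools rely on far more sophisticated algorithms than this pair count.

```python
from collections import Counter
from itertools import combinations

# Hypothetical point-of-sale baskets.
baskets = [
    {"bagel", "cream cheese", "coffee"},
    {"bagel", "cream cheese"},
    {"coffee", "muffin"},
    {"bagel", "coffee"},
]

# Count every unordered pair of items bought together.
pair_counts = Counter()
for basket in baskets:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

# Keep pairs seen in at least min_support baskets.
min_support = 2
frequent = {pair: n for pair, n in pair_counts.items() if n >= min_support}
print(frequent)
```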
25
Applications of Data Mining
Data Mining is currently being used in the following areas:
 Retailing & Sales
 Banking
 Manufacturing & Production
 Brokerage & Securities trading
 Computer hardware & software
 Insurance
 Police work
 Government & Defense
 Airlines
 Health care
 Broadcasting
 Marketing
26
Text & Web Mining
 Text mining is the application of data mining to non-structured or less structured text files.
 Text mining helps organizations to do the following:
 Find the “hidden” content of documents, including additional useful
relationships.
 Group documents by common themes.
 Web Mining refers to mining tools used to analyze a large amount
of data on the Web, such as what customers are doing on the
Web—that is, to analyze clickstream data.
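A simple form of clickstream analysis counts page views and the page-to-page transitions within each visitor session. This is a minimal sketch over a hypothetical log of (session id, page) pairs in visit order; real Web mining tools work from server logs with many more fields.

```python
from collections import Counter, defaultdict

# Hypothetical clickstream: (session id, page) in visit order.
clicks = [
    ("s1", "/home"), ("s1", "/toys"), ("s2", "/home"),
    ("s1", "/cart"), ("s2", "/toys"), ("s3", "/home"),
]

# Rebuild each session's path through the site.
paths = defaultdict(list)
for session, page in clicks:
    paths[session].append(page)

# Count views per page and consecutive page-to-page moves per session.
page_views = Counter(page for _, page in clicks)
transitions = Counter()
for pages in paths.values():
    transitions.update(zip(pages, pages[1:]))

print(page_views.most_common())
print(transitions.most_common())
```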
27
Data Visualization
Data visualization refers to the
presentation of data by
technologies such as digital
images, geographical
information systems,
graphical user interfaces,
multidimensional tables and
graphs, virtual reality, three-dimensional presentations,
and animation.
28
CASE: Data Visualization Helps Haworth
Problem:
 Haworth Corporation, a major office furniture manufacturer, has
maintained a competitive edge by offering customization.
 But many customers are unable to visualize the 21 million potential
product combinations.
Solution:
 Computer visualization software enables sales representatives with
laptops to show customers exactly what they are ordering.
Results:
 Reduction in time spent between sales reps and CAD operators, &
increased customer satisfaction with quicker delivery.
29
Multidimensionality
 Modern data and information may have several dimensions.
 e.g. Management may be interested in examining sales figures in a
certain city by product, by time period, by salesperson, and by store.
 It is important to provide the user with a technology that allows
him or her to add, replace, or change dimensions quickly and
easily in a table and/or graphical presentation.
 The technology of slicing, dicing, and similar manipulations is
called Multidimensionality.
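Slicing and dicing can be sketched as choosing which dimensions to group by and which values to filter on. The rollup function below is a hypothetical helper written for this illustration, and the sales records are invented; dedicated multidimensional tools precompute such aggregates rather than scanning rows on each request.

```python
from collections import defaultdict

# Hypothetical sales facts with three dimensions and one measure.
sales = [
    {"product": "desk", "period": "Q1", "store": "Boston", "amount": 100},
    {"product": "desk", "period": "Q2", "store": "Boston", "amount": 150},
    {"product": "chair", "period": "Q1", "store": "Boston", "amount": 40},
    {"product": "chair", "period": "Q1", "store": "Hartford", "amount": 60},
]

def rollup(rows, dims, measure="amount", **filters):
    """Sum the measure by the chosen dimensions, keeping rows that match filters."""
    totals = defaultdict(int)
    for r in rows:
        if all(r[k] == v for k, v in filters.items()):
            totals[tuple(r[d] for d in dims)] += r[measure]
    return dict(totals)

# View by product, then change the view: product by period for one store.
print(rollup(sales, ["product"]))
print(rollup(sales, ["product", "period"], store="Boston"))
```

Changing the dims list or the keyword filters corresponds to adding, replacing, or changing dimensions in the user's view.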
30
Multidimensionality
Three factors are considered in multidimensionality:
Examples of dimensions: Products, salespeople, market segments, business
units, geographical locations, distribution channels, countries,
industries.
Examples of measures: Money, sales volume, head count, inventory
profit, actual versus forecasted results.
Examples of time: Daily, weekly, monthly, quarterly, yearly.
31
Advantages of Multidimensionality
 Data can be presented and navigated with relative ease.
 Multidimensional databases are easier to maintain.
 Multidimensional databases are significantly faster than
relational databases as a result of the additional dimensions
and the anticipation of how the data will be accessed by
users.
32
Geographic Information Systems (GIS)
 A geographical information system (GIS) is a computer-based
system for capturing, storing, checking, integrating, manipulating,
and displaying data using digitized maps.
– Every record or digital object has an identified geographical location.
 Banks are using GIS for plotting the following:
– Branch and ATM locations
– Customer demographics
– Volume and traffic patterns of business activities
– Geographical area served by each branch
– Market potential for banking activities
– Strengths and weaknesses against the competition
– Branch performance
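Because every record carries a location, questions such as "which branch serves this customer?" reduce to distance calculations. The sketch below uses the haversine great-circle formula with approximate, illustrative coordinates; the branch names and the nearest_branch helper are hypothetical, and a real GIS would also account for roads, travel time, and map layers.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two latitude/longitude points."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))

# Hypothetical branch locations (approximate city coordinates).
branches = {"Boston": (42.36, -71.06), "Hartford": (41.76, -72.67)}

def nearest_branch(lat, lon):
    """Name of the branch closest to the given customer location."""
    return min(branches, key=lambda name: haversine_km(lat, lon, *branches[name]))

print(nearest_branch(42.37, -71.11))  # a customer near Boston
print(nearest_branch(41.31, -72.92))  # a customer in southern Connecticut
```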
33
Geographic Information Systems (GIS)
 GIS Software varies in its capabilities, from simple computerized
mapping systems to enterprise-wide tools for decision support data
analysis.
 GIS Data are available from a wide variety of sources. Government
sources (via the Internet and CD-ROM) provide some data, while
vendors provide diversified commercial data as well.
 GIS & Decision Making. The graphical format of GIS makes it easy
for managers to visualize the data & make decisions.
 GIS and the Internet or intranet. Most major GIS software
vendors are providing Web access, such as embedded browsers, or
a Web/Internet/intranet server that hooks directly into their software.
 Emerging GIS Applications.
34
Visual Interactive Modeling (VIM)
 Visual interactive modeling
(VIM) uses computer graphic
displays to represent the
impact of different
management decisions on
goals such as profit or
market share.
– A VIM can be used both for
supporting decisions &
training.
– It can represent a static or a
dynamic system.
 Visual interactive simulation
(VIS) is one of the most
developed areas in VIM.
– It is a decision simulation in
which the end-user watches
the progress of the simulation
model in an animated form
using graphics terminals.
35
Virtual Reality
 Virtual reality (VR) is interactive, computer-generated, three-dimensional
graphics delivered to the user through a head-mounted display.
 VR applications to date have been used to support decision
making indirectly.
– Boeing has developed a virtual aircraft mock-up to test designs.
– At Volvo, VR is used to test virtual cars in virtual accidents.
 Data visualization helps financial decision makers by using
visual, spatial & aural immersion virtual systems.
– Some stock brokerages have a VR application in which users surf over
a landscape of stock futures, with color, hue, and intensity.
36
Marketing Transaction Database
 The Marketing transaction database (MTD) combines
many of the characteristics of static databases and
marketing data sources into a new database that allows
marketers to engage in real-time personalization and
target every interaction with customers.
 The MTD provides dynamic, or interactive, functions not
available with traditional types of marketing databases.
 Exchanging information allows marketers to refine their
understanding of each customer continuously.
 Data mining, data warehousing, and MTDs are
delivered on the Internet and intranets.
37
Implementation Examples
The following examples illustrate how companies use data mining and
warehousing to support the new marketing approaches:
 Alamo Rent-a-Car discovered that German tourists liked bigger cars.
So now, when Alamo advertises its rental business in Germany, the ads
include information about its larger models.
 Au Bon Pain Company discovered that they were not selling as much
cream cheese as planned. When they analyzed point-of-sale data, they
found that customers preferred small, one-serving packaging.
 AT&T and MCI sift through terabytes of customer phone data to fine-tune
marketing campaigns and determine new discount calling plans.
38
CASE: Data Mining Powers Wal-Mart
 Wal-Mart’s formula for success owes much to the company’s
multimillion-dollar investment in data warehousing.
 The systems house data on point of sale, inventory, products in transit,
market statistics, customer demographics, finance, product returns, and
supplier performance.
– The data are used for three broad areas of decision support:
• analyzing trends
• managing inventory
• understanding customers
 The data warehouse is available over an extranet to store managers and
suppliers.
– In 2001, 5,000 users made over 35,000 database queries each day.
39
Web-based Data Management Systems
 Business intelligence activities – from data acquisition, through
warehousing, to mining – can be performed with Web tools or are
interrelated with Web technologies and e-Commerce.
 e-Commerce software vendors are providing Web tools that connect the
data warehouse with EC ordering and cataloging systems.
– e.g. Tradelink, a product of Hitachi
 Data warehousing and decision support vendors are connecting their
products with Web technologies and EC.
– e.g. Comshare’s DecisionWeb, Brio’s Brio One, Web Intelligence from
Business Objects, and Cognos’s DataMerchant.
40
Corporate Portals
41
Web-based Data Acquisition & Agents
Web-based Data Acquisition
 Traditional data acquisition has become a pervasive element in
today’s business environment.
 This acquisition includes both the recording of information
from online surveys and questionnaires, and direct
measurements taken in the manufacturing environment.
Intelligent Data Warehouse
 The amount of data in the data warehouse can be very large.
 While the organization of data is done in a way that permits easy
search, it still may be useful to have a search engine for
specific applications.
42
Managerial Issues
 Cost–benefit issues & justification. A cost–benefit
analysis must be undertaken before any commitment to new
technologies.
 Where to store data physically. Should data be
distributed close to their sources? Or should data be
centralized for easier control?
 The legacy data problem. What should be done with masses
of information already stored in a variety of formats, often
known as the legacy data acquisition problem?
 Legal issues. Data mining gives rise to a variety of legal issues.
43
Managerial Issues (cont.)
 Disaster recovery. How well can
an organization’s business
processes recover after an
information system disaster?
 Internal or external? Should a
firm store & maintain its databases
internally or externally?
 Data security and ethics. Are
the company’s competitive data
safe from external snooping or
sabotage?
 Ethics. Should people have to
pay for use of online data?
 Privacy. Collecting data in a
warehouse and conducting data
mining may result in the invasion of
privacy.
 Data purging. When is it
beneficial to “clean house” and
purge information systems of
obsolete or non–cost-effective
data?
 Data delivery. How can data be
moved efficiently around an
enterprise?
44