Chapter 5 Business Intelligence: Data Warehousing, Data

Download Report

Transcript Chapter 5 Business Intelligence: Data Warehousing, Data

Turban, Aronson, and Liang
Decision Support Systems and Intelligent Systems,
Seventh Edition
Chapter 5
Business Intelligence: Data
Warehousing, Data Acquisition, Data
Mining, Business Analytics, and
Visualization
© 2005 Prentice Hall, Decision Support Systems and Intelligent Systems, 7th Edition,
Turban, Aronson, and Liang
5-1
Learning Objectives
•
•
•
•
•
Describe the issues in management of data.
Understand the concepts and use of DBMS.
Learn about data warehousing and data marts.
Explain business intelligence/business analytics.
Examine how decision making can be improved
through data manipulation and analytics.
• Understand the interaction between the Web and
database technologies.
• Explain how database technologies are used in
business analytics.
• Understand the impact of the Web on business
intelligence and analytics.
© 2005 Prentice Hall, Decision Support Systems and Intelligent Systems, 7th Edition,
Turban, Aronson, and Liang
5-2
Business Intelligence: Data Warehousing, Data
Acquisition, Data Mining, Business Analytics, and
Visualization
Many organizations have amassed vast amounts of
data that employees use to unlock valuable secrets to
enable the organization to compete successfully. Some
organizations do this extremely well, but others are
quite ineffective. To use analytic tools to improve
organization decision making, a foundation data
architecture & enterprise architecture must be in place
to facilitate effective decision analysis. Enabling
decision analysis through access to all relevant
information is known as BI which include data
warehouse, online analytical processing, data mining,
visualization and multidimensionality.
© 2005 Prentice Hall, Decision Support Systems and Intelligent Systems, 7th Edition,
Turban, Aronson, and Liang
5-3
Information Sharing a Principle
Component of the National Strategy for
Homeland Security Vignette
• Network of systems that provide
knowledge integration and distribution
• Horizontal and vertical information
sharing
• Improved communications
• Mining of data stored in Web-enabled
warehouse
© 2005 Prentice Hall, Decision Support Systems and Intelligent Systems, 7th Edition,
Turban, Aronson, and Liang
5-4
Information Sharing a Principle
Component of the National Strategy for
Homeland Security Vignette
Data warehouse provide a strategic data
architecture to enable decision support
analysis. It enables data mining, the ability to
automatically synthesize vast amounts of
information in order to discover the hidden
truth.
It has been emerged in the form of web
enabled data warehouse as a direct response
to the Sep 11, 2001.
© 2005 Prentice Hall, Decision Support Systems and Intelligent Systems, 7th Edition,
Turban, Aronson, and Liang
5-5
Information Sharing a Principle
Component of the National Strategy for
Homeland Security Vignette
The main purpose was to build a national environment
that enables the sharing of essential homeland security
information. We must build a system of systems that can
provide the right information to the right people at all
times. Information will be shared “horizontally” across
each level of government and “vertically” among:
1. Federal government.
2. State government.
3. Local government.
4. Private industry.
5. Citizens.
© 2005 Prentice Hall, Decision Support Systems and Intelligent Systems, 7th Edition,
Turban, Aronson, and Liang
5-6
Information Sharing a Principle
Component of the National Strategy for
Homeland Security Vignette
The five major initiatives that are identified within the
strategy include:
1. To integrate information sharing across the federal
government.
2. To extend the integration of information sharing
across the vertical levels.
3. To adopt common metadata standards of electronic
information relevant to homeland security.
4. To improve public safety communication.
5. To ensure reliable public health information.
© 2005 Prentice Hall, Decision Support Systems and Intelligent Systems, 7th Edition,
Turban, Aronson, and Liang
5-7
The Nature & Source of data
In order to understand a situation, a decisionmaker needs data, information, and knowledge.
This must be integrated and organized in a
manner that makes them useful. Then the
decision maker must be able to apply analysis
tool processing so that data, information and
knowledge can be utilized to full benefit. In the
right hands, these tools provide great decision
makers with great capabilities.
© 2005 Prentice Hall, Decision Support Systems and Intelligent Systems, 7th Edition,
Turban, Aronson, and Liang
5-8
Data, Information, Knowledge
• Data
– Items that are the most elementary descriptions
of things, events, activities, and transactions
– May be internal or external.
• Information
– Organized data that has meaning and value
• Knowledge
– Processed data or information that conveys
understanding or learning applicable to a
problem or activity
© 2005 Prentice Hall, Decision Support Systems and Intelligent Systems, 7th Edition,
Turban, Aronson, and Liang
5-9
Data
• Raw data collected manually or by instruments
• Quality is critical
– Quality determines usefulness the 4 categories of data
quality are:
1.
2.
3.
4.
Contextual data quality
Intrinsic data quality
Accessibility data quality
Representation data quality
– Often neglected or casually handled
– Problems exposed when data is summarized
© 2005 Prentice Hall, Decision Support Systems and Intelligent Systems, 7th Edition,
Turban, Aronson, and Liang
5-10
© 2005 Prentice Hall, Decision Support Systems and Intelligent Systems, 7th Edition,
Turban, Aronson, and Liang
5-11
Data / Cleanse data
1. When populating warehouse
2. Data quality action plan p.221
© 2005 Prentice Hall, Decision Support Systems and Intelligent Systems, 7th Edition,
Turban, Aronson, and Liang
5-12
Data / Cleanse data
3. Best practices for data quality
4. Measure results
© 2005 Prentice Hall, Decision Support Systems and Intelligent Systems, 7th Edition,
Turban, Aronson, and Liang
5-13
Data/ integrity issues
– Data itself should has no contradiction.
– Uniformity, within limits.
– Version, when moving through computers and
there is different versions.
– Completeness check
 Check if summaries are correct
 Check if all needed values to generate
summary are included.
– Conformity check, the summaries are similar to
what expected and previous summaries.
– Genealogy or drill-down
© 2005 Prentice Hall, Decision Support Systems and Intelligent Systems, 7th Edition,
Turban, Aronson, and Liang
5-14
Data Integration
Data Integration is the process of
collecting data from more than one
source.
© 2005 Prentice Hall, Decision Support Systems and Intelligent Systems, 7th Edition,
Turban, Aronson, and Liang
5-15
Data Integration
© 2005 Prentice Hall, Decision Support Systems and Intelligent Systems, 7th Edition,
Turban, Aronson, and Liang
5-16
Data Integration
• Access needed to multiple sources
– Often enterprise-wide
– Disparate and heterogeneous databases
– XML becoming language standard
© 2005 Prentice Hall, Decision Support Systems and Intelligent Systems, 7th Edition,
Turban, Aronson, and Liang
5-17
External Data Sources
• Web
– Intelligent agents
– Document management systems
– Content management systems
• Commercial databases
– Sell access to specialized databases
© 2005 Prentice Hall, Decision Support Systems and Intelligent Systems, 7th Edition,
Turban, Aronson, and Liang
5-18
Database Management Systems
DBMS is :1. Software program
2. Supplements operating system
3. Manages data
4. Queries data and generates reports
5. Data security
6. Combines with modeling language like
spreadsheets for construction of DSS
© 2005 Prentice Hall, Decision Support Systems and Intelligent Systems, 7th Edition,
Turban, Aronson, and Liang
5-19
Database Models
• Hierarchical
– Top down, like inverted tree
– Fields have only one “parent”, each “parent”
can have multiple “children”
– Fast used in transaction (insert, delete, update)
processing.
• Network
– Relationships created through linked lists, using
pointers
– “Children” can have multiple “parents”
– Greater flexibility, substantial overhead
© 2005 Prentice Hall, Decision Support Systems and Intelligent Systems, 7th Edition,
Turban, Aronson, and Liang
5-20
Database Models
• Relational
– Flat, two-dimensional tables with multiple
access queries
– Examines relations between multiple tables
– Flexible, quick, and extendable with data
independence
– Best used in retrieving data.
• Object oriented
– Data analyzed at conceptual level
– Inheritance, abstraction, encapsulation
© 2005 Prentice Hall, Decision Support Systems and Intelligent Systems, 7th Edition,
Turban, Aronson, and Liang
5-21
© 2005 Prentice Hall, Decision Support Systems and Intelligent Systems, 7th Edition,
Turban, Aronson, and Liang
5-22
Database Models, continued
• Multimedia Based
– Multiple data formats
• JPEG, GIF, bitmap, PNG, sound, video, virtual reality
– Requires specific hardware for full feature
availability
• Document Based
– Document storage and management
• Intelligent
– Intelligent agents and ANN
• Inference engines
© 2005 Prentice Hall, Decision Support Systems and Intelligent Systems, 7th Edition,
Turban, Aronson, and Liang
5-23
Data Warehouse
• Subject oriented
• Scrubbed so that data from heterogeneous sources are
standardized
• Time series; no current status
• Nonvolatile
– Read only
• Summarized
• Not normalized; may be redundant
• Data from both internal and external sources is present
• Metadata included
– Data about data
• Business metadata
• Semantic metadata
© 2005 Prentice Hall, Decision Support Systems and Intelligent Systems, 7th Edition,
Turban, Aronson, and Liang
5-24
Data Warehouse
© 2005 Prentice Hall, Decision Support Systems and Intelligent Systems, 7th Edition,
Turban, Aronson, and Liang
5-25
Architecture
• May have one or more tiers
– Determined by warehouse, data
acquisition (back end), and client (front
end)
• One tier, where all run on same platform, is
rare
• Two tier usually combines DSS engine
(client) with warehouse
– More economical
• Three tier separates these functional parts
© 2005 Prentice Hall, Decision Support Systems and Intelligent Systems, 7th Edition,
Turban, Aronson, and Liang
5-26
© 2005 Prentice Hall, Decision Support Systems and Intelligent Systems, 7th Edition,
Turban, Aronson, and Liang
5-27
Data Warehouse
© 2005 Prentice Hall, Decision Support Systems and Intelligent Systems, 7th Edition,
Turban, Aronson, and Liang
5-28
© 2005 Prentice Hall, Decision Support Systems and Intelligent Systems, 7th Edition,
Turban, Aronson, and Liang
5-29
Data Warehouse
© 2005 Prentice Hall, Decision Support Systems and Intelligent Systems, 7th Edition,
Turban, Aronson, and Liang
5-30
Migrating Data
• Business rules
– Stored in metadata repository
– Applied to data warehouse centrally
• Data extracted from all relevant sources
– Loaded through data-transformation tools or
programs
– Separate operation and decision support
environments
• Correct problems in quality before data
stored
– Cleanse and organize in consistent manner
© 2005 Prentice Hall, Decision Support Systems and Intelligent Systems, 7th Edition,
Turban, Aronson, and Liang
5-31
Data Warehouse Design
Star schema
The data warehouse is based on the concept
of dimensional modeling.
Dimensional modeling is a retrieval based
model that supports high volume query
access.
The star schema is the means of
implementing the dimensional modeling.
© 2005 Prentice Hall, Decision Support Systems and Intelligent Systems, 7th Edition,
Turban, Aronson, and Liang
5-32
Data Warehouse Design
• Dimensional modeling
– Retrieval based
– Implemented by star schema
• Central fact table
• Dimension tables
• Grain
– Highest level of detail
– Drill-down analysis is not supported if the data
warehouse can give high summarized level.
© 2005 Prentice Hall, Decision Support Systems and Intelligent Systems, 7th Edition,
Turban, Aronson, and Liang
5-33
Data Warehouse Design
© 2005 Prentice Hall, Decision Support Systems and Intelligent Systems, 7th Edition,
Turban, Aronson, and Liang
5-34
Data Warehouse
© 2005 Prentice Hall, Decision Support Systems and Intelligent Systems, 7th Edition,
Turban, Aronson, and Liang
5-35
Data Warehouse Development
• Data warehouse implementation techniques
–
–
–
–
Top down
Bottom up
Hybrid
Federated
• Projects may be data centric or application centric
• Implementation factors
– Organizational issues
– Project issues
– Technical issues
• Scalable = expandable.
• Flexible
© 2005 Prentice Hall, Decision Support Systems and Intelligent Systems, 7th Edition,
Turban, Aronson, and Liang
5-36
Data Warehouse
© 2005 Prentice Hall, Decision Support Systems and Intelligent Systems, 7th Edition,
Turban, Aronson, and Liang
5-37
Data Marts
Data marts is a subset of a data
warehouse, typically consisting of a
single subject area (e.g., marketing,
personnel…).
Data mart can be either dependent or
independent
© 2005 Prentice Hall, Decision Support Systems and Intelligent Systems, 7th Edition,
Turban, Aronson, and Liang
5-38
Data Marts
• Dependent is :
– Created from warehouse by replication. It is
copied from data warehouse, it is built after the
building of warehouse.
• Functional subset of warehouse
• Independent
– Scaled down, less expensive version of data
warehouse
– Designed for a department or SBU
– Organization may have multiple data marts
• Difficult to integrate
© 2005 Prentice Hall, Decision Support Systems and Intelligent Systems, 7th Edition,
Turban, Aronson, and Liang
5-39
Business Intelligence and Analytics
• Business intelligence
– Acquisition of data and information for
use in decision-making activities
• Business analytics
– Models and solution methods
• Data mining
– Applying models and methods to data to
identify patterns and trends
© 2005 Prentice Hall, Decision Support Systems and Intelligent Systems, 7th Edition,
Turban, Aronson, and Liang
5-40
Business Intelligence and Analytics
© 2005 Prentice Hall, Decision Support Systems and Intelligent Systems, 7th Edition,
Turban, Aronson, and Liang
5-41
© 2005 Prentice Hall, Decision Support Systems and Intelligent Systems, 7th Edition,
Turban, Aronson, and Liang
5-42