Transcript Chapter 1
Data Warehouse Fundamentals
Chapter 3
Trends In Data Warehousing
Paul K Chen
1
Data Warehousing is Becoming
Mainstream
In the early stages, four significant factors drove many
companies to move into data warehousing:
Fierce competition
Government deregulation
Need to revamp internal processes
Imperative for customized marketing
Walmart vs. Amazon.com
Walmart is the US company most quoted for the
successful application & deployment of Data
Warehousing technology.
Walmart filed lawsuit against Amazon.com for its
unlawful way of pirating its DW technology by hiring
away its DA personnel by offering hefty stock option
to these people.
Significant Factors
These significant factors reflect the new trends in data
warehousing:
Multiple Data Types
Data Visualization
Parallel Processing
Query Tools
Browser Tools
Data Fusion
Multidimensional Analysis
Agent Technology
E-Business- ERP, KM, CRM
Decision Making
and Data Warehousing
“A data warehouse is the data, processes, tools, and
facilities to manage and deliver complete, timely, accurate,
and understandable business information to authorized
individuals for effective decision making.”
Structured Data
– Includes traditional relational databases
– Typically internal and enterprise-owned
– Predetermined
Unstructured Data
– Includes articles, reports, images, and videos
– Utilizes external data and expert opinion
– Ad hoc
3
Decision Making
and Data Warehousing
Management Systems
– Extend relational databases to store and support multimedia
– User-defined types (UDT) and functions (UDF) in SQL-3
Specialized Servers
– Used for data which is incompatible with relational databases
(e.g., Streaming video servers)
– Objects may be linked to a relational database
Search Engines
– Query by Image Content (shape, color, texture, etc)
– Text retrieval on free-text documents
– Audio and video searching
Decision Making and Data Warehousing
~ The trend is toward unstructured data and ad hoc
warehouses. ~
~ Trend toward multimedia. ~
4
Types of Decision Support Tools
Data Inquiry
– A request for a set of data based on some search criteria
Data Interpretation
– Manipulation and visualization of a set of data (statistical
analysis)
Multidimensional Analysis (OLAP)
– n-dimensional spreadsheet analysis
Information Discovery
– Pattern recognition, trends
Browsers
– Search metadata catalogs
– Search information object lists
– Launch analysis tools
5
File-based Processing
6
Types of Decision Support Tools
~ Trend toward utilization of the Web, facilitated by Java. ~
7
Data Warehouse Architectures
Single Level
– Decision support tools access operational data directly
– Feasible only with “clean” data
– Valid for unstructured data
Two Level Reconciled
– Scrubbed operational data supporting ad hoc queries
Two Level Derived
– Summarized data
Three Level
– Maintains both scrubbed operational data, and summarized
data.
10
Data Warehouse Architectures
~ Trend toward multidimensional data. ~
11
Data Stores and Access Enablers
Specialized Multidimensional Databases
– Data is peregrinated and loaded into multidimensional
databases
– Long loading times but quick response
Relational-like Stores
– Indexing is used to proved pseudo-multidimensional
functionality
Relational Data Stores
– An extra semantic layer generates multidimensional data on
the fly
Hybrids
– Details are stored in a traditional relational format
– A subset is cached in a multidimensional data structure
12
Database Management System (DBMS)
13
Data Stores and Access Enablers
~ Trend toward multidimensional data.
14
Metadata
Integrated Components
– All components (sources, stores, etc) use a common metadata
repository to maintain their metadata
Standardized Metadata Interchange
– Components keep their own metadata
– Components use a common interchange information model
and syntax to share metadata
Synchronized Metadata Interchange
– Metadata changes are updated automatically across all
components
Building of Business Metadata
– Manually entered, free-text, plain language descriptions
Metadata
~ Trend toward better metadata, exchanged between systems. ~
Middleware - Gluing the Warehouse
Together
Definition: software that shields users and developers
from differences in services and resources used by
applications
Data warehouses often have heterogeneous databases,
operating systems, networks, hardware, applications
Business Issues for Middleware
Role of middleware
– Assist developer in data extraction/transformation
and populating DW
– Assist business user in accessing DW
– Therefore needed at different points in life cycle
Types
– Copy management: data extraction,
transformation, replication, and propagation
– Gateways: DB and independent gateways
– Program-to program: RPCs, TP monitors, ORBs
– Message-oriented
Data Quality
Preprocessing Ownership
– Source application owners know their data
– Warehouse owners still must integrate the entire system
Automated Preprocessing Tools
– Specialized packages
– Generalized tools using pattern processing, lexical analysis,
and statistical matching to reconcile a wide range of data
sources
– Custom programming
Reliability and Credibility of External Data
– Quality ratings
– Posted statistical meta-information (sample size, randomness,
etc)
15
Data Quality
Trend toward better understanding of data quality. ~
16
Significant Trends- Multiple Data Types
Image
Spatial
Structured Numeric
Video
Structured Text
Data Warehouse
Repository
Audio
17
Significant Trends- Data Visualization
More Chart Types-Pie chart, scatter plot
Interactive Visualization
Chart Manipulation
Drill Down
Significant Trends- Parallel Processing
Aims to solve decision-support problems using multiple nodes
working on the same problem.
Performs many database operations simultaneously, splitting
individual tasks into smaller parts so that tasks can be spread
across multiple processors.
Parallel DBMSs must be capable of running parallel queries,
parallel data loading, table scanning, and data archiving, and back
up.
Significant Trends- Parallel Processing
Shared memory architecture (SMP)
– All the servers share all the data
Shared nothing architecture (MPP)
– Each server has its own partition of data
Significant Trends- Query Tools, Browse
Tools
Flexible Presentation –online results and report
generator
Aggregate Awareness
Crossing Subject Areas
Multiple Heterogeneous Sources
Integration
Overcoming SQL Limitations
Data Fusion
Significant Trends- Integrating ERP and
Data Warehouse
Option 1: Companies implement the data warehouse
solutions of the ERP vendor with the currently
available functionality and await the enhancements.
Option 2: Companies implement customized data
warehouse and use third-party tools to extract data
from the ERP datasets. Retrieving and loading data
from the proprietary ERP datasets is not easy.
Option 3: It is a hybrid approach that combines the
functionalities provided by the vendor’s data
warehouse with additional functionalities from thirdparty tools.
Significant Trends- Integrating KM and
Data Warehouse
What’s KM?
It is a systematic process for capturing, integrating,
organizing, and communicating knowledge
accumulated by employees.
It is a vehicle to share corporate knowledge so that
employees may be more more effective and be
productive in their work.
A knowledge management system must store all such
knowledge in a knowledge repository.
Significant Trends- Integrating KM and
Data Warehouse
A specific corporate scenario:
Sales have dropped in the South region.
Your marketing VP is able to discern this from your data
warehouse by running some queries and doing some preliminary
analysis. If he or she has access to a document prepared by an
analyst explaining why the sales are low and suggesting remedial
action.
Knowledge must be linked to the sales result to provide context to
the sales numbers from the data warehouse.
Significant Trends- Integrating KM and
Data Warehouse
An airplane sales scenario: The following information is essential
For a successful pitch for airplane sales.
Model configuration
Production schedule (Delivery schedule)
Part replacement
Warranty
Knowledge obtained from the knowledge management
system can provide context to the information received from
the data warehouse to understand the story behind the above
information.
Summary of Trends
Ad Hoc Questions
Multidimensional Analysis (OLAP)
Web-Enabled Data Warehouse
Multimedia
Middleware
Metadata Interchange
Integrating ERP with Data Warehouse
Integrating KM with Data Warehouse
Complete E-Business Suite– A Review
ERP
Marketing
EAI
Sales
Projects
Financial
Services
Order Mgt
One Database
Human
Resources
Customer
Manufacturing
Relationship(CRM)
Procurement
Supply Chain (SCM)
Information System Categories
Information System Categories
Data Warehouse & ERP
– ERP = Enterprise Resource Planning
– A software solution that addresses enterprise needs
taking the process view of an organization to meet the
organization goals tightly integrating all the functions
of an organization.
-- It integrates all the departments and functions across
a company into a single computer system that can
serve all those different departments’ particular
needs.
WHY ERP?
Business
Customer satisfaction
Business development – new areas, products and
services
Ability to face competition
Efficient processes required for company’s growth
IT
Present software does not met business needs.
Legacy systems difficult to maintain
Obsolete hardware/software difficult to maintain
How ERP?
ERP Combines various department systems into a
single, integrated software program that runs off a
single database so that the various departments can
more easily share information and communicate with
each other.
The best part of ERP is the way in which it improves
the order fulfillment process that is taking the
customer order and process it into an invoice and
revenue.
It doesn’t handle the front-end that is handled by
CRM (Customer Relationship Management).
How ERP?(cont’d)
When a customer service representative enters a customer order
to an ERP system, he has all the information necessary to
complete the order such as customer’s credit rating and order
history from the finance module, the company’s inventory levels
from the warehouse module and the shipping dock’s trucking
schedule from the logistics module.
How it’s being done: It integrates the financial information and
customer order information . It does so by integrating the
following:
Database
Application
Interfaces
Tools
BPR
How ERP? (cont’d)
It standardizes and speeds up the manufacturing
process. This saves time, increases productivity and
reduces head count.
It reduces the inventory. Due to the information
available about all the orders it helps to maintain the
right level of stock and smoothes the manufacturing
process.
Data Warehouse & EAI
What is EAI? EAI refers to Enterprise Application Integration.
EAI is the merging of applications and data from various new
and legacy systems within a business. Various means are
employed to accomplish EAI, including middleware, in order to
unify IT resources, maximize new ERP investments, diminish
errors and get everyone on the same page. EAI enables
companies to link their existing software applications with each
other and with portals. EAI provide the ability to get their
applications to exchange critical data. EAI is usually close to the
top of any CIO's list of concerns. There are different approaches
to EAI. Some rely on linking specific applications with tailored
code, but most rely on generic solutions, typically called
middleware. XML, combined with SOAP and UDDI is a kind of
middleware.
E-Business
~ Trend toward better understanding as well as consolidation
of internal processes and data ~
~ Trend toward web-enabled data warehouse. ~
Tugas
Bagaimana
agar data yang ada menghasilkan
informasi yang berguna dalam pengambilan
keputusan. Jelaskan
Bila Anda sebagai Database Administrator apa
yang akan Anda lakukan terhadap data yang ada?