Information Technology and Datawarehousing An Overview
Download
Report
Transcript Information Technology and Datawarehousing An Overview
Basic Concepts of
Datawarehousing
An Overview
Prasanth Gurram
How to answer these Business
Queries?
What is the sales distribution region wise?
How did my revenue improve in the past 5 years?
What are the slow movers
in my product line?
Which of my Sales Agents
are doing better?
Which channel costs me
more and pays less?
Strategic Planning / Budgeting
What is Defaulter’s Profile?
Who are my profitable customers?
Currency Risk, Interest
Rate Risk, Liquidity Risk
DSS
Decision Support Systems (DSS) are interactive computerbased systems intended to help decision makers utilize data and
models to identify and solve problems and make decisions.
Data Warehouse is the foundation of DSS process. It is a Strategy and
a Process for Staging Corporate Data.
Enable users to get a “Business View” of
the data
Facilitate Data based Decision Making
that would drive and improve the
Business
Discover “Hidden Trends”
Driving Forces for DSS
Business
Speed
Reform
Customers
RESULT:
Technology
Scenario without DSS
Unavailability of Tools and Techniques for
acquisition of data from various sources for
answering business questions and making
decisions, in earlier days
Intensive efforts in data formatting than data
analysis
Static and inflexible report generation
Time-lag in accessing the information at central
place
OLTP v/s DSS Environment
OLTP Environment
DSS Environment
get data IN
large volumes of simple
transaction queries
continuous data
changes
low processing time
mode of processing
transaction details
data inconsistency
mostly current data
get information OUT
small number of diverse
queries
periodic updates only
high processing time
mode of discovery
subject oriented summaries
data consistency
historical data is relevant
OLTP v/s DSS Environment
OLTP Environment
DSS Environment
high concurrent usage
highly normalized data
structure
static applications
automates routines
low concurrent usage
fewer tables, but more
columns per table
dynamic applications
facilitates creativity
Benefits for Business User
Flexible Information Access
High Availability
Ease of Use
Quality & Completeness of Data
Focus on Information Processing
Information Base for Knowledge Discovery
Available line of technology
Advances in dbms technology
Data warehousing
On-line analytical processing
Data mining
Datawarehouse
Data warehouses store large volumes of data which are
frequently used by DSS.It is maintained separately from
the organization’s operational databases
Data warehouse is subject-oriented, integrated, timevariant, and nonvolatile collection of data
Subject-oriented : Contains information regarding
objects of interest for decision support: Sales by
region, by product, etc.
Itegrated: Data are typically extracted from multiple,
heterogeneous data sources (e.g., from sales,
inventory, billing DBs etc.).
Time-variant: Contain historical data, longer horizon
than operational system.
Nonvolatile : Data is not (or rarely) directly
updated.
Datawarehouse
Is the enabling technology that facilitates
improved business decision-making
It’s a process, not a product
A technique for assembling and managing a
wide variety of data from multiple operational
systems for decision support and analytical
processing
It’s a journey not a destination...
DW Components
Extraction
FS1
FS2
.
.
.
FSn
Transmission
N
E
T
W
O
R
K
Legacy System
Data Mart
Population
Cleansing
S
T
A
G
I
N
G
Transformation
Aggregation
Summarization
DM1
ODS
DW
DM2
DMn
A
R
E
A
OLAP ANALYSIS
Knowledge Discovery
Metadata Layer
Operational Process
Data extraction
Data Cleansing and Transformation
Data Load and refresh
Build derived data and views
Service queries
Administer the warehouse
Extraction Process
( Data Capturing )
Business
Transactions
Feed System
Application
Data
Capturing
Process
Incremental
Data
Control Metadata
•Extract the incremental data from feed system
•Store the extracted data into a temporary area
•Extract data from multiple, heterogeneous, and external sources
Extraction Process
(Data Transmission )
Feed System Side
Incremental
Data
Network Cloud
FTP
Staging area
Incremental
Data
•Transmit the extracted data from Feed system to Staging area
• Periodicity of transmission ( daily / weekly ) depends upon the feed system
Cleansing Process
Process Metadata
Cleansing Rules
Raw data
(Staging Area)
Cleansing
Process
Clean
data
Good
Control Metadata
Bad
•Detect errors in the data and rectify them
when possible
•Mark it Good/Bad
•Generate the cleansing Reports and mail to
the DWA and Feed System representatives
Cleansing
Reports
Transformation Process
Process Metadata
•Mapping Detail
•Transformation Rule
Clean
Operational
Data
Operational
Data
Store
Transformation
Process
Control Metadata
•Transform the cleaned Operational Data into DSS Data
•Load the DSS data into ODS
•ODS contains the current DSS data at the lowest level of granularity
Summarization Process
DW
ODS
Summarization
Process
Weekly Monthly
Control Metadata
• Summarize and aggregate ODS data and Populate to the Warehouse
• Periodicity of Summarization Process depends upon the level of
summarization at Warehouse ( weekly, monthly, daily )
Yearly
Metadata
Data about Data
Used to maintain Datawarehouse
Control data
Static:
Roles, permissions, naming standards, source system names,
Locations, target names, transformation and mapping rules
Dynamic:
Scheduling, scripts, load statistics, space usage,
Backup statistics
Business data
Business rules,Who validates data,Who controls,How they validate
DW Components/Tools
Extraction/transformation/load tool (family of tools
including data modeling tool, extraction tool, Meta
data repository, and DW administration tools)
Meta data exchange architecture (API used to
integrate all components of DW with central Meta
data)
Target databases (relational, multidimensional,
hybrid)
Data access and analysis tools for end users
Database servers, operating systems, networks
DW Tools
Tool Category
ETL Tools
Products
ETI Extract, Informatica, IBM Visual Warehouse
Oracle Warehouse Builder
OLAP Server
Oracle Express Server, Hyperion Essbase, IBM DB2
OLAP Server, Microsoft SQL Server OLAP Services,
Seagate HOLOS, SAS/MDDB
Oracle Express Suite, Business Objects, Web Intelligence,
SAS, Cognos Powerplay/Impromtu, KALIDO,
MicroStrategy, Brio Query, MetaCube,Informatica
Oracle, Informix, Teradata, DB2/UDB, Sybase, Microsoft
SQL Server, RedBricks
SAS Enterprise Miner, IBM Intelligent Miner,
SPSS/Clementine
OLAP Tools
Data Warehouse
Data Mining &
Analysis