Information Technology and Datawarehousing An Overview

Download Report

Transcript Information Technology and Datawarehousing An Overview

Basic Concepts of
Datawarehousing
An Overview
Prasanth Gurram
How to answer these Business
Queries?
What is the sales distribution region wise?
How did my revenue improve in the past 5 years?
What are the slow movers
in my product line?
Which of my Sales Agents
are doing better?
Which channel costs me
more and pays less?
Strategic Planning / Budgeting
What is Defaulter’s Profile?
Who are my profitable customers?
Currency Risk, Interest
Rate Risk, Liquidity Risk
DSS
Decision Support Systems (DSS) are interactive computerbased systems intended to help decision makers utilize data and
models to identify and solve problems and make decisions.
Data Warehouse is the foundation of DSS process. It is a Strategy and
a Process for Staging Corporate Data.
Enable users to get a “Business View” of
the data
Facilitate Data based Decision Making
that would drive and improve the
Business
Discover “Hidden Trends”
Driving Forces for DSS
Business
Speed
Reform
Customers
RESULT:
Technology
Scenario without DSS
Unavailability of Tools and Techniques for
acquisition of data from various sources for
answering business questions and making
decisions, in earlier days
Intensive efforts in data formatting than data
analysis
Static and inflexible report generation
Time-lag in accessing the information at central
place
OLTP v/s DSS Environment
OLTP Environment
DSS Environment
get data IN
large volumes of simple
transaction queries
continuous data
changes
low processing time
mode of processing
transaction details
data inconsistency
mostly current data
get information OUT
small number of diverse
queries
periodic updates only
high processing time
mode of discovery
subject oriented summaries
data consistency
historical data is relevant
OLTP v/s DSS Environment
OLTP Environment
DSS Environment
high concurrent usage
highly normalized data
structure
static applications
automates routines
low concurrent usage
fewer tables, but more
columns per table
dynamic applications
facilitates creativity
Benefits for Business User






Flexible Information Access
High Availability
Ease of Use
Quality & Completeness of Data
Focus on Information Processing
Information Base for Knowledge Discovery
Available line of technology
Advances in dbms technology
Data warehousing
On-line analytical processing
Data mining
Datawarehouse


Data warehouses store large volumes of data which are
frequently used by DSS.It is maintained separately from
the organization’s operational databases
Data warehouse is subject-oriented, integrated, timevariant, and nonvolatile collection of data
 Subject-oriented : Contains information regarding
objects of interest for decision support: Sales by
region, by product, etc.
 Itegrated: Data are typically extracted from multiple,
heterogeneous data sources (e.g., from sales,
inventory, billing DBs etc.).
 Time-variant: Contain historical data, longer horizon
than operational system.
 Nonvolatile : Data is not (or rarely) directly
updated.
Datawarehouse



Is the enabling technology that facilitates
improved business decision-making
It’s a process, not a product
A technique for assembling and managing a
wide variety of data from multiple operational
systems for decision support and analytical
processing
It’s a journey not a destination...
DW Components
Extraction
FS1
FS2
.
.
.
FSn
Transmission
N
E
T
W
O
R
K
Legacy System
Data Mart
Population
Cleansing
S
T
A
G
I
N
G
Transformation
Aggregation
Summarization
DM1
ODS
DW
DM2
DMn
A
R
E
A
OLAP ANALYSIS
Knowledge Discovery
Metadata Layer
Operational Process
Data extraction
Data Cleansing and Transformation
Data Load and refresh
Build derived data and views
Service queries
Administer the warehouse
Extraction Process
( Data Capturing )
Business
Transactions
Feed System
Application
Data
Capturing
Process
Incremental
Data
Control Metadata
•Extract the incremental data from feed system
•Store the extracted data into a temporary area
•Extract data from multiple, heterogeneous, and external sources
Extraction Process
(Data Transmission )
Feed System Side
Incremental
Data
Network Cloud
FTP
Staging area
Incremental
Data
•Transmit the extracted data from Feed system to Staging area
• Periodicity of transmission ( daily / weekly ) depends upon the feed system
Cleansing Process
Process Metadata
Cleansing Rules
Raw data
(Staging Area)
Cleansing
Process
Clean
data
Good
Control Metadata
Bad
•Detect errors in the data and rectify them
when possible
•Mark it Good/Bad
•Generate the cleansing Reports and mail to
the DWA and Feed System representatives
Cleansing
Reports
Transformation Process
Process Metadata
•Mapping Detail
•Transformation Rule
Clean
Operational
Data
Operational
Data
Store
Transformation
Process
Control Metadata
•Transform the cleaned Operational Data into DSS Data
•Load the DSS data into ODS
•ODS contains the current DSS data at the lowest level of granularity
Summarization Process
DW
ODS
Summarization
Process
Weekly Monthly
Control Metadata
• Summarize and aggregate ODS data and Populate to the Warehouse
• Periodicity of Summarization Process depends upon the level of
summarization at Warehouse ( weekly, monthly, daily )
Yearly
Metadata
Data about Data
Used to maintain Datawarehouse
Control data
Static:
Roles, permissions, naming standards, source system names,
Locations, target names, transformation and mapping rules
Dynamic:
Scheduling, scripts, load statistics, space usage,
Backup statistics
Business data
Business rules,Who validates data,Who controls,How they validate
DW Components/Tools
 Extraction/transformation/load tool (family of tools
including data modeling tool, extraction tool, Meta
data repository, and DW administration tools)
 Meta data exchange architecture (API used to
integrate all components of DW with central Meta
data)
 Target databases (relational, multidimensional,
hybrid)
 Data access and analysis tools for end users
 Database servers, operating systems, networks
DW Tools
Tool Category
ETL Tools
Products
ETI Extract, Informatica, IBM Visual Warehouse
Oracle Warehouse Builder
OLAP Server
Oracle Express Server, Hyperion Essbase, IBM DB2
OLAP Server, Microsoft SQL Server OLAP Services,
Seagate HOLOS, SAS/MDDB
Oracle Express Suite, Business Objects, Web Intelligence,
SAS, Cognos Powerplay/Impromtu, KALIDO,
MicroStrategy, Brio Query, MetaCube,Informatica
Oracle, Informix, Teradata, DB2/UDB, Sybase, Microsoft
SQL Server, RedBricks
SAS Enterprise Miner, IBM Intelligent Miner,
SPSS/Clementine
OLAP Tools
Data Warehouse
Data Mining &
Analysis