Data Warehousing

Download Report

Transcript Data Warehousing

Data Warehousing
Tyler Helmle
Introduction




Definition
History
Processes and Terms
Management Techniques
Definition

Collection of Data





Subject Oriented
Integrated
Time-Variant
Non-Volatile
To Support Managements Decision Making
Process
Subject Oriented


Particular Subject
Instead of Companies Ongoing Operations
Integrated


Gathered from variety of Sources
Merged into a Whole
Time-Variant

Specific Time Period




Monthly
Weekly
Daily
Etc.
Non-Volatile



Stable
Never Remove Data
Consistent Picture of Business
Definition

Bill Inmon


Single Subject


1990
Data Mart
Possible to be Volatile


Size Contraints
Rolling Lengths
Definition

Ralph Kimball




“a copy of transaction data specifically structured
for query and analysis.”
Less insight
Less depth
Not Less Accurate
History

Concept Originated (Late 1980’s)




IBM Researchers
Barry Delvin and Paul Murphy
“Business Data Warehouse”
Operational Systems to Decision Support
History

Trends

Corporations had multiple Decision Support
Systems



Expensive
Redundant
Get to one efficient warehouse

Data Marts
History

Bill Inmon


“Father of the Data Warehouse”
Building of the Data Warehouse


1991
Taught
Data Warehouse
Design

Designing and Rolling Out Warehouse

Surajit Chaudhuri and Umeshwar Dayal
Design




Define the architecture, do capacity planning,
and select the storage servers, database and
OLAP servers, and tools
Integrate the servers, storage, and client
tools
Design the warehouse schema and views
Define the physical warehouse organization,
data placement, partitioning, and access
methods
Design (Cont.)



Connect the sources using gateways, ODBC
drivers, or other wrappers
Design and implement scripts for data
extraction, cleaning, transformation, load, and
refresh
Populate the repository with the schema and
view definitions, scripts, and other metadata
Design (Cont.)


Design and implement end-user applications
Roll out the warehouse and applications
Design (Cont.)

Database Designs


Multidimensional Approach
Normalized Approach
Data Warehouse
Raw Data





Business will Decide
Department?
Market Data?
Customer Data?
Data Granularity
Data Warehouse
Data Extracting

Extract from Source Systems



Database
Flat Files
Converts into format for transformation
Data Transforming



Data Validation
Difficult
Common Problems Occur
Data Transforming


Selection of Columns
Translation of Data





Make all data consistent
Derive New Calculated Value
Filtering
Sorting
Joining of Data
Data Transforming


Transposing
Splitting Columns
Data Transforming Tools

Data Migration


Data Scrubbing



Transformation of Data
Domain Specific Knowledge
Fixing or Eliminating Data
Data Auditing


Patterns
Inconsistencies
Data Load



Batch Processing
Visibility and Management
Administrator



Start, Cancel, Suspend, Resume Load
Large
Quick and Efficient
Data Load

Parallel Processing


Splitting Data Files to provide Parallel Access
Pipeline


Component


Simultaneous running of several components
Simultaneous running of multiple processes
All three types usually operate at same time
Data Refresh


When?
How?
Data Warehouse
Data Mart





Subset of Data
Major Data Subject
Increase Performance
Separate Security
Prove ROI before adding to Data Warehouse
Data Mart


Ease of Creation
Lower Cost than Data Warehouse
Data Mart

Cons




Limited Scalability
Duplication of Data
Data Inconsistency
Low ability to leverage enterprise sources of data
Data Warehouse
Metadata





“Data about Data”
Clarifies Data
Explains Data
Helps Navigate Through Data
Facilitates (Both Human and Computers)



Understanding
Usage
Management
Metadata

Different from Data



Sometimes can be both
Point of view
Types





Descriptive
Administrative
Structural
Technical
Use
Metadata

Content



Mutability



Describe Resource
Describe Content
Immutable
Mutable
Logical Function



Sub-symbolic
Symbolic
Logical
Metadata

Back End


Extract, Transform, Load
Front End


Label Screens
Create Reports
Data Warehouse
End Product




Analysis
Querying
Reporting
Data Mining
Data Mining




Requires Large Amount of Data
Determine Patterns in Data
Determine Relationships in Data
Analyze Results and Conclude
Standard Reporting




Weekly/Monthly/Yearly
Same Constant Data
Recent Information
Arrives at Expected Time on Expected
Medium
Configurable Reports


Select Parameters to Show on Report
Ex.



Time Period
Region
Support of Data Remains the Same
Ad Hoc Reporting


End Users Pick and Choose Data
Requires



Training
Knowledge of Data
Do Not Want Users Miss Using the Data
Dashboards / Scorecards




Like Standard Reports
Summarized
Easy to Read
Graphical
Data Warehouse
Data Warehouse Management

Data Warehouse Administrator


Visibility
Knowledge of what is happening

Loading, Extracting, Etc.
Data Warehouse Management

Responsibilities




What is data being used for?
Who is using the data?
What is the response time?
What kinds of activities are being submitted?
Data Warehouse Management

Data Management (Data Monitor)

Counts of Data
Analyzes Profiles of Data
Determines Threshold of Data

Makes DWA’s job easier


Data Warehouse Management

Security




Confidential?
Access Restriction
Encryption
Decryption
Pros

One Common Data Source




Ease of Reporting and Analysis
Inconsistencies Identified Right Away
Retrieve Data Without Slowing Down
Operational Systems
Show Actual Performance
Cons


Data Is Not Always Most Current
High Costs




Installation
Maintenance
Support
Hard To Stay Ahead
Summary



Data Warehousing can prove to be a success
Common Knowledge of a database
technology
As technology and research continues, the
possibilities are endless
Questions?