Data Warehousing Fundamentals
noynot@163.com
Course Objectives
After completing this course, you should be able to do the following:
Describe the role of business intelligence (BI) and data warehousing in today's marketplace
Describe data warehousing terminology and the various technologies that are required to implement a data warehouse
Explain the implementation and organizational issues surrounding a data warehouse project
Identify data warehouse modeling concepts
Explain the extraction, transformation, and loading processes for building a data warehouse
Course Objectives
Identify management and maintenance processes that are associated with a data warehouse project
Describe methods for refreshing warehouse data
Explain warehouse metadata concepts
Identify tools that can be employed at each stage of the data warehouse project
Describe user profiles and techniques for querying the warehouse
Identify methods and tools for accessing and analyzing warehouse data
Lessons
1. Business Intelligence and Data Warehousing
2. Defining Data Warehouse Concepts and Terminology
3. Planning and Managing the Data Warehouse Project
4. Modeling the Data Warehouse
5. Building the Data Warehouse: Extracting Data
6. Building the Data Warehouse: Transforming Data
7. Building the Data Warehouse: Loading Warehouse Data
8. Refreshing Warehouse Data
9. Leaving a Metadata Trail
10. Managing and Maintaining the Data Warehouse
Let’s Get Started
Lesson 1
Lesson 1 Objectives
After completing this lesson, you should be able to do the following:
Describe the role of business intelligence in today's marketplace
Describe why an online transaction processing (OLTP) system is not suitable for analytical reporting
Describe how extract processing for decision support querying led to the data warehouse solutions that are employed today
Explain why businesses are driven to employ data warehouse technology
What Is Business Intelligence?
“Business Intelligence is the process of transforming data into information and, through discovery, transforming that information into knowledge.”
— Gartner Group
Purpose of Business Intelligence
The purpose of business intelligence is to convert the volume of data into business value through analytical reporting.
[Diagram: a pyramid rising from Data to Information to Knowledge to Decision as volume decreases and value increases.]
Early Management
Information Systems
MIS systems provided business data.
Reports were developed on request.
Reports provided little analysis capability.
Decision support tools gave personal ad hoc access to data.
[Diagram: decision makers gain ad hoc access to operational reports from production platforms.]
Analyzing Data from
Operational Systems
Data structures are complex.
Systems are designed for high performance and throughput.
Data is not meaningfully represented.
Data is dispersed.
OLTP systems may be unsuitable for intensive queries.
Why OLTP Is Not Suitable
for Analytical Reporting
OLTP                                    Analytical Reporting
Information to support day-to-day       Historical information to analyze
service
Data stored at the transaction level    Data needs to be integrated
Database design: normalized             Database design: denormalized, star schema
Data Extract Processing
End-user computing offloaded from the operational environment
[Diagram: decision makers work on extracts of their own data taken from operational systems.]
Management Issues with
Data Extract Programs
[Diagram: extracts of extracts proliferate between operational systems and decision makers — an "extract explosion".]
Productivity Issues with
Extract Processing
Duplicated effort
Multiple technologies
Obsolete reports
No metadata
Data Quality Issues with
Extract Processing
No common time basis
Different calculation algorithms
Different levels of extraction
Different levels of granularity
Different data field names
Different data field meanings
Missing information
No data correction rules
No drill-down capability
Data Warehousing and
Business Intelligence
[Diagram: legacy data, operations data, and external data feed the enterprise data warehouse, which feeds data marts for analytical reporting.]
Advantages of Warehouse
Processing Environments
Controlled
Reliable
Quality information
Single source of data
[Diagram: internal and external systems feed the data warehouse, which serves decision makers.]
Advantages of Warehouse
Processing Environments
No duplication of effort
No need for tools to support many technologies
No disparity in data, meaning, or representation
No time period conflict
No algorithm confusion
No drill-down restrictions
Success Factors for a Dynamic
Business Environment
Know the business
Reinvent to face new challenges
Invest in products
Invest in customers
Retain customers
Invest in technology
Improve access to business information
Provide superior services and products
Be profitable
Business Drivers for
Data Warehouses
Provide supporting information systems
Get quality information:
• Reduce costs
• Streamline the business
• Improve margins
Technological Advances
Enabling Data Warehousing
Advances across the stack enable data warehousing:
• Hardware
• Operating system
• Database
• Query tools
• Applications
These include large databases, 64-bit architectures, indexing techniques, affordable and cost-effective open systems, robust warehouse tools, and sophisticated end-user tools.
The Difference Between the Two Kinds of Data
Summary
In this lesson, you should have learned how to:
Describe the role of business intelligence in today's marketplace
Describe why an online transaction processing (OLTP) system is not suitable for analytical reporting
Describe how extract processing for decision support querying led to the data warehouse solutions employed today
Explain why businesses are driven to employ data warehouse technology
Practice 1-1 Overview
This practice covers the following topics:
Answering questions about data warehousing
Discussing how data warehousing meets business needs
Lesson 2
Defining Data Warehouse
Concepts and Terminology
Objectives
After completing this lesson, you should be able to do the following:
Identify a common, broadly accepted definition of a data warehouse
Describe the differences between dependent and independent data marts
Identify some of the main warehouse development approaches
Recognize some of the operational properties and common terminology of a data warehouse
Definition of a Data Warehouse
“A data warehouse is a subject-oriented, integrated, nonvolatile, and time-variant collection of data in support of management's decisions.”
— W.H. Inmon, Building the Data Warehouse (1991)
Definition of a Data Warehouse
“…a data warehouse is nothing more than the union of all the data marts…”
— Ralph Kimball
“A data warehouse is the implementation of an informational database used to store shared data sourced from operational databases. A typical data warehouse is a subject database that lets users discover information in vast stores of operational data, track and respond to business trends, and support business forecasting and planning.”
— DM Review
Definition of a Data Warehouse
“An enterprise-structured repository of subject-oriented, time-variant, historical data used for information retrieval and decision support. The data warehouse stores atomic and summary data.”
— Oracle's data warehouse definition
“A data warehouse is a process, not a project.”
— another perspective on the data warehouse
Data Warehouse Properties
A data warehouse is:
Subject-oriented
Integrated
Time-variant
Nonvolatile
Subject-Oriented
Data is categorized and stored by business subject rather than by application.
[Diagram: OLTP applications such as equity plans, shares, insurance, loans, and savings map to a single warehouse subject — customer financial information.]
Integrated
Data on a given subject is defined and stored once.
[Diagram: the savings, current accounts, and loans OLTP applications integrate into a single Customer subject in the data warehouse.]
Time-Variant
Data is stored as a series of snapshots, each representing a period of time.
Nonvolatile
Typically, data in the data warehouse is not updated or deleted.
[Diagram: operational systems insert, update, delete, and read; the warehouse is loaded and then only read.]
Changing Warehouse Data
[Diagram: operational databases feed the warehouse database through a first-time load and then periodic refreshes; aged warehouse data is purged or archived.]
Data Warehouse Versus OLTP
Property            OLTP                     Data Warehouse
Response time       Subseconds to seconds    Seconds to hours
Operations          DML                      Primarily read-only
Nature of data      30-60 days               Snapshots over time
Data organization   Application              Subject, time
Size                Small to large           Large to very large
Data sources        Operational, internal    Operational, internal, external
Activities          Processes                Analysis
Usage Curves
Operational system is predictable
Data warehouse:
• Variable
• Random
Enterprisewide Warehouse
Large-scale implementation
Scoped to the entire business
Data from all subject areas
Developed incrementally
Single source of enterprisewide data
Synchronized enterprisewide data
Single distribution point to dependent data marts
Central Ideas of Data Warehouse Design
Choose a grain, or level of detail, appropriate to serve all the data marts
Ensure the design does not prevent data marts from using different technologies; it should accommodate multidimensional marts, statistics, mining, and exploration warehouses
Data Marts
A subset of the data in the data warehouse.
Most analytical activity in a BI environment takes place in the data marts. The data in each data mart is usually tailored to a particular function and need not be valid for other uses.
Data Warehouses Versus
Data Marts
Property             Data Warehouse     Data Mart
Scope                Enterprise         Department
Subjects             Multiple           Single subject, LOB
Data sources         Many               Few
Implementation time  Months to years    Months
Dependent Data Mart
[Diagram: operational systems, legacy data, flat files, operations data, and external data feed the data warehouse; dependent marketing, sales, finance, and HR data marts draw from the warehouse, optionally supplemented with external data.]
Independent Data Mart
[Diagram: operational systems, legacy data, flat files, operations data, and external data feed a sales or marketing data mart directly, with no intervening data warehouse.]
Features of a Data Mart
Not Real-Time Data
Consolidation and Cleansing
Warehouse Development Approaches
“Big bang” approach
Incremental approach:
• Top-down incremental approach
• Bottom-up incremental approach
“Big Bang” Approach
Analyze enterprise requirements → build the enterprise data warehouse → report in subsets or store in data marts
Top-Down Approach
Analyze requirements at the enterprise level
Develop a conceptual information model
Identify and prioritize subject areas
Complete a model of the selected subject area
Map to available data
Perform a source system analysis
Implement the base technical architecture
Establish metadata, extraction, and load processes for the initial subject area
Create and populate the initial subject area data mart within the overall warehouse framework
Bottom-Up Approach
Define the scope and coverage of the data warehouse, and analyze the source systems within this scope
Define the initial increment based on political pressure, assumed business benefit, and data volume
Implement the base technical architecture, and establish metadata, extraction, and load processes as required by the increment
Create and populate the initial subject areas within the overall warehouse framework
Incremental Approach
to Warehouse Development
Multiple iterations
Shorter implementations
Validation of each phase
[Diagram: each increment cycles through Strategy, Definition, Analysis, Design, Iterative Build, and Production.]
Data Warehousing
Process Components
Methodology
Architecture
Extraction, Transformation, and Load (ETL)
Implementation
Operation and Support
Methodology
Ensures a successful data warehouse
Encourages incremental development
Provides a staged approach to an enterprisewide warehouse:
• Safe
• Manageable
• Proven
• Recommended
Architecture
“Provides the planning, structure, and standardization needed to ensure integration of multiple components, projects, and processes across time.”
“Establishes the framework, standards, and procedures for the data warehouse at an enterprise level.”
— The Data Warehousing Institute
Extraction, Transformation,
and Load (ETL)
“Effective data extract, transform and load (ETL) processes represent the number one success factor for your data warehouse project and can absorb up to 70 percent of the time spent on a typical data warehousing project.”
— DM Review, March 2001
[Diagram: data flows from the source through a staging area to the target.]
Implementation
Implementation of the data warehouse architecture proceeds incrementally: Increment 1, Increment 2, …, Increment n.
Operation and Support
Data access and reporting
Refreshing warehouse data
Monitoring
Responding to change
Phases of the
Incremental Approach
Each increment moves through the same phases: Strategy, Definition, Analysis, Design, Build, and Production.
Strategy Phase Deliverables
Business goals and objectives
Data warehouse purpose, objectives, and scope
Enterprise data warehouse logical model
Incremental milestones
Source systems data flows
Subject area gap analysis
Strategy Phase Deliverables
Data acquisition strategy
Data quality strategy
Metadata strategy
Data access environment
Training strategy
Summary
In this lesson, you should have learned how to:
Identify a common, broadly accepted definition of a data warehouse
Describe the differences between dependent and independent data marts
Identify some of the main warehouse development approaches
Recognize some of the operational properties and common terminology of a data warehouse
Practice 2-1 Overview
This practice covers the following topics:
Answering questions regarding data warehousing concepts and terminology
Discussing some of the data warehouse concepts and terminology
Lesson 3
Modeling the Data Warehouse
Objectives
After completing this lesson, you should be able to do the following:
Discuss data warehouse environment data structures
Discuss data warehouse database design phases:
• Defining the business model
• Defining the dimensional model
• Defining the physical model
Data Warehouse Modeling Issues
Among the main issues that data warehouse data modelers face are:
Different data types
Many ways to use warehouse data
Many ways to structure the data
Multiple modeling techniques
Planned replication
Large volumes of data
Data Warehouse Environment
Data Structures
The data modeling structures that are commonly found in a data warehouse environment are:
Third normal form (3NF)
Star schema
Snowflake schema
Star Schema Model
[Diagram: a central Sales fact table (Product_id, Store_id, Item_id, Day_id, Sales_amount, Sales_units, …) joined to denormalized dimension tables: Product (Product_id, Product_desc, …), Store (Store_id, District_id, …), Time (Day_id, Month_id, Year_id, …), and Item (Item_id, Item_desc, …).]
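The star layout above can be sketched in a few lines of SQL. A minimal sketch using SQLite rather than the course's Oracle environment; table and column names follow the slide's example, and the sample rows are invented for illustration.

```python
# Build a tiny star schema: one fact table with foreign keys into
# denormalized dimension tables, then run a typical star query that
# constrains on dimensions and aggregates the fact.
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE product_dim (product_id INTEGER PRIMARY KEY, product_desc TEXT);
CREATE TABLE store_dim   (store_id   INTEGER PRIMARY KEY, district_id INTEGER);
CREATE TABLE time_dim    (day_id     INTEGER PRIMARY KEY, month_id INTEGER, year_id INTEGER);
CREATE TABLE sales_fact (
    product_id   INTEGER REFERENCES product_dim(product_id),
    store_id     INTEGER REFERENCES store_dim(store_id),
    day_id       INTEGER REFERENCES time_dim(day_id),
    sales_amount REAL,
    sales_units  INTEGER
);
""")
cur.execute("INSERT INTO product_dim VALUES (1, 'Ham Pizza')")
cur.execute("INSERT INTO store_dim VALUES (10, 100)")
cur.execute("INSERT INTO time_dim VALUES (20020102, 200201, 2002)")
cur.execute("INSERT INTO sales_fact VALUES (1, 10, 20020102, 10.0, 1)")
cur.execute("INSERT INTO sales_fact VALUES (1, 10, 20020102, 20.0, 2)")

# Star query: join the fact to two dimensions and aggregate the measure.
cur.execute("""
SELECT p.product_desc, t.year_id, SUM(f.sales_amount)
FROM sales_fact f
JOIN product_dim p ON f.product_id = p.product_id
JOIN time_dim    t ON f.day_id     = t.day_id
GROUP BY p.product_desc, t.year_id
""")
result = cur.fetchall()
```

Because every dimension joins directly to the fact table, each query needs at most one join per dimension — the property the slides credit for star-schema query performance.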
Snowflake Schema Model
[Diagram: the Sales fact table (Item_id, Store_id, Product_id, Week_id, Sales_amount, Sales_units) joins to dimensions that are themselves normalized: Store (Store_id, Store_desc, District_id) → District (District_id, District_desc); Time (Week_id, Period_id, Year_id); Product (Product_id, Product_desc); Item (Item_id, Item_desc, Dept_id) → Dept (Dept_id, Dept_desc, Mgr_id) → Mgr (Mgr_id, Dept_id, Mgr_name).]
Snowflake Schema Model
Direct use by some tools
More flexible to change
Provides for speedier data loading
Can become large and unmanageable
Degrades query performance
More complex metadata
[Diagram: a geography hierarchy normalized as Country → State → County → City.]
Data Warehouse Database
Design Phases
Phase 1: Defining the business model
Phase 2: Defining the dimensional model
Phase 3: Defining the physical model
Phase 1: Defining the Business Model
Performing strategic analysis
Creating the business model
Documenting metadata
Performing Strategic Analysis
Identify crucial business processes
Understand business processes
Prioritize and select the business processes to implement
[Diagram: candidate processes plotted on a matrix of business benefit (low to high) against feasibility (low to high).]
Creating the Business Model
Defining business requirements:
• Identifying the business measures
• Identifying the dimensions
• Identifying the grain
• Identifying the business definitions and rules
Verifying data sources
Business Requirements Drive
the Design Process
[Diagram: business requirements are the primary input to the design process; existing metadata, the production ERD model, and research are secondary inputs.]
Identifying Measures
and Dimensions
Measures: attributes that vary continuously, such as balance, units sold, cost, and sales.
Dimensions: attributes perceived as constant or discrete, such as product, location, time, and size.
Using a Business Process Matrix
[Sample business process matrix: business dimensions (customer, date, product, channel, promotion) mapped against business processes (sales, returns, inventory).]
Determining Granularity
YEAR? QUARTER? MONTH? WEEK? DAY?
Identifying Business Rules
Examples:
Location: geographic proximity bands (0-1 miles, 1-5 miles, > 5 miles)
Time: Month > Quarter > Year
Store: Store > District > Region
Product: classified by type (monitor — 15 inch, 17 inch, 19 inch; PC; server) and status (none, new, rebuilt, custom)
Documenting Metadata
Documenting metadata should include:
Documenting the design process
Documenting the development process
Providing a record of changes
Recording enhancements over time
Metadata Documentation Approaches
Automated
• Data modeling tools
• ETL tools
• End-user tools
Manual
Phase 2: Defining the Dimensional Model
Identify fact tables:
• Translate business measures into fact tables
• Analyze source system information for additional measures
Identify dimension tables
Link fact tables to the dimension tables
Model the time dimension
Star Dimensional Modeling
[Diagram: the Sales fact table (Product_id, Store_id, Item_id, Day_id, Sales_amount, Sales_units, …) at the center, joined to the Product (Product_id, Product_desc, …), Time (Day_id, Month_id, Period_id, Year_id), Store (Store_id, District_id, …), and Item (Item_id, Item_desc, …) dimension tables.]
Fact Table Characteristics
Contain numerical metrics of the business
Can hold large volumes of data
Can grow quickly
Can contain base, derived, and summarized data
Are typically additive
Are joined to dimension tables through foreign keys that reference primary keys in the dimension tables
[Diagram: the Sales fact table with Product_id, Store_id, Item_id, Day_id, Sales_amount, Sales_units, …]
Dimension Table Characteristics
Dimension tables have the following characteristics:
Contain textual information that represents the attributes of the business
Contain relatively static data
Are joined to a fact table through a foreign key reference
Star Dimensional
Model Characteristics
The model is easy for users to understand.
Primary keys represent a dimension.
Nonforeign key columns are values.
Facts are usually highly normalized.
Dimensions are completely denormalized.
Fast response to queries is provided.
Performance is improved by reducing table joins.
End users can express complex queries.
Support is provided by many front-end tools.
Using Time in the Data Warehouse
Defining standards for time is critical.
Aggregation based on time is complex.
The Time Dimension
Time is critical to the data warehouse. A consistent representation of time is required for extensibility.
Where should the element of time be stored? [Diagram: the Sales fact table linked to a Time dimension.]
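One common answer is to precompute a time dimension with one row per day, keyed by a surrogate day_id, rather than storing raw dates in the fact table. A minimal sketch; the column names (day_id, month_id, quarter, year_id) are illustrative choices, not a prescribed layout.

```python
# Generate a day-grain time dimension covering a date range.
# Surrogate day_id keys are derived from the calendar date (YYYYMMDD),
# so every fact row can join to a consistent representation of time.
from datetime import date, timedelta

def build_time_dimension(start, end):
    rows, day = [], start
    while day <= end:
        rows.append({
            "day_id": int(day.strftime("%Y%m%d")),      # surrogate key
            "month_id": day.year * 100 + day.month,      # e.g. 200201
            "quarter": (day.month - 1) // 3 + 1,
            "year_id": day.year,
        })
        day += timedelta(days=1)
    return rows

dim = build_time_dimension(date(2002, 1, 1), date(2002, 1, 31))
```

Precomputing calendar attributes once keeps time-based aggregation (by month, quarter, year) a simple GROUP BY on dimension columns instead of date arithmetic in every query.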
Using Data Modeling Tools
Tools with a GUI enable definition, modeling, and reporting.
Avoid a mix of modeling techniques caused by:
• Development pressures
• Developers who lack knowledge
• No strategy
Determine a strategy.
Write and publish it formally.
Make it available electronically.
Phase 3: Defining the
Physical Model
Translate the dimensional design to a physical model for implementation.
Define the storage strategy for tables and indexes.
Perform database sizing.
Define the initial indexing strategy.
Define the partitioning strategy.
Update the metadata document with physical information.
Physical Model Design Tasks
Define naming and database standards.
Perform database sizing.
Develop initial indexing strategy.
Develop data partition strategy.
Define storage parameters.
Set initialization parameters.
Use parallel processing.
Define summary data.
Determine hardware architecture.
Database Object Naming Conventions
Develop a reasonable list of abbreviations.
List all the objects' names, and work with the user community to define them.
Resolve name disputes.
Document your naming standards in the metadata document.
Plan for the naming standards to be a living document.
Architectural Requirements
Scalability
Manageability
Availability
Extensibility
Integration
Flexibility
[Diagram: requirements balance user, budget, business, and technology drivers.]
Strategy for Architecture Definition
Obtain existing architecture plans.
Obtain existing capacity plans.
Document existing interfaces.
Prepare capacity plan.
Prepare technical architecture.
Document operating system requirements.
Develop recovery plans.
Develop security and control plans.
Create architecture.
Create technical risk assessment.
Hardware Requirements
SMP
Cluster
MPP
Hybrids (employing both SMP and MPP)
Making the Right Choice
Requirements differ from operational systems.
Benchmark
• Available from vendors
• Develop your own
• Use realistic queries
Scalability is important.
Storage and Performance Considerations
Database sizing
• Test Load Sampling
Data partitioning
• Horizontal partitioning
• Vertical partitioning
Indexing
• B-Tree indexes
• Bitmap indexes
• Bitmap-join indexes
Star query optimization
• Star transformation
Database Sizing
Sizing influences capacity planning and
systems environment management.
Sizing is required for:
• The database
• Other storage areas
Sizing is not an exact science.
Techniques vary.
Test Load Sampling
Analyze a representative sample of the data, chosen using proven statistical methods. Ensure that the sample reflects:
Test loads for different periods
Day-to-day operations
Seasonal data and worst-case scenarios
Indexes and summaries
Data Partitioning
Breaking up of data into separate physical units that can be handled independently
Data partitioning provides ease of:
• Restructuring
• Reorganization
• Removal
• Recovery
• Monitoring
• Management
• Archiving
• Indexing
Horizontal Partitioning
Table and index data are split by:
• Time
• Sales region or person
• Geography
• Organization
• Line of business
Candidate columns appear in a WHERE clause.
Analysis determines requirements.
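Splitting by time can be sketched as a routing rule: each row lands in the partition whose range covers its month, in the spirit of a database's range partitioning (Oracle's VALUES LESS THAN bounds, for example). The quarterly boundaries and partition names below are invented for illustration.

```python
# Route rows to horizontal partitions by a month_id column (YYYYMM).
# Each bound is an exclusive upper limit, so bisect finds the first
# partition whose range still covers the value.
import bisect

BOUNDS = [200204, 200207, 200210, 200301]   # exclusive upper bounds
NAMES  = ["q1_2002", "q2_2002", "q3_2002", "q4_2002"]

def partition_for(month_id):
    i = bisect.bisect_right(BOUNDS, month_id)
    if i == len(BOUNDS):
        raise ValueError(f"no partition covers month {month_id}")
    return NAMES[i]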
Vertical Partitioning
You can use vertical partitioning when:
• It improves the speed of query and update actions
• Users require access to specific columns
• Some data is changed infrequently
• Descriptive dimension text may be better moved away from the dimension itself
Partitioning Methods
Range partitioning
List partitioning
Hash partitioning
Composite partitioning
• Composite range-hash partitioning
• Composite range-list partitioning
Index partitioning
Indexing
Indexing is used for the following reasons:
It is a huge cost saving, greatly improving performance and scalability.
It can replace a full table scan with a quick read of the index followed by a read of only those disk blocks that contain the rows needed.
B-Tree Index
Most common type of indexing
Used for high cardinality columns
Designed for few rows returned
Bitmap Indexes
Provide performance benefits and storage savings
Store values as 1s and 0s
Use instead of B-tree indexes when:
• Tables are large
• Columns have relatively low cardinality
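The idea behind a bitmap index can be shown in a few lines: for each distinct value of a low-cardinality column, keep one bit per row, so a multi-predicate WHERE clause becomes a bitwise AND/OR. A toy sketch only, with invented sample data; a real database stores these bitmaps compressed.

```python
# Toy bitmap index: Python ints serve as arbitrary-length bit vectors,
# one bitmap per distinct column value, one bit per row.
from collections import defaultdict

def build_bitmap_index(column_values):
    index = defaultdict(int)
    for row_no, value in enumerate(column_values):
        index[value] |= 1 << row_no   # set this row's bit for its value
    return index

regions = ["east", "west", "east", "north", "west"]
status  = ["new", "new", "old", "new", "old"]
by_region = build_bitmap_index(regions)
by_status = build_bitmap_index(status)

# WHERE region = 'east' AND status = 'new'  ->  AND the two bitmaps.
hits = by_region["east"] & by_status["new"]
matching_rows = [i for i in range(len(regions)) if hits >> i & 1]
```

This is why the slide restricts bitmaps to low-cardinality columns: the index needs one bitmap per distinct value, so a high-cardinality column (where a B-tree shines) would explode the number of bitmaps.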
Bitmap Join Indexes
A bitmap index for the join of two or more tables:
They are new to Oracle9i.
They provide better performance and storage savings.
Parallelism
[Diagram: a join of the Sales and Customers tables split across parallel execution servers P1, P2, and P3.]
Using Summary Data
Designing summary tables offers the following
benefits:
Provides fast access to precomputed data
Reduces use of I/O, CPU, and memory
Summary
In this lesson, you should have learned how to:
Describe data warehouse environment data structures
Define the business model:
• Performing strategic analysis
• Creating the business model
• Identifying business rules
Define the dimensional model:
• Star dimensional model characteristics
Define the physical model:
• Physical model design tasks
• Architectural and hardware requirements
• Storage and performance considerations
Practice 3-1 Overview
This practice covers the following topics:
Specifying true or false to a series of statements
Completing a series of sentences accurately
Practicing identifying a simple business model
Identifying indexing methods
Lesson 4
Building the Data Warehouse:
Extracting Data
Objectives
After completing this lesson, you should be able to do the following:
Outline the ETL (extraction, transformation, and loading) processes for building a data warehouse
Identify ETL tasks, importance, and cost
Explain how to examine data sources
Identify extraction techniques and methods
Identify analysis issues and design options for extraction processes
List the selection criteria for ETL tools
Extraction, Transformation, Loading
(ETL) Processes
Extract source data
Transform/clean data
Index and summarize
Load data into the warehouse
Detect changes
Refresh data
[Diagram: programs, gateways, and tools move data from operational systems through ETL into the data warehouse.]
ETL: Tasks, Importance, and Cost
[Diagram: ETL extracts, cleans up, consolidates, restructures, loads, maintains, and refreshes data from operational systems into a data warehouse that is relevant, useful, quality, accurate, and accessible.]
Extracting Data
Source systems
• Data from various data sources in various formats
Extraction routines
• Developed to select data fields from sources
• Consist of business rules, audit trails, and error-correction facilities
Data mapping
[Diagram: data is extracted from operational databases into a data staging area, transformed, and loaded into the warehouse database.]
Examining Data Sources
Production
Archive
Internal
External
Production Data
Operating system platforms
File systems
Database systems and vertical applications: IMS, DB2, Oracle, Sybase, Informix, VSAM, SAP, Shared Medical Systems, Dun and Bradstreet Financials, Hogan Financials, Oracle Financials
Archive Data
Historical data
Useful for analysis over long periods of time
Useful for the first-time load
May require unique transformations
[Diagram: archived operational data is loaded into the warehouse database.]
Internal Data
Planning, sales, and marketing organization data
Maintained in the form of:
• Spreadsheets (structured)
• Documents (unstructured)
Treated like any other source data
[Diagram: planning, marketing, and accounting data feed the warehouse database.]
External Data
Information from outside the organization
Issues of frequency, format, and predictability
Described and tracked using metadata
[Diagram: purchased databases (A.C. Nielsen, IRI, IMS, Walsh America, Dun and Bradstreet), competitive information, economic forecasts, and publications such as Barron's and the Wall Street Journal feed the warehousing databases.]
Mapping Data
Mapping data defines:
Which operational attributes to use
How to transform the attributes for the warehouse
Where the attributes exist in the warehouse
[Example: source File A fields F1, F2, F3 (123, Bloggs, 10/12/56) map to Staging File One fields Number (USA123), Name (Mr. Bloggs), and DOB (10-Dec-56); the field-to-field mapping itself is recorded as metadata.]
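The File A example can be expressed as a mapping table that drives the transform, which is also a natural shape for recording the mapping as metadata. A sketch under assumed rules: the country prefix, salutation, and DD/MM/YY source format are read off the slide's sample values, not a documented specification.

```python
# A declarative field mapping: target field -> (source field, transform).
# Running the mapping produces the staging record from the source record.
from datetime import datetime

MAPPING = {
    "Number": ("F1", lambda v: "USA" + v),                  # add country prefix
    "Name":   ("F2", lambda v: "Mr. " + v),                 # add salutation
    "DOB":    ("F3", lambda v: datetime.strptime(v, "%d/%m/%y")
                                       .strftime("%d-%b-%y")),  # reformat date
}

def apply_mapping(source_row):
    return {target: fn(source_row[src]) for target, (src, fn) in MAPPING.items()}

staged = apply_mapping({"F1": "123", "F2": "Bloggs", "F3": "10/12/56"})
```

Keeping the mapping in one data structure (rather than scattered through code) means the same definition can be published to the metadata repository and reused by audit and validation steps.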
Extraction Techniques
Programs: C, C++, COBOL, PL/SQL, Java
Gateways: transparent database access
Tools:
• In-house developed tools
• Vendor’s data extraction tools
Extraction Methods
Logical Extraction methods:
• Full Extraction
• Incremental Extraction
Physical Extraction methods:
• Online Extraction
• Offline Extraction
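The difference between full and incremental extraction can be sketched with a high-water mark: incremental extraction pulls only rows modified since the previous run, then advances the mark. A sketch only — the modified_at column and ISO timestamp strings are assumptions, and a real source would need such a change-tracking column (or a log) to support this at all.

```python
# Incremental (delta) extraction using a persisted high-water mark.
def extract_incremental(rows, last_extracted_at):
    """Return rows changed after the previous run's high-water mark,
    plus the new mark to persist for the next run."""
    delta = [r for r in rows if r["modified_at"] > last_extracted_at]
    new_mark = max((r["modified_at"] for r in delta), default=last_extracted_at)
    return delta, new_mark

source = [
    {"id": 1, "modified_at": "2002-01-01T10:00"},
    {"id": 2, "modified_at": "2002-01-03T09:30"},
    {"id": 3, "modified_at": "2002-01-05T17:45"},
]
delta, mark = extract_incremental(source, "2002-01-02T00:00")
```

A full extraction is the degenerate case: pass the minimum possible mark and every row qualifies. That trade-off — simplicity of full extraction versus the smaller data volume of deltas — is the design decision the slide's two logical methods name.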
Designing Extraction Processes
Analysis:
• Sources, technologies
• Data types, quality, owners
Design options:
• Manual, custom, gateway, third-party
• Replication, full, or delta refresh
Design issues:
• Volume and consistency of data
• Automation, skills needed, resources
Maintaining Extraction Metadata
Source location, type, structure
Access method
Privilege information
Temporary storage
Failure procedures
Validity checks
Handlers for missing data
Extraction Tools
Selection Criteria
Base functionality
Interface features
Metadata repository
Open API
Metadata access
Repository utilities
Input and output processing
Cleansing, reformatting, and auditing
References
Training requirements
Possible ETL Failures
A missing source file
A system failure
Inadequate metadata
Poor mapping information
Inadequate storage planning
A source structural change
No contingency plan
Inadequate data validation
Maintaining ETL Quality
ETL must be:
• Tested
• Documented
• Monitored and reviewed
Disparate metadata must be coordinated.
Summary
In this lesson, you should have learned how to:
Outline the ETL (extraction, transformation, and loading) processes for building a data warehouse
Identify ETL tasks, importance, and cost
Explain how to examine data sources
Identify extraction techniques and methods
Identify analysis issues and design options for extraction processes
List the selection criteria for ETL tools
Identify Oracle's solution for the ETL process
Practice 4-1 Overview
This practice covers the following topics:
Answering a series of short questions
Answering questions based on the business scenario for Frontier Airways
Lesson 5
Building the Data Warehouse:
Transforming Data
Objectives
After completing this lesson, you should be able to do the following:
Define transformation
Identify possible staging models
Identify data anomalies and eliminate them
Explain the importance of quality data
Describe techniques for transforming data
Design the transformation process
Transformation
Transformation eliminates anomalies from operational data:
Cleans and standardizes
Presents subject-oriented data
[Diagram: data is extracted from operational systems into the data staging area, where it is cleaned up, consolidated, and restructured, then loaded into the warehouse.]
Possible Staging Models
Remote staging model
Onsite staging model
Remote Staging Model
Data staging area within the warehouse environment:
[Diagram: extract from the operational system, transform in a staging area inside the warehouse environment, then load into the warehouse.]
Data staging area in its own environment:
[Diagram: extract from the operational system into a separate staging environment, transform there, then load into the warehouse.]
On-site Staging Model
Data staging area within the operational environment, possibly affecting the operational system
[Diagram: extract and transform in a staging area inside the operational environment, then load into the warehouse.]
Data Anomalies
No unique key
Data naming and coding anomalies
Data meaning anomalies between groups
Spelling and text inconsistencies
[Example rows:
CUSNUM    NAME                ADDRESS
90233479  Oracle Limited      100 N.E. 1st St.
90233489  Oracle Computing    15 Main Road, Ft. Lauderdale
90234889  Oracle Corp. UK     15 Main Road, Ft. Lauderdale, FLA
90345672  Oracle Corp UK Ltd  181 North Street, Key West, FLA]
Transformation Routines
Cleaning data
Eliminating inconsistencies
Adding elements
Merging data
Integrating data
Transforming data before load
Transforming Data:
Problems and Solutions
Multipart keys
Multiple local standards
Multiple files
Missing values
Duplicate values
Element names
Element meanings
Input formats
Referential Integrity constraints
Name and address
Multipart Keys Problem
Multipart keys: for example, Product code = 12 M 654313 45, composed of a country code (12), a sales territory (M), a product number (654313), and a salesperson code (45).
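Splitting such a key into its components is a small fixed-width parse. A sketch under assumptions: the segment widths and field names below are inferred from the slide's single example, not from a documented key layout.

```python
# Split a multipart product code into its component fields during
# transformation, so each part can land in its own warehouse column.
KEY_LAYOUT = [                 # (field, width), in order of appearance
    ("country_code", 2),
    ("sales_territory", 1),
    ("product_number", 6),
    ("salesperson_code", 2),
]

def split_multipart_key(code):
    parts, pos = {}, 0
    for field, width in KEY_LAYOUT:
        parts[field] = code[pos:pos + width]
        pos += width
    return parts

parts = split_multipart_key("12M65431345")
```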
Multiple Local Standards Problem
Multiple local standards; use tools or filters to preprocess.
[Example: sources variously record lengths in cm or inches, dates as DD/MM/YY, MM/DD/YY, or DD-Mon-YY, and money as 1,000 GBP, FF 9,990, or USD 600.]
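Such a preprocessing filter amounts to normalizing every local format to one warehouse standard before load. A sketch: the target standards chosen here (ISO dates, centimetres) are illustrative, and note that a value like 10/12/56 is ambiguous between DD/MM/YY and MM/DD/YY, so the source system's format must be part of the extraction metadata rather than guessed.

```python
# Normalize locally formatted dates and units to one warehouse standard.
from datetime import datetime

def normalize_date(text, source_format):
    # source_format comes from metadata about the source system; we never
    # guess, because "10/12/56" parses under both DD/MM/YY and MM/DD/YY.
    return datetime.strptime(text, source_format).strftime("%Y-%m-%d")

def normalize_length_cm(value, unit):
    factors = {"cm": 1.0, "inches": 2.54}
    return value * factors[unit]

iso = normalize_date("10/12/56", "%d/%m/%y")   # a DD/MM/YY source
cm = normalize_length_cm(10, "inches")
```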
Multiple Files Problem
Added complexity of multiple source files; start simple.
[Diagram: logic detects the correct source among multiple source files before producing transformed data.]
Missing Values Problem
Solutions:
Ignore
Wait
Mark rows (for example: if NULL then field = 'A')
Extract when time-stamped
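The mark-rows option can be sketched as substituting a sentinel for missing values, so downstream queries can find and revisit the affected rows. The 'A' sentinel follows the slide's rule; the record shape is invented for illustration.

```python
# Mark rows with missing values using a sentinel, per "if NULL then
# field = 'A'": the rows still load, but remain findable for correction.
SENTINEL = "A"

def mark_missing(rows, field):
    marked = []
    for row in rows:
        fixed = dict(row)              # leave the source row untouched
        if fixed.get(field) is None:
            fixed[field] = SENTINEL
        marked.append(fixed)
    return marked

rows = mark_missing([{"cust": "Smith", "region": None},
                     {"cust": "Jones", "region": "East"}], "region")
```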
Duplicate Values Problem
Example: "ACME Inc" appears three times in the source.
Solutions:
SQL self-join techniques
RDBMS constraint utilities
SQL> SELECT ...
  2  FROM table_a, table_b
  3  WHERE table_a.key (+)= table_b.key
  4  UNION
  5  SELECT ...
  6  FROM table_a, table_b
  7  WHERE table_a.key = table_b.key (+);
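The same de-duplication can also be done in a staging script before load. A minimal sketch that keeps the first occurrence of each key value:

```python
def deduplicate(rows, key):
    """Keep the first occurrence of each key value
    (e.g. collapse three 'ACME Inc' rows into one)."""
    seen = set()
    unique = []
    for row in rows:
        if row[key] not in seen:
            seen.add(row[key])
            unique.append(row)
    return unique

names = [{"name": "ACME Inc"}, {"name": "ACME Inc"}, {"name": "ACME Inc"}]
```

In practice this is only safe after names have been standardized; "ACME Inc" and "ACME Inc." would otherwise survive as two rows.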
Element Names Problem
Solution: common naming conventions
Example: the same element appears as Customer, Client, Customer Contact, or Name in different sources
Element Meaning Problem
Avoid misinterpretation
Complex solution
Document meaning in metadata
Example: a field named Customer_detail may hold the customer's name, all customer details, or all details except the name, depending on the source
Input Format Problem
Sources deliver the same data in different input formats:
EBCDIC versus ASCII encodings
"123-73" versus 12373
Free text and non-Latin character sets, for example "ACME Co." or "Beer (Pack of 8)"
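Python's standard codecs can translate between EBCDIC and Unicode/ASCII-compatible encodings. The cp500 code page below is an illustrative choice, since a real mainframe extract would name its own code page:

```python
# Round-trip a field through an EBCDIC code page (cp500 is illustrative;
# real mainframe extracts specify their own code page).
ebcdic_bytes = "123-73".encode("cp500")
decoded = ebcdic_bytes.decode("cp500")

# A numeric field may also need its punctuation stripped after decoding,
# matching the slide's "123-73" vs. 12373 example.
as_number = int(decoded.replace("-", ""))
```

Encoding conversion and format normalization are separate steps: the first fixes the byte representation, the second fixes the field layout.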
Referential Integrity Problem
Solutions:
SQL anti-join
Server constraints
Dedicated tools
Department table: 10, 20, 30, 40
Emp   Name    Department
1099  Smith   10
1289  Jones   20
1234  Doe     50  (no such department)
6786  Harris  60  (no such department)
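The anti-join that finds the violating rows can be sketched in a staging script as follows — table and field names are illustrative:

```python
def orphan_rows(emps, departments):
    """Return employee rows whose department has no parent row
    (the anti-join from the slide, done in a staging script)."""
    valid = set(departments)
    return [e for e in emps if e["dept"] not in valid]

emps = [
    {"emp": 1099, "name": "Smith", "dept": 10},
    {"emp": 1289, "name": "Jones", "dept": 20},
    {"emp": 1234, "name": "Doe", "dept": 50},
    {"emp": 6786, "name": "Harris", "dept": 60},
]
bad = orphan_rows(emps, [10, 20, 30, 40])
```

Rows returned by the anti-join can then be routed to an exception file for repair rather than silently loaded.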
Name and Address Problem
Single-field format:
Mr. J. Smith, 100 Main St., Bigtown, County Luth, 23565
Multiple-field format:
Name:    Mr. J. Smith
Street:  100 Main St.
Town:    Bigtown
County:  County Luth
Code:    23565
The same people may also be recorded differently across sources:
Database 1:
NAME              LOCATION
DIANNE ZIEFELD    N100
HARRY H. ENFIELD  M300
Database 2:
NAME              LOCATION
ZIEFELD, DIANNE   100
ENFIELD, HARRY H  300
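Reconciling the two name formats is a typical cleansing step. The sketch below normalizes both "LAST, FIRST" and "FIRST LAST" to one form; real name-and-address matching usually needs dedicated tools, as the slide's solution list suggests:

```python
def normalize_name(raw: str) -> str:
    """Normalize 'ZIEFELD, DIANNE' and 'DIANNE ZIEFELD' to one form
    (a simple sketch; production matching needs dedicated tools)."""
    if "," in raw:
        last, first = [p.strip() for p in raw.split(",", 1)]
    else:
        first, _, last = raw.rpartition(" ")
        first, last = first.strip(), last.strip()
    return f"{first.title()} {last.title()}"

a = normalize_name("ZIEFELD, DIANNE")
b = normalize_name("DIANNE ZIEFELD")
```

Once both sources produce the same canonical form, the duplicate-matching techniques above can link the records.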
Quality Data: Importance and Benefits
Quality data is key to a successful warehouse implementation.
Quality data helps you in:
Targeting the right customers
Determining buying patterns
Identifying householders: private and commercial
Matching customers
Identifying historical data
Data Quality Guidelines
Operational data:
Should not be used directly in the warehouse
Must be cleaned for each increment
Is not simply fixed by modifying applications
Transformation Techniques
Merging data
Adding a Date Stamp
Adding Keys to Data
Merging Data
Operational transactions do not usually map one-to-one with warehouse data.
Data for the warehouse is merged to provide information for analysis.
Pizza sales/returns by day, hour, and second:
Sale    1/2/02  12:00:01  Ham Pizza      $10.00
Sale    1/2/02  12:00:02  Cheese Pizza   $15.00
Sale    1/2/02  12:00:02  Anchovy Pizza  $12.00
Return  1/2/02  12:00:03  Anchovy Pizza  -$12.00
Sale    1/2/02  12:00:04  Sausage Pizza  $11.00
Merging Data
Pizza sales/returns by day, hour, and second:
Sale    1/2/02  12:00:01  Ham Pizza      $10.00
Sale    1/2/02  12:00:02  Cheese Pizza   $15.00
Sale    1/2/02  12:00:02  Anchovy Pizza  $12.00
Return  1/2/02  12:00:03  Anchovy Pizza  -$12.00
Sale    1/2/02  12:00:04  Sausage Pizza  $11.00
After merging, the sale and return that cancel each other are removed:
Sale    1/2/02  12:00:01  Ham Pizza      $10.00
Sale    1/2/02  12:00:02  Cheese Pizza   $15.00
Sale    1/2/02  12:00:04  Sausage Pizza  $11.00
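The merge above — cancelling a return against its matching sale — can be sketched as follows (the field names are illustrative):

```python
def net_sales(transactions):
    """Drop sale/return pairs that cancel out
    (same product, opposite amounts), as in the pizza example."""
    result = list(transactions)
    for t in transactions:
        if t["type"] == "Return":
            match = next((s for s in result
                          if s["type"] == "Sale"
                          and s["product"] == t["product"]
                          and s["amount"] == -t["amount"]), None)
            if match is not None:
                result.remove(match)
                result.remove(t)
    return result

txns = [
    {"type": "Sale", "product": "Ham Pizza", "amount": 10.00},
    {"type": "Sale", "product": "Cheese Pizza", "amount": 15.00},
    {"type": "Sale", "product": "Anchovy Pizza", "amount": 12.00},
    {"type": "Return", "product": "Anchovy Pizza", "amount": -12.00},
    {"type": "Sale", "product": "Sausage Pizza", "amount": 11.00},
]
merged = net_sales(txns)
```

A real merge would also match on customer or order keys; matching only on product and amount is a simplification for the example.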
Adding a Date Stamp
The time element can be represented as a:
• Single point in time
• Time span
Add the time element to:
• Fact tables
• Dimension data
Adding a Date Stamp: Fact Tables and Dimensions
Sales Fact Table: Item_id, Store_id, Time_key, Sales_dollars, Sales_units
Time Table: Time_key, Week_id, Period_id, Year_id
Product Table: Product_id, Time_key, Product_desc
Store Table: Store_id, District_id, Time_key
Item Table: Item_id, Dept_id, Time_key
Adding Keys to Data
Source rows with operational keys:
#1  Sale    1/2/98  12:00:01  Ham Pizza      $10.00
#2  Sale    1/2/98  12:00:02  Cheese Pizza   $15.00
#3  Sale    1/2/98  12:00:02  Anchovy Pizza  $12.00
#4  Return  1/2/98  12:00:03  Anchovy Pizza  -$12.00
#5  Sale    1/2/98  12:00:04  Sausage Pizza  $11.00
Warehouse rows keyed by data values or artificial keys:
#dw1  Sale  1/2/98  12:00:01  Ham Pizza      $10.00
#dw2  Sale  1/2/98  12:00:02  Cheese Pizza   $15.00
#dw3  Sale  1/2/98  12:00:04  Sausage Pizza  $11.00
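Assigning the artificial warehouse keys shown above can be sketched as follows — a minimal version, assuming dictionary rows and the #dw prefix from the example:

```python
import itertools

def assign_warehouse_keys(rows, prefix="dw"):
    """Replace operational keys with sequential artificial warehouse
    keys (#dw1, #dw2, ...), as in the slide's example."""
    counter = itertools.count(1)
    for row in rows:
        row["key"] = f"#{prefix}{next(counter)}"
    return rows

rows = assign_warehouse_keys([{"product": "Ham Pizza"},
                              {"product": "Cheese Pizza"}])
```

A production version would draw the numbers from a database sequence so keys stay unique across load runs.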
Summarizing Data
Summarize either:
1. During extraction, in the staging area
2. After loading, on the warehouse server
[Operational databases → Staging area → Warehouse database]
Maintaining Transformation Metadata
Transformation metadata contains:
Transformation rules
Algorithms and routines
[Sources → Extract → Stage (Rules) → Transform → Load → Publish → Query]
Maintaining Transformation Metadata
Restructure keys
Identify and resolve coding differences
Validate data from multiple sources
Handle exception rules
Identify and resolve format differences
Fix referential integrity inconsistencies
Identify summary data
Data Ownership and Responsibilities
Data ownership and responsibilities should be
shared by the:
• Operational team
• Data warehouse team
Business benefits are gained with a "work together" approach
Transformation Timing and Location
Transformation is performed:
• Before load
• In parallel
Can be initiated at different points:
• On the operational platform
• In a separate staging area
Choosing a Transformation Point
Workload
Impact on environment:
CPU usage
Disk space
Network bandwidth
Parallel execution
Load window time
User information needs
Monitoring and Tracking
Transformations should:
Be self-documenting
Provide summary statistics
Handle process exceptions
Designing Transformation Processes
Analysis:
• Sources and target mappings, business rules
• Key users, metadata, grain
Design options:
• Third-party tools
• Custom 3GL programs (FORTRAN, COBOL, C, C++, Java)
• 4GLs such as SQL or PL/SQL
• Replication
Design issues:
• Performance
• Size of the staging area
• Exception handling, integrity maintenance
Transformation Tools
Third-party tools
SQL*Loader
In-house developed programs
Summary
In this lesson, you should have learned how
to:
Define transformation
Identify possible staging models
Identify data anomalies and eliminate them
Explain the importance of quality data
Describe techniques for transforming data
Design transformation processes
Practice 5-1 Overview
This practice covers the following topics:
Answering a series of questions based on the
business scenario for Frontier Airways
Answering a series of short questions
Lesson 6
Building the Data Warehouse:
Loading Warehouse Data
Objectives
After completing this lesson, you should be able to
do the following:
Explain key concepts in loading warehouse data
Outline how to build the loading process for the initial
load
Identify loading techniques
Describe the loading techniques provided by Oracle
Identify the tasks that take place after data is loaded
Explain the issues involved in designing the
transportation, loading, and scheduling processes
Loading Data into the Warehouse
Loading moves the data into the warehouse.
Loading can be time-consuming:
• Consider the load window
• Schedule and automate the loading
The initial load moves large volumes of data.
Subsequent refreshes move smaller volumes of data.
[Operational databases → Extract → Staging area (Transform) → Transport, Load → Warehouse database]
Initial Load and Refresh
Initial Load:
Single event that populates the database with
historical data
Involves large volumes of data
Employs distinct ETL tasks
Involves large amounts of processing after load
Refresh:
Performed according to a business cycle
Less data to load than first-time load
Less-complex ETL tasks
Smaller amounts of post-load processing
Data Refresh Models: Extract Processing Environment
After each time interval, build a new snapshot of the database.
Purge old snapshots.
[Operational databases → snapshots at T1, T2, T3]
Data Refresh Models: Warehouse Processing Environment
Build a new database.
After each time interval, add changes to the database.
Archive or purge the oldest data.
[Operational databases → changes applied at T1, T2, T3]
Building the Loading Process
Techniques and tools
File transfer methods
The load window
Time window for other tasks
First-time and refresh volumes
Frequency of the refresh cycle
Connectivity bandwidth
Building the Loading Process
Test the proposed technique
Document proposed load
Monitor, review, and revise
Data Granularity
An important design and operational issue
Low-level grain: expensive, high level of processing, more disk space, more detail
High-level grain: cheaper, less processing, less disk space, less detail
Loading Techniques
Tools
Utilities and 3GL
Gateways
Customized copy programs
Replication
FTP
Manual
Loading Technique Considerations
Tools are comprehensive, but costly.
Data-movement utilities are fast and powerful.
Gateways are suitable for specific instances:
• Access other databases
• Supply dependent data marts
• Support a distributed environment
• Provide real-time access if needed
Use customized programs as a last resort.
Replication is limited by data-transfer rates.
Post-Processing of Loaded Data
[Extract → Staging area (Transform) → Load → Warehouse]
Post-processing of loaded data:
Create indexes
Generate keys
Summarize
Filter
Indexing Data
Before load:
Enable indexes at the server
During load:
Adds time to the load window; row-by-row approach
After load:
Adds time to the load window, but faster than the row-by-row approach
[Operational databases → Staging area → Warehouse database → Index]
Unique Indexes
Disable constraints before load.
Enable constraints after load.
Re-create the index if necessary.
[Disable constraints → Load data → Enable constraints → Catch errors → Reprocess → Create index]
Creating Derived Keys
The use of derived or generalized keys is recommended to maintain the uniqueness of a row.
Methods:
• Concatenate the operational key with a number: 109908 → 109908 01
• Assign a number sequentially from a list: 109908 → 100
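The concatenation method can be sketched as follows — the underscore separator and two-digit suffix are illustrative choices, not mandated by the slide:

```python
def derived_key(operational_key: str, instance: int) -> str:
    """Concatenate the operational key with a running number, so each
    new version of the row (109908) gets a unique derived key."""
    return f"{operational_key}_{instance:02d}"
```

Sequential assignment from a list is the alternative: ignore the operational key entirely and take the next free number from a warehouse sequence.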
Summary Management
Summary tables
Materialized views
Summary data
Filtering Data
Summary data is filtered from the warehouse to the data marts.
[Warehouse → Data marts]
Verifying Data Integrity
Load data into an intermediate file.
Compare the target's counts and amounts after load with the flash totals taken before load.
If the totals do not match: preserve, inspect, fix, then load.
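The flash-total comparison amounts to recomputing record counts and amount sums on both sides of the load and checking that they agree. A minimal sketch, with illustrative field names:

```python
def flash_totals(rows, amount_field):
    """Compute the control totals (record count and summed amount)
    used to reconcile the data before and after load."""
    return {"count": len(rows),
            "amount": round(sum(r[amount_field] for r in rows), 2)}

source = [{"amt": 10.0}, {"amt": 15.0}, {"amt": 11.0}]
loaded = [{"amt": 10.0}, {"amt": 15.0}, {"amt": 11.0}]
ok = flash_totals(source, "amt") == flash_totals(loaded, "amt")
```

When the totals disagree, the intermediate file is preserved and inspected rather than discarded, so the failing rows can be fixed and reloaded.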
Steps for Verifying Data Integrity
[Diagram: source files → Extract → control process → SQL*Loader load into the target, verified against the .log and .bad files]
Standard Quality Assurance Checks
Load status
Completion of the process
Completeness of the data
Data reconciliation
Referential integrity violations
Reprocessing
Comparison of counts and amounts (catching errors such as 1 + 1 = 3)
Summary
In this lesson, you should have learned how to:
Explain key concepts in loading data into the
warehouse
Outline how to build the loading process for the
initial load
Identify loading techniques
Describe the loading techniques provided by
Oracle
Identify the tasks that take place after data is
loaded
Explain the issues involved in designing the
transportation, loading, and scheduling
processes
Practice 6-1 Overview
This practice covers the following topics:
Answering a series of short questions
Answering questions based on the business scenario
for Frontier Airways
Lesson 7
Refreshing Warehouse Data
Objectives
After completing this lesson, you should be
able to do the following:
Describe methods for capturing changed data
Explain techniques for applying the changes
Describe the Change Data Capture and refresh mechanisms supported in Oracle9i
Describe techniques for purging and archiving
data and outline the techniques supported by
Oracle
Outline final tasks, such as publishing the data,
controlling access, and automating processes
List the selection criteria for choosing ETL tools
Developing a Refresh Strategy
for Capturing Changed Data
Consider load window.
Identify data volumes.
Identify cycle.
Know the technical infrastructure.
Plan a staging area.
Determine how to detect changes.
User Requirements and Assistance
Users define the refresh cycle.
IT balances requirements against technical issues.
Document all tasks and processes.
Employ user skills.
Load Window Requirements
Time available for the entire ETL process
Plan
Test
Prove
Monitor
[Timeline: the 24-hour day divided between the load window and the user access period]
Planning the Load Window
Plan and build processes according to a strategy.
Consider volumes of data.
Identify technical infrastructure.
Ensure currency of data.
Consider user access requirements first.
High availability requirements may mean a small load
window.
Scheduling the Load Window
1. Receive the data (FTP).
2. Check the control file against the load requirements: load cycle, file names, file types, number of files, number of loads, first-time load or refresh, date of file, date range, record counts, and total amounts.
3-4. The control process opens and reads the files to verify and analyze them.
(Starting at about 3 a.m.)
Scheduling the Load Window
5. Load into the warehouse (parallel load).
6. Verify, analyze, reapply.
7. Index the data.
8. Create summaries.
9. Update metadata.
(Roughly 3 a.m. to 9 a.m.)
Scheduling the Load Window
10. Back up the warehouse.
11. Create views for specialized tools.
12. Users access summary data.
13. Publish.
(Roughly 6 a.m. to 9 a.m., then user access.)
Capturing Changed Data for Refresh
Capture new fact data.
Capture changed dimension data.
Determine the method of capture in each case.
Methods:
• Wholesale data replacement
• Comparison of database instances
• Time stamping
• Database triggers
• Database log
Wholesale Data Replacement
Expensive
Useful for data marts with less data
Limited historical data analysis is possible
Time period often exceeds load window
Mirroring techniques can be used to provide
access to the users
Comparison of Database Instances
Delta file:
• Changes to operational data since the last refresh
• Used to update the warehouse
Simple to perform, but expensive in terms of time and processing
Efficient for smaller volumes of data
[Yesterday's operational database compared with today's operational database → delta file holds the changed data]
Time and Date Stamping
Fast scanning for records changed since the last refresh cycle
Useful for data with an updated date field
No detection of deleted data
[Operational data → delta file holds changed data based on the time stamp]
Database Triggers
Changed data is intercepted at the server level.
Extra I/O is required.
Maintenance overhead.
[Operational data → operational server (RDBMS) with triggers → delta file holds the changed data]
Using a Database Log
Contains before and after images
Requires a system checkpoint
Common technique
[Operational data → operational server (DBMS) → log → log analysis and data extraction → delta file holds the changed data]
Choosing a Method
for Change Data Capture
Consider each method on merit.
Consider a hybrid approach if one approach is not
suitable.
Consider current technical, operational, and
application issues.
Applying the Changes to Data
You have a choice of techniques:
Overwrite a record
Add a record
Add a field
Maintain history
Add version numbers
Overwriting a Record
Easy to implement
Loses all history
Not recommended
Example: 42135 John Doe Single is overwritten by 42135 John Doe Married
Adding a New Record
History is preserved; dimensions grow.
Time constraints are not required.
A generalized key is created.
Metadata tracks the usage of keys.
Example: 42135 John Doe Single is kept, and 42135_01 John Doe Married is added
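The add-a-record technique can be sketched as follows — a minimal version, assuming dictionary rows and a two-digit suffix for the generalized key:

```python
def apply_change(dimension, cust_id, new_value, field="status"):
    """'Add a record': preserve history by inserting a new row with a
    generalized key (42135 -> 42135_01) instead of overwriting."""
    versions = [r for r in dimension if r["key"].split("_")[0] == cust_id]
    new_key = f"{cust_id}_{len(versions):02d}"
    dimension.append({"key": new_key,
                      "name": versions[-1]["name"],
                      field: new_value})
    return dimension

dim = [{"key": "42135", "name": "John Doe", "status": "Single"}]
apply_change(dim, "42135", "Married")
```

The old row stays queryable, so historical facts keep joining to the dimension values that were current when they occurred.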
Adding a Current Field
Maintains some history
Loses intermediate values
Is enhanced by adding an Effective Date field
Example: 42135 John Doe Single becomes 42135 John Doe Single Married 1-Jan-01
Limitations of Methods for Applying Changes
Difficult to maintain history
Dimensions may grow large
Maintenance overhead
Maintaining History: Techniques
History tables
One-to-many relationships
Versioning
Preserve complete history
Maintaining History: Techniques
History tables:
Normalize dimensions
Hold current and historical data
One-to-many relationships:
One current record and many history records
[Star schema: the Sales fact joined to Time, CUSTOMER, and Product dimensions, with CUSTOMER linked to a HIST_CUST history table]
Versioning
Avoid double counting
Facts hold a version number
Customer.CustId  Version  Customer Name
1234             1        Comer
1234             2        Comer
Sales.CustId     Version  Sales Facts
1234             1        $11,000
1234             2        $12,000
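Joining facts to dimensions on both the key and the version number keeps each fact tied to the dimension row that was current when it occurred, so aggregating across versions does not double count. A minimal sketch using the table above:

```python
def versioned_sales(customers, sales, cust_id):
    """Join facts to dimensions on (key, version) so each sales fact
    matches exactly one customer version -- no double counting."""
    by_version = {(c["cust_id"], c["version"]): c["name"] for c in customers}
    return [(by_version[(s["cust_id"], s["version"])], s["amount"])
            for s in sales if s["cust_id"] == cust_id]

customers = [{"cust_id": 1234, "version": 1, "name": "Comer"},
             {"cust_id": 1234, "version": 2, "name": "Comer"}]
sales = [{"cust_id": 1234, "version": 1, "amount": 11000},
         {"cust_id": 1234, "version": 2, "amount": 12000}]
rows = versioned_sales(customers, sales, 1234)
```

Joining on the customer ID alone would match each sales row to both customer versions and inflate the totals.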
Preserve Complete History
Complete history:
• Enables realistic historical analysis
• Retains context of data
Model must be able to:
• Reflect business changes
• Maintain context between fact and dimension data
• Retain sufficient data to relate old to new
Purging and Archiving Data
As data ages, its value depreciates.
Remove old data from the warehouse:
• Archive for later use (if needed)
• Purge without copy
Final Tasks
Update metadata
Publish data
Use database roles to control access to the warehouse
[Sources → Extract → Stage (Rules) → Transform → Load → Publish → Query]
Publishing Data
Control access using database roles
Compromise between load action and user access
Consider:
• Staggering updates
• Using temporary tables
• Using separate tables
ETL Tools: Selection Criteria
Overlap with existing tools
Availability of meta model
Supported data sources
Ease of modification and maintenance
Required fine tuning of code
Ease of change control
Power of transformation logic
Level of modularization
Power of error, exception, resubmission features
Intuitive documentation
Performance of code
ETL Tool Selection Criteria
Activity scheduling and sophistication
Metadata generation
Learning curve
Flexibility
Supported operating systems
Cost
Summary
In this lesson, you should have learned how to:
Describe methods for capturing changed data
Explain techniques for applying the changes
Describe the Change Data Capture and refresh mechanisms supported in Oracle9i
Describe techniques for purging and archiving
data and outline the techniques supported by
Oracle
Outline final tasks, such as publishing the data,
controlling access, and automating processes
List the selection criteria for choosing ETL tools
Practice 7-1 Overview
This practice covers the following topics:
Answering a series of questions based on the
business scenario for Frontier Airways
Answering a series of short questions
Lesson 8
Leaving a Metadata Trail
Objectives
After completing this lesson, you should be
able to do the following:
Define warehouse metadata, its types, and its
role in a warehouse environment
Examine each type of warehouse metadata
Develop a metadata strategy
Outline the Common Warehouse Metamodel
(CWM)
Defining Warehouse Metadata
Data about warehouse data and processing
Vital to the warehouse
Used by everyone
The key to understanding warehouse information
Metadata Users
End users, developers, and IT professionals all draw on the metadata repository.
Types of Metadata
End-user metadata:
• Key to a good warehouse
• Navigation aid
• Information provider
ETL metadata:
• Maps structure
• Source and target information
• Transformations
• Context
Operational metadata:
• Load, management, scheduling processes
• Performance
Examining Types of Metadata
[External sources and operational data sources → ETL (ETL metadata) → warehouse → end user (end-user metadata), all recorded in the metadata repository]
Examining Metadata: ETL Metadata
Business rules
Source tables, fields, and key values
Ownership
Field conversions
Encoding and reference tables
Name changes
Key value changes
Default values
Logic to handle multiple sources
Algorithms
Time stamp
Extraction Metadata
Space and storage requirements
Source location information
Diverse source data
Access information
Security
Contacts
Program names
Frequency details
Failure procedures
Validity checking information
Transformation Metadata
Duplication routines
Exception handling
Key restructuring
Grain conversions
Program names
Frequency
Summarization
Loading Metadata
Method of transfer
Frequency
Validation procedures
Failure procedures
Deployment rules
Contact information
Examining Metadata: End-User Metadata
Location of facts and dimensions
Availability
Description of contents and algorithms used for derived and summary data
Data ownership details
End-User Metadata: Context
Users need to know the context of the table queried.
Associate the metadata with its description.
Example of End-User Metadata
Table Name  Column Name  Data    Meaning
Product     Prod_ID      739516  Unique identifier for the product
Product     Valid_date   01/97   Last refresh date
Product     Ware_loc     1816    Warehouse location number
Product     Ware_bin     666     Warehouse bin number
Product     Code         15      The color of the product; refer to table COL_REF for details
Product     Weight       17.62   Packed shipping weight in kilograms
Historic Context of Data
Supports change history
Maintains the context of information across the years
Types of Context
Simple:
• Data structures
• Naming conventions
• Metrics
Complex:
• Product definitions
• Markets
• Pricing
External:
• Economic
• Political
Developing a Metadata Strategy
Define a strategy to ensure high-quality metadata useful to users and developers.
Primary strategy considerations:
• Define goals and intended use
• Identify target users
• Choose tools and techniques
• Choose the metadata location
• Manage the metadata
• Manage access to the metadata
• Integrate metadata from multiple tools
• Manage change
Defining Metadata Goals
and Intended Usage
Define clear goals.
Identify requirements.
Identify intended usage.
Identifying Target Metadata Users
Who are the metadata users?
• Developers
• End users
What information do they need?
How will they access the metadata?
Choosing Metadata Tools and
Techniques
Tools:
• Data modeling
• ETL
• End user (query and analysis)
Database schema definitions
COBOL copybooks
Middleware tools
Choosing the Metadata Location
Usually the warehouse server
Possibly on operational platforms
Desktop tool with metalayer
Managing the Metadata
Managed by the metadata manager
Maintained by the metadata architect
Standards should be followed
Integrating Multiple Sets of Metadata
Multiple tools may generate their own metadata.
These metalayers should be properly integrated.
Metadata exchangeability is desirable.
Managing Changes to Metadata
Different types of metadata have different rates of
change.
Consider metadata changes resulting from refresh
cycles.
Additional Metadata Content
and Considerations
Summarization algorithms
Relationships
Stewardship
Permissions
Pattern analysis
Reference tables
Common Warehouse Metamodel
[Diagram: any source (ERP, operational, external) feeds data integration into the warehouse and marts; information delivery then provides any access (reporting, ad hoc query & analysis, data mining); design and administration tools and analytic applications sit alongside; everything shares the CWM metadata repository]
Summary
In this lesson, you should have learned how
to:
Define warehouse metadata, its types, and its
role in a warehouse environment
Examine each type of warehouse metadata
Develop a metadata strategy
Outline the Common Warehouse Metamodel
(CWM)
Practice 8-1 Overview
This practice covers the following topics:
Answering a series of short questions
Answering questions based on the business scenario
for Frontier Airways
Lesson 9
Managing and Maintaining
the Data Warehouse
Objectives
After completing this lesson, you should be able to
do the following:
Develop a plan for managing the transition from
development to implementation
Identify challenges pertaining to the growth of the data
warehouse
Describe backup and archive mechanisms
Identify data warehouse performance issues
Managing the Transition to Production
Promoting support for change
Pilot versus large-scale implementation
Documentation
Testing
Training
Post-implementation support
Maintaining the warehouse
Promoting Support for the
Data Warehouse
Awareness
Feedback
Information
Skills
Education
Direction
Control
Choosing Between Pilot and Large-Scale Implementation
Weigh a pilot against a large-scale implementation.
The Warehouse Pilot
Demonstrates benefits to:
• Management
• Users
• IT staff
Relevant to the business
Low technical risk
Small and feasible
Anticipates increased use
Focused on an initial business issue
Remains in context
Piloting the Warehouse
Designers:
• Prove model, data, and access tools
Users:
• Prove ease of use of tool
• Check data and query performance
• Identify training requirements
Developers:
• Resolve ETL and metadata issues
• Determine users' data and training requirements
• Test security and access levels, monitor performance
Documentation
Produces textual deliverables:
Glossary
User and technical documentation
Online help
Metadata reference guide
Warehouse management reference
New features guide
Testing the Warehouse
Test every stage.
Use a realistic test database and environment.
Training
Users:
• Metadata
• DSS tools
• Ad hoc queries
• Getting help
• Registration of enhancement requests
Information systems developers:
• Analysis techniques
• Hardware technicalities
• Networking
• Implementing, building, and supporting DSS
Post-Implementation Support
Evaluate and review the implementation.
Monitor the warehouse:
• Respond to problems
• Conduct performance tuning
• Roll out metadata, queries, reports, filters, and
conditions
• Implement security
• Incorporate new users
• Distribute data marts and catalogs
• Transfer ownership from IT
Monitoring the Success of the Data Warehouse
[Chart: number of users plotted from the initial rollout through 3, 6, 12, and 24 months after implementation]
Measuring the Success
of the Data Warehouse
Metrics may include:
Availability
Response time
Response to problems
Managing Growth
Increasing number of users
Broader usage
Growth of data volumes over the period after implementation
Expansion and Adjustment
Evaluate continually:
• Changes
• New increments
• Unnecessary components
• Strategies
Ensure an open environment.
Document development processes for the future:
• Planning
• Cost analysis
• Problem assessment and correction
• Performance assessment
Controlling Expansion
Ensure the continuity of staff.
Document processes, solutions, and metrics.
Establish working test and production
architecture for further increments.
Create a strategy for maintaining changes to
data.
Sizing Storage
Consider different methods.
Determine the best for your needs.
Know the business requirements.
Do not underestimate requirements.
Plan for growth.
Consider space for unwanted data.
Estimating Storage
Fact volumes
Fact lifetime
Technology availability
Technology purchase
Storing presummarized data
Mirroring or other techniques
requiring disk storage
Objects That Need Space
ODS
Indexes and metadata
Summary data
Redo logs
Rollback information
Sort areas
Temporary space
Workspace for backup
and recovery
Other Considerations and Techniques
Queuing models
Rule of thumb: total database size is three to four times the size of the base fact tables
Consider:
• Sparseness
• Dimensions
• Indexes
• Summaries
• Sort operational space
Space Management
Monitor
Avoid fragmentation
Test load data
Plan for growth
Know business patterns
Never let space become an issue
Archiving Data
Determine data life expectancy.
Identify archive frequency.
Use read-only tablespaces.
Include in early specifications.
Purging Data
Reduce data volumes:
• Create summaries
• Remove unwanted base data
Choose the most effective method.
Identifying Data Warehouse Performance Issues
Improving query efficiency:
• Use indexes.
• Use query governors.
• Run large jobs out of hours.
• Consider a data mart approach.
Improving network performance:
• Provide sufficient bandwidth and optimize configuration for access.
• Analyze traffic.
• Deploy data marts at remote locations.
Review and Revise
Monitor the warehouse:
Usage
Access
Accurate grain
Detail data
Periodicity
Secret of Success
Think big; start small!
Course Summary
In this course, you should have learned that
the successful warehouse:
Is driven by the business
Focuses on objectives
Adds value to the business
Can be understood and used
Delivers good data
Performs well
Belongs to the users
Sample Study
Note: we discuss the creation of data marts, rather than the perhaps more familiar data warehouse. Data warehouses tend to be large, one-stop-shopping repositories where all the historical data for the organization would be stored. Nothing is wrong with this as a concept; however, attempting to create a data warehouse often led to huge, multiyear technology projects that were never quite finished or were outdated when they finally did get done.
Data Mart Structure
Measures
A measure is a numeric quantity expressing some aspect of the organization's performance. The information represented by this quantity is used to support or evaluate the decision making and performance of the organization. A measure can also be called a fact. The tables that hold measure information are known as fact tables.
Dimensions
A Dimension is a categorization used to spread out an
aggregate measure to reveal its constituent parts.
Dimensions are used to facilitate this slicing and dicing.
The Star Schema
A Star Schema is a relational database schema used to
hold measures and dimensions in a data mart. The
measures are stored in a fact table and dimensions are
stored in dimension tables.
Attributes
An Attribute is an additional piece of information
pertaining to a dimension member that is not the unique
identifier or the description of the member.
Hierarchies
A hierarchy is a structure made up of two or more levels of related dimensions. A dimension at an upper level of the hierarchy completely contains one or more dimensions from the next lower level of the hierarchy.
The Snowflake Schema