An overview of Data Warehousing and OLAP Technology
Download
Report
Transcript An overview of Data Warehousing and OLAP Technology
An overview of Data Warehousing and
OLAP Technology
Presented By
Manish Desai
•
•
•
•
•
•
•
•
•
•
•
•
•
Introduction
What is data warehouse ?
Explanation of definition
Data warehouse Vs. Operational Database
Data warehouse architecture
Back end tools
Conceptual model
Database design
Warehouse servers
Index structures
Meta data
Conclusion
References
2
Introduction
• Essential elements of decision support
• Enables The Knowledge Worker to make better
and faster decisions
• Used in many industries like:
– Manufacturing (for order shipment)
– Retail (for inventory management)
– Financial Services (claims and risk analysis)
• Every major database vendor offers product in this
area
3
What is Data Warehouse ?
• A data warehouse is a “subject-oriented,
integrated, time-varying, non-volatile collection of
data that is used primarily in organizational
decision making”
• Typically maintained separately from operational
databases
4
Explanation of definition
• Subject-Oriented:
– Designed around subject such as customer, vendor,
product and activity
– Does not includes data that are not needed for Decision
support system (DSS)
• Integrated:
– Most important feature
– Consistent naming convention, measurement of
variables and so forth
– The data should be stored in single globally acceptable
fashion
5
Explanation (continues…)
• Time Varying:
– All data in the warehouse should be accurate as of some
moment in time
– Data stored over a long time horizon (5 –10 years)
– Key structure contains element of time (implicitly or
explicitly)
– Data once correctly recorded cant be updated
• Non Volatile:
– No Update of data allowed
– only loading and access of data operations
6
Data Warehouse Vs. Operational
Database
Data Warehouse
Operational Database
user
Knowledge worker
Clerk, IT professional
Function
Decision support
Day to day operations
Data
Historical,summarized,
multidimensional,
integrated
Current, up-to-date, detailed
Unit of work
Complex query
Short, simple transaction
metric
Query throughout,
response
Transaction throughput
7
Architecture
•
•
•
•
•
Data sourcing,migration,cleanup tools
Meta data repository
Data marts
Data query, reporting, analysis and mining tools
Data warehouse administration and management
8
Architecture (continues…)
• Distributed Data warehouse
– Load balancing, scalability,higher availability
– Meta data replicated and centrally administrated
– Too expansive
• Data marts
– Departmental subset focused on selected subjects
– example: marketing department includes customer,
sales and product tabels
– Has own repository and administration
– May lead to complex integration problems if not
designed properly
9
Back end tools and Utilities
• Data cleaning, loading, refreshing tools
• Cleaning
– Multiple source, possibility of errors
– Example: replace string sex by gender
• Loading
– Building indices, sorting and making access paths
– Large amount of data
• Incremental loading
• Only updated tuples are inserted ,Process hard to manage
• Refresh
– Propagating updates
– When to refresh ?
– Set by administrator depending on user needs and traffic
10
Conceptual Model and front end tools
• Multi dimensional view
–
–
–
–
Dimensions together uniquely determine the measure
Example: Sales can be represented as city,product, data
Each dimension is described by set of attribute
Example: product consist of
• Category of product
• Industry of product
• Year of introduction
• Front end tools
– Multi dimensional spreadsheet
• Supports Pivoting-reorientation
• Roll_up - summarized data
• Drill_down - go from high level to low level summary
11
Database design
• Two ways to represent Multi dimensional model
– Star schema
• Database consist of single fact table and single table for each
dimension
• Each tuples in fact table consist of pointer to each of dimension
– Snowflake schema
• Refinement over star schema
• Dimensional hierarchy is explicitly represented by normalizing
dimension tables
12
Warehouse Servers
• Specialized SQL servers
– Provides advanced query language and query
processing support for SQL queries over star and
snowflake schemas
– Example: Redbrick
• ROLAP
– Between relational back end and client front end tools
– Extend traditional relational servers to support
multidimensional queries
– Example: Microstratergy
• MOLAP
– Multidimensional storage engine
– Direct mapping
– Example: Essbase from Arbor Inc.
13
Index structures
• Bit map indices
– Use single bit to indicate specific value of attribute
– Example:
instead of storing eight characters to record “engineer” as skill
of employee use single bit
id# Name Skill
1000 John 1
• Join indices
– Maintains the relationship between foreign key with its
matching primary keys
14
Meta data and warehouse management
• Its data about data
• Used for building, maintain, managing and using
data warehouse
• Administrative meta data
– Information about setting up and using warehouse
• Business meta data
– Business terms and definition
• Operational meta data
– Information collected during operation of warehouse
15
Conclusion
• Data warehouse is the technology for the future.
• data warehouse enables knowledge worker to
make faster and better decisions
16
References
•
•
•
•
•
Inmon W. H.,Building the data warehouse
www.olapcouncil.org
www.pwp.starnetinc.com
www.arborsoft.com
Kimball, R. The data warehouse toolkit.
17