slides (Powerpoint)

Download Report

Transcript slides (Powerpoint)

I
ManTech Advanced Systems International, Inc.
Data Warehousing:
Jeffrey T. Edgell
Assistant Executive Director
ManTech Advanced Systems
International
Decision Support
In order to make correct decisions, accurate,
meaningful information about business
environments, external issues, and internal
workings must be available in a timely
fashion.
A Need For New Technology
• Government and industrial entities have
been collecting data in electronic format
since the 1960s.
• Today, organizations collect millions of
pieces of information about every aspect of
their operation on a daily basis.
• Data is obtained from multiple disparate
sources.
A Need For New Technology
• Often information is replicated, leading to
confusion.
• Related data is often retained in seemingly
heterogeneous and incompatible platforms.
• Common data attributes are represented in
nonstandard formats and naming constructs
across systems.
A Need for New Technology
• Most systems are built for data collection
(transaction based).
• Designed to support On-Line Transaction
Processing (OLTP).
• Designed to support day-to-day business
operations.
• Very specific applications built to support
interaction with the data.
A Need for New Technology
• Perform best when handling small specific
volumes of data.
• Does not accept information from dissimilar
sources readily.
• Are not constructed to handle analysis of
large amounts of data efficiently.
A Need for New Technology
• Capable of answering questions of a
specific nature and time frame.
– How many items do I have in stock today?
– How many tickets were sold on a specific date?
– What is the current price of an item?
A Need for New Technology
• Transaction based systems experience great
difficulty in answering analytical and
decision support questions.
• Analysis takes a long time, interfering with:
– transaction performance
– daily operations
• The nature of the data is dynamic and
dispersed.
A Need for New Technology
Most organizations have created a “spider
web” of systems and data sources.
A Need for New Technology
All of this has created “data overload” and
“data confusion”.
–
–
–
–
What do I do with all of this data?
What does it mean?
Do I really need this data?
I am overwhelmed with the amount of data I
am confronted with.
– I cannot make a timely decision (too much data
from too many sources).
Data Warehousing 101
Data warehousing is:
A large historical database designed to accept
key analytical data from multiple and
disparate sources that manage the day-to-day
management of enterprise data. Furthermore,
the role of the warehouse is to transform
transaction data into corporate information.
The warehouse is provided in a read-only
fashion to a user.
Data Warehousing 101
A data warehouse will provide:
The ability to ask business analysis questions
in a real-time, iterative fashion, obtaining
decision support information readily and
quickly.
Data Warehousing 101
• A data warehouse is not:
A repository for all corporate data.
• A data warehouse will not:
Single handedly solve all of the problems
associated to an enterprise.
Data Warehousing 101
Key components include the following:
– data model
– data storage architecture (relational,
proprietary)
– data access/replication/transport
– data transformation and scrubbing
– staging and publication
– metadata
– warehouse hardware and software
Data Warehousing 101
Data Warehouse Architecture
Information Directory Repository
Legacy Data
Data
Transformation
Data Warehouse
Data Warehouse Management Layer
Legacy Data
External Data Source
Data Warehousing 101
How a warehouse deals with aging data:
1993 Qtrly
Data
1994 Qtrly
Data
1995 Qtrly
Data
1996 Month
Data
1997 Month
Data
Aggregation
1998 Detail
Data
Data Warehousing 101
Model concepts:
– Fact table(s)
• A table containing multiple measurable
descriptors relating to a specific area of business
• Each fact can be viewed, calculated, and
aggregated against various defining areas of the
business (time, geography, customer)
Data Warehousing 101
Model concepts:
– Dimension Table(s)
• Retains information (product description,
geography description, customer description)
that is descriptive and remains moderately
constant over time
Data Warehousing 101
Data Warehouse Modeling
– Special modeling techniques must be applied to
provide rapid response of queries on large
volumes of data.
– OLTP systems are built with update operations
in mind, resulting in normalization and greatly
reduced browse performance.
Data Warehousing 101
Common data model techniques are as
follows:
–
–
–
–
star schema
snowflake
fact constellation
relational
Data Warehousing 101
Sample Star Schema Model
TIME
Dimensions
STORE
GEOGRAPHY
SALES
Sales Facts
Dimensions
CUSTOMER
Data Warehousing 101
Year
North
Sample Snowflake Model
Qtr
South
Month
TIME
GEOGRAPHY
East
West
Dimensions
SALES
Dimensions
East Region
STORE
West Region
Sales Facts
CUSTOMER
Data Warehousing 101
Sample Fact Constellation Model
TIME
Dimensions
Regional
Sales
GEOGRAPHY
District
Sales
Dimensions
Store
Sales
STORE
CUSTOMER
Data Marting 101
Data marting is:
A functional segment of an enterprise
restricted for purposes of security, locality,
performance, or business necessity using
modeling and information delivery techniques
identical to data warehousing.
Data Marting 101
Why build a data mart?
– Allows an organization to visualize the large
but focus on the small and attainable.
– Provides a platform for rapid delivery of an
operational system.
– Minimizes risk.
– A corporate warehouse can be constructed from
the union of the enterprise data marts.
Data Marting 101
Data From
Transaction Sources
The data warehouse
populates
the data marts.
Financial
Data Mart
Data
Warehouse
Update From the
Warehouse
Logistics
Data Mart
Contract
Data Mart
Data Marting 101
The data marts populate
the data warehouse.
Data
Warehouse
Financial
Data Mart
Logistics
Data Mart
Update From the
Data Marts
Contract
Data Mart
Data From
Transaction Sources
Data Marting 101
Virtual Data Warehouse
Data is moved through the
abstract layer on demand.
Abstract Data Warehouse
Access Layer
The data warehouse layer
manages the data marts
Financial
as a warehouse.
Data Mart
Logistics
Data Mart
Contract
Data Mart
Data From
Transaction Sources
OLAP 101
• OLAP is a powerful graphics-oriented tool
used to access the data warehouse
• OLAP supports
–
–
–
–
–
Business analysis queries
Data visualization
Trend analysis
Scenario analysis
User defined queries
OLAP 101
• Drill Down
– Move from summary to detail
• Roll Up
– Move from detail to summary
• Slice and Dice
– Look at a specific interest of the business
OLAP 101
• Pivot and Rotate
– Looking at data from varying perspectives
• Drill Through
– Move to a near transaction level of detail
OLAP 101
• The flavors of OLAP
– Multidimensional On-Line Analytical
Processing (MOLAP)
– Relational On-Line Analytical Processing
(ROLAP)
– Hybrid On-Line Analytical Processing
(HOLAP)
OLAP 101
• MOLAP
–
–
–
–
Produces a hypercube
Pre-aggregated and pre-calculated
Rapid response times
Limited in the amount of data that can be
managed
OLAP 101
• ROLAP
–
–
–
–
Data remains in a relational format
Some degree of aggregation
Slower response times
Scales to large amounts of data
OLAP 101
• HOLAP
– Can manage data both as ROLAP and MOLAP
– Currently evolving
– MOLAP vendors are finding it easier to move
into the HOLAP market space
Data Mining 101
As defined by the Gartner Group in 1995,
data mining is:
“…the process of discovering meaningful new
correlations, patterns, and trends by sifting
through large amounts of data stored in a
repository, using pattern recognition
technologies and statistical and mathematical
techniques.”
Data Mining 101
• Data mining requires an analyst who is
familiar with the domain to appropriately
model scenarios.
• Data mining assists analysts in uncovering
nontrivial data relationships.
• Analysis must be conducted to determine
the meanings of these newly identified
relationships.
Why Use a Data Warehouse ?
• Data warehousing is a must for anyone who
uses multiple data sources to make
decisions and understand business (trends,
forecasting).
• Those who do not move to warehousing
will not be capable of responding to
problems and business conditions, thus
falling behind the competition.
Why Use a Data Warehouse ?
• For organizations wanting to minimize costs
and maximize productivity, warehousing is
a must.
• Individuals who spend time gathering data
instead of analyzing data require the
assistance of a warehouse.
• Organizations that collect data but have
difficulty determining meanings and
impacts need a data warehouse.
Making the Warehouse a Reality
•
•
•
•
Think big but work small.
Match technology to requirements.
Build for the future (scalability).
Work closely with the users.
– Requirements
– Rapid Application Development (RAD)
– Periodic releases to the user community
Real World Success Stories
• Radio Shack
– Sales and stocking analysis
– Marketing (regionalized mailings)
• Wal-Mart
– Sales and stock analysis
– Trend analysis
– Vendor analysis
Real World Success Stories
• Naval Surface Warfare Center (NSWC)
– Procurement
– Supply
– Workload
• Harris Semiconductor
– Yield
– Product
– Personnel productivity
Real World Success Stories
• Defense Logistics Agency (DLA)/ManTech
–
–
–
–
Trend analysis
Problem identification
Procurement support
Enterprise data analysis
A Few Observations About Data
Warehouses
• Industry and our experience indicate that:
– Warehouses that succeed average an ROI of 400% with the top end
being as much as 600% in the first year.
– The incremental approach is most successful (build the warehouse
a functional area at a time).
– The average time to gather requirements, perform a design, and
deploy a warehouse increment is six months.
– New tools may be required that differ from the transaction
environment.
• Software oriented toward intelligent analysis and query of the data
warehouse
• Hardware oriented to support the massive storage requirements and
analytical queries
Keys to Success
• Do you understand why you are building
the warehouse?
• Have you identified both technical and
business professionals that you will need to
build the warehouse?
• Do you have a strong management sponsor?
• Are you managing the expectations of the
users?
Careers in Data Warehousing
•
•
•
•
•
•
System Administration
DW Architect
Data Architect
DW Manager
DW Administrator
Decision Support
Analysts
• DBA
• Application Developer
• Data Cleansing/
Transformation
Analyst
• Business Analyst
• Management
A Real Architecture
Data Movement Using
MQSeries
DISA
SAMM S
WSDB
WSDB
DSD
DSD
DISC
HP V2500
Other
C&T
HP V2500
WSDB
Other
WSDB
WSDB
DSD
DISC Users
Other
DSD
DSCC
HP V2500
Other
DSCR
HP V2500
DSD
Medical
HP V2500
C & T Users
T1
T1
Other
T1
DSCR Users
DSCC Users
T1
M edical Users
DLSC
T1
MPP or SMP Cluster
FLIS
NIPRNET
Data
Warehouse
Other Data Source
All Key DLSC Analytical
Data
Future
Architecture
Internet
Other Users