Transcript Document

SvalTech
Getting Started Using
Database Archiving
Toronto DAMA Chapter Meeting
16 September, 2009
Jack E. Olson
[email protected]
www.svaltech.com
“Database Archiving: How to Keep Lots of Data for a Long Time”
Jack E. Olson, Morgan Kaufmann, 2008
Copyright SvalTech, Inc., 2009
SvalTech
Why This Presentation
•
A common position of many IT shops is
– We know we should be doing database archiving
– We know it will be valuable to us
– But we don’t know how to get started
•
Database archiving is an enterprise technology: it can be used in
many applications
•
Not all database applications are suitable for database archiving
•
Suitable applications have widely differing return-on-investment
potential
Copyright SvalTech, Inc., 2009
2
SvalTech
The Database Archiving Survey
organize survey team
application enumeration
first-cut feasibility
data-life-cycle analysis
operational analysis
risk analysis
metric gathering
evaluate implementation options
business case development
prioritization
Copyright SvalTech, Inc., 2009
3
SvalTech
The Survey Organization
Mandate
People
Inputs
Mandate
A management directive that creates the database archiving survey
task force and gives them the scope and objectives of the study.
Scope:
business units to include, organizational units (divisions,
companies, campuses)
Objectives:
find best candidates for cost reduction, fixing operational
problems, risk reduction
Copyright SvalTech, Inc., 2009
4
SvalTech
The Survey Team
Mandate
People
Inputs
Copyright SvalTech, Inc., 2009
Chair
Fulltime members
IT/enterprise architect
storage administration
records retention
Subject matter members
database architect
data management
business unit data analyst
database administration
Incidental members
legal department
IT compliance
data governance
security administration
data analyst (BI type)
5
SvalTech
Starting Materials
Mandate
People
Inputs
Copyright SvalTech, Inc., 2009
Enterprise data model
Data classification results
SLA’s
IT storage strategy
Regulations/compliance rules
Data governance mandates
6
SvalTech
Application Enumeration
Limit search to those within mandate
business unit
location
enterprise
Identify Operational Applications
classify as transactional vs. static data
include those already archiving to any extent
Identify Retired Applications still retaining data
Identify applications about to change
consolidations planned
planned or recent acquisitions
replacements/ conversions/ reengineering
identify any strategies for application retirement
Copyright SvalTech, Inc., 2009
7
SvalTech
Application Enumeration
For Applications with potential,
Capture application data model
Identify business records within the data model
Connect business records to records retention and legal categories
Identify database information: system/dbms/file/metadata
Create a Database Topology chart
Identify parallel applications within the corporation
(even if out of scope)
Identify operational replicates
Identify backup/disaster recovery stores and strategies
Identify recurring data extracts for BI, etc.
Get rough idea of db size and transaction rates
Copyright SvalTech, Inc., 2009
8
SvalTech
Database Topology Chart
create data
operational
replicate
operational
archive
offline
storage
backup
BI stores
CRM
backup
disaster
recovery
Copyright SvalTech, Inc., 2009
9
SvalTech
First-Cut Feasibility
Factors for continuing to consider,
important data
lots of data
lots of individual business records
simple data structures
relatively stable data structures (little change)
long retention requirement
long inactive period within retention requirement
low frequency access requirement in inactive period
low performance requirement in inactive period
simple access requirements in inactive period
Apply criteria after each subsequent step to further eliminate
bad candidates
Copyright SvalTech, Inc., 2009
10
SvalTech
Examples
Good
Bank deposits and withdrawals
Stock trades
Credit card transactions
Ticketmaster transactions
Medical claim data
Casualty claim data (auto, home)
Retail sales inventory transactions
Package tracking
Passenger flight data
Driver license records
Sales tax records
Property tax records
Telephone call transactions
Nuclear reactor monitoring records
Auto warrantee records
Copyright SvalTech, Inc., 2009
Not Good
Customer master data
Airplane manufacturing records
HR records
Felony records
Home sales
11
SvalTech
Data Life Cycle Analysis
Create a database archiving DLCA for each business record type
Data Retention Chart
Business Record Process Chart to determine inactive period
Business Record SLA chart by age of record
Copyright SvalTech, Inc., 2009
12
SvalTech
Data Retention Chart
The requirement to keep data for a business object for a
specified period of time. The object cannot be destroyed until
after the time for all such requirements applicable to it has past.
Business Requirements
Regulatory Requirements
The Data Retention requirement is the longest of all requirement lines.
Copyright SvalTech, Inc., 2009
13
SvalTech
Business Record Process Chart
for a single instance of a data object
Retention requirement
time
Create PO
Update PO
Create Invoice
Backorder
Create Financial Record
Update on Ship
Update on Ack
Weekly Sales Report
Quarterly Sales report
Extract for data warehouse
Extract for bus analysis
Common customer queries
Common bus queries
Ad hoc requests
Law suit e-Discovery requests
Investigation data gathering
operational
Copyright SvalTech, Inc., 2009
reference
inactive
14
SvalTech
Business Record SLA Chart by Age
for a single instance of a data object
Retention requirement
time
Query response time
Transaction volume
create/update
Security (no users)
read
operational
Copyright SvalTech, Inc., 2009
reference
inactive
15
SvalTech
Operational Analysis
Don’t assume there are no problems.
Talk to DBAs and users.
Look for trends
Look for escalating operational costs.
Get numbers.
Copyright SvalTech, Inc., 2009
16
SvalTech
Operational Analysis
•
•
•
Performance Issues
–
Not meeting response time SLA
–
Longer time to run extracts
–
Longer time to run backups
–
Longer time to run database reorganizations
–
Running reorganizations more frequently
–
More difficult to tune
Risk Issues
–
Longer estimated time to run recovery
–
Longer estimated time to run disaster recovery
Cost Issues
–
Higher annual hardware costs
–
Higher annual MIP-based software cost
–
Adding expensive DASD to support database and backups
Copyright SvalTech, Inc., 2009
17
SvalTech
Risk Analysis
•
•
•
•
Data Loss Risk
–
Isolation from internet hackers
–
Prevent ANY updates or deletes
–
Preserve data through multi-site backups and periodic pings
Data Quality Risk
–
Changing data structures and column semantics
–
Changing reference data
Unauthorized Access Risk
–
Reduced (or different) user set
–
Audit trail of access
Legal Risk
–
Preserve authenticity of data in archive
–
Reduce cost and time to produce data for discovery requests
Copyright SvalTech, Inc., 2009
18
SvalTech
Metric Gathering
Data
bytes stored per business object
new transactions created per day
bytes for backups, replicates
growth in transactions rates
any sudden expected additions
past history plus future projections
Storage Costs
cost per byte: operational
cost per byte: backup
cost per byte: archive
archive compression ratios
System Costs
mips required to process
software license fees
staff for operational
Copyright SvalTech, Inc., 2009
19
SvalTech
Metric Gathering
For retired applications concentrate on
displaced system cost
displaced software cost
displaced staff cost
NOT shared
Shared
IBM mainframe
IMS DBMS
CICS
DBA/SYSPROG
Copyright SvalTech, Inc., 2009
LINUX server
Archive software
JDBC
Archive admin
20
SvalTech
Evaluate Implementation Options
•
Software
– Vendor provided software
– Custom built solution
•
Access tools
– Original application
– Generic report generation/ query tools
– Custom built
•
Storage for archive
– Storage subsystem
– Hosted storage
Copyright SvalTech, Inc., 2009
21
SvalTech
Architecture of Database Archiving
Operational System
Application program
Archive Extractor
Archive Administrator
Archive Designer
Archive Data Manager
Archive Access Manager
OP DB
Archive extractor
Archive Server
archive
catalog
archive
storage
Copyright SvalTech, Inc., 2009
22
SvalTech
Estimate Implementation
Time and Cost
•
Archiving systems required
– Servers
– Storage systems (hosted storage?)
– Licensed software
•
Application Design
•
Implementation
•
Test
•
Deployment
•
Ongoing operation and administration
Copyright SvalTech, Inc., 2009
23
SvalTech
Business Case Development
– Lower IT costs
– Improved operational efficiency
– Risk reduction
Copyright SvalTech, Inc., 2009
24
SvalTech
Lower IT Costs
•
Systems
– Reduce size/cost of operational systems
– Put off or eliminate need for system upgrades
•
Software
– Eliminate or reduce cost of expensive system software
• DBMS
• Transaction system
– Eliminate or reduce cost of application software
•
Storage costs
– Switch to lower cost storage
– Impact on backups/ disaster recovery stores
– Reduction in byte count stored
•
Staff
– Eliminate or reduce legacy system staff
Copyright SvalTech, Inc., 2009
25
SvalTech
Chart it
All data in
operational db
Inactive data in
archive db
most expensive system
most expensive storage
most expensive software
least expensive system
least expensive storage
least expensive software
In a typical op db
60-80% of data
is inactive
Size Today
This percentage
is growing
operational
Copyright SvalTech, Inc., 2009
operational archive
26
SvalTech
Lower IT Costs
• First year impact
• Time to recover project costs
• Chart cost savings over time
– Plot data growth over time for operational
– Plot data growth over time of archive
Copyright SvalTech, Inc., 2009
27
SvalTech
Operational Improvements
•
Itemize improvements expected
– Performance of operations
– Reduction of utility times
– Reduction of recovery times
– Reduction of disaster recovery times
– Reduction of DBA workload
• Provide cost savings where appropriate
Copyright SvalTech, Inc., 2009
28
SvalTech
Risk Reduction
• Itemize improvements expected
– Less risk of failing e-Discovery request
– Enhanced data quality of older data
– Less exposure to loss of data authenticity
– Better access control
– Better compliance
– Better data governance
– Less dependence on legacy systems
• Provide cost savings where appropriate
Copyright SvalTech, Inc., 2009
29
SvalTech
Prioritization
– Determine Prioritization Criteria
• Cost is most common primary factor
– First archiving project may have other goals
• Lower risk of failure
• Faster implementation
• Faster return on investment
• Usually a retired application project
– Risk may over-ride other factors
• Preserve data authenticity
Copyright SvalTech, Inc., 2009
30
SvalTech
Final Thought
• Always do a survey to find the best applications to start with
• Always do a survey to identify those that make sense to proceed
with versus those that do not: don’t waste time on apps that are
too hard to implement or that will have little value.
• A good database archive application can save millions of dollars
per year, increase performance of operational systems and reduce
risk all at the same time. The trick is identifying them and proving it.
• Repeat the Database Archiving Survey from time to time in the future.
Copyright SvalTech, Inc., 2009
31