An Overview of Database Archiving

Download Report

Transcript An Overview of Database Archiving

SvalTech
An Overview of
Database Archiving
Toronto DAMA Chapter Meeting
16 September, 2009
Jack E. Olson
[email protected]
www.svaltech.com
“Database Archiving: How to Keep Lots of Data for a Long Time”
Jack E. Olson, Morgan Kaufmann, 2008
Copyright SvalTech, Inc., 2009
SvalTech
Topics
•
•
•
•
•
Database Archiving Definitions
Database Archiving Application Profiles
Elements of a Successful Implementation
Solution Comparisons
Business Case Basics
Copyright SvalTech, Inc., 2009
2
SvalTech
Database Archiving Definitions
Copyright SvalTech, Inc., 2009
3
SvalTech
Definition
The process of removing selected data items from
operational databases that are not expected to be referenced
again and storing them in an archive database where
they can be retrieved if needed.
Physical Documents
application forms
mortgage papers
prescriptions
File Archiving
structured files
source code
reports
Copyright SvalTech, Inc., 2009
Document Archiving Multi-media files
word
pictures
pdf
sound
excel
telemetry
XML
Email Archiving
outlook
lotus notes
Database Archiving
DB2
IMS
ORACLE
SAP
PEOPLESOFT
4
SvalTech
Data Sources
Data created and maintained by either custom applications or
packaged applications that store data in structured database
management systems or structured records in file systems.
transaction data
reference data
DB2
SAP
IMS
Oracle Financials
ADABAS
Siebel
IDMS
PeopleSoft
ORACLE
SQL SERVER
VSAM
Copyright SvalTech, Inc., 2009
5
SvalTech
Data Domain – Business Records
The data captured and maintained for a single business
event or to describe a single real world object.
Databases are collections of Business Records.
Database Archiving is Records Retention.
customer
employee
stock trade
purchase order
deposit
loan payment
Copyright SvalTech, Inc., 2009
6
SvalTech
Drivers
Longer Data Retention requirements
Expanded Business
overloaded
operational
databases
Operational problems
Mergers and Acquisitions
Data Governance
e-Records Retention
e-Discovery Readiness concerns
Application Changes
Copyright SvalTech, Inc., 2009
7
SvalTech
Data Retention
The requirement to keep data for a business object for a
specified period of time. The object cannot be destroyed until
after the time for all such requirements applicable to it has past.
Business Requirements
Regulatory Requirements
The Data Retention requirement is the longest of all requirement lines.
Copyright SvalTech, Inc., 2009
8
SvalTech
Data Retention
•
Retention requirements vary by business object type
•
Retention requirements from regulations are exceeding business requirements
•
Retention requirements will vary by country
•
Retention requirements imply the obligation to maintain the authenticity of the data
throughout the retention period
•
Retention requirements imply the requirement to faithfully render the data on demand in a
common business form understandable to the requestor
•
The most important business objects tend to have the longest retention periods
•
The data with the longest retention periods tends to be accumulate the largest number of
instances
•
Retention requirements often exceed 10 years.
more years for some applications
Copyright SvalTech, Inc., 2009
Requirements exist for 25, 50, 70 and
9
SvalTech
Data Time Lines
for a single instance of a business record
create
event
discard
event
operational
phase
reference
phase
inactive
phase
operational phase
can be updated, can be deleted, may participate in
processes that create or update other data
reference phase
used for business reporting, extracted into business
intelligence or analytic databases, anticipated queries
inactive phase
no expectation of being used again, no known business
value, being retained solely for the purpose of satisfying
retention requirements. Must be available on request in
the rare event a need arises.
Copyright SvalTech, Inc., 2009
10
SvalTech
Data Time Lines
for a single instance of a business record
Retention requirement
Create PO
Update PO
Create Invoice
Backorder
Create Financial Record
Update on Ship
Update on Ack
Weekly Sales Report
Quarterly Sales report
Extract for data warehouse
Extract for bus analysis
Common customer queries
Common bus queries
Ad hoc requests
Law suit e-Discovery requests
Investigation data gathering
operational
Copyright SvalTech, Inc., 2009
reference
inactive
11
SvalTech
Data Time Lines
•
Some objects exit the operational phase almost immediately (financial records)
•
Some objects never exit the operational phase (customer name and address)
•
Most transaction data has an operational phase of less than 10% of the retention
requirement and a reference phase of less than 20% of the retention requirement
•
Inactive data generally does not require access to application programs: only access to ad
hoc search and extract tools
Copyright SvalTech, Inc., 2009
12
SvalTech
Application Segments
An application segment is a set of business objects generated from a single version
of an application where all business records in the segment have data consistent with
a single metadata definition.
A metadata break is a point in the life of the operational database where a change in metadata
is implemented that changes the structure of the data or the manner in which data is encoded.
•
An application will have many segments over time
•
Minor changes in metadata can sometimes be implemented without forcing a segment change
•
Major metadata changes will always generate a segment change where data created in the
previous segment cannot be recast to the new metadata definition without some compromise in
the data
•
Application segments can be generated in parallel with one operational implementation using one
version of the application at the same time that another operational instance is using a different
version of the application
Copyright SvalTech, Inc., 2009
13
SvalTech
Application Segments
case 1
Application: customer stock transactions
S1
OS1
time
Source 1 = Trades – All Stock Trades
case 2
Application: customer stock transactions
OS1
S1
OS2
S2
time
Source 1 = Stock Trades – North American Division
Source 2 = Stock Trades – Western Europe Division
= major metadata break
Copyright SvalTech, Inc., 2009
14
SvalTech
Application Segments
case 3
Application: customer stock transactions
OS1
S1
OS2
S2
OS3
S3
OS4
S2
time
Source 1 = Stock Trades – North American Division – application X
Source 2 = Stock Trades – Western Europe Division – application Y
Source 3 = acquisition of Trader Joe: merged with Source 1 on 7/15/2009
Source 4 = acquisition of Trader Pete: merged with Source 1 on 8/15/2009
= major metadata break
Copyright SvalTech, Inc., 2009
15
SvalTech
Application Segments
• A well designed database archive preserves
application segments
– Data is always kept in segment format
– Metadata is preserved at the segment level
– The archive administrative catalog shows
• Segments
– Segment version number
– Time period covered
– System generated from
• Time order of consecutive segment strings
• Parallel segment strings for the same application
Copyright SvalTech, Inc., 2009
16
SvalTech
Database Archiving
Application Profiles
Copyright SvalTech, Inc., 2009
17
SvalTech
Overloaded Operational Database
• Transaction data
• Lots of data
– Hundreds of millions of rows
– High daily transaction rate
•
•
•
•
24/7 operational availability requirement
Long retention period (7 years or more)
Short useful active life (less than 2 years)
Low access requirements during the inactive period
– Very low access frequency
– Response time not critical
– Access requirements are simple, easily satisfied with ad hoc tools
Copyright SvalTech, Inc., 2009
18
SvalTech
If You Don’t Archive
•
Continue to keep
all data in operational
Database.
•
•
Inactive Data will impact operational performance
– Harder to tune
– Scans take longer
Utility functions will take longer to execute
– Backups
– Database reorganizations
Recovery Operations take longer
–
–
•
System Costs will Escalate
–
–
–
•
Copyright SvalTech, Inc., 2009
Outage recoveries
Disaster recoveries
Need more expensive online storage
Need system upgrades
Pay more for application and DBMS software
Older data will become less reliable
19
SvalTech
Retired Application
• Merger of companies results in an operational
application being duplicated
• Data Structures are not compatible
– One keeps data elements not in other
– One encodes data elements differently
– One designed for different OS/DBMS than other
• Decision is made to use one system and
abandon the other one
• Meets all requirements of an operational
application
Copyright SvalTech, Inc., 2009
20
SvalTech
If You Don’t Archive
•
Must retain old application environment to
access data
–
–
–
Old System
Old Application Program
Old DBMS
– Must keep knowledgeable staff to access
• Application experts
• System experts
• DBA function
•
Pay the high cost of the old
application environment
and staff until last record
reaches end of retention
period.
$$$$$$$$$$$$$$$$$$$$$$$$
•
Higher cost and time of
conversion
–
–
–
Or, Must merge data into active application
•
Resulting database is huge
–
–
–
–
Copyright SvalTech, Inc., 2009
Data conversion problems
Data loss
Resolution of data quality issues
Operational problems
Lengthy Utility runs
Lengthy Recovery periods
Escalating system costs
21
SvalTech
Application Renovation Project
• Application is undergoing major change
–
–
–
–
–
Replaced with packaged application
Legacy modernization
Legacy termination
Rewritten to be web-centric
Need to satisfy new requirements
• Old data structures are out of date
– Legacy DBMS
– Legacy file system
• Data meets all other requirements for archiving
operational application
Copyright SvalTech, Inc., 2009
22
SvalTech
•
If You Don’t Archive
Must convert all data in one
system to other system
•
More expensive and complex
design phase
–
–
•
Higher and longer conversion
period
–
–
–
•
Copyright SvalTech, Inc., 2009
Must accommodate old data
in new design
May compromise new design
Data conversion problems
Data loss
Resolution of data quality
issues
Resulting data is less reliable
23
SvalTech
Elements of a Successful
Implementation
Copyright SvalTech, Inc., 2009
24
SvalTech
Archive Staff
•
Database Archive Specialist
–
–
–
–
•
Database Archive Administrator
–
–
•
Received education on database archive design and implementation
Knows tools available
Experienced
Full time job
Received education on database archiving administration
Full time job
Supporting Roles
–
–
–
–
–
–
–
Storage Administrators
Database Administrators
Data Stewards
Security Administrators
Compliance staff
IT management
Business Unit Management
Copyright SvalTech, Inc., 2009
25
SvalTech
Architecture of Database Archiving
Operational System
Application program
Archive Extractor
Archive Administrator
Archive Designer
Archive Data Manager
Archive Access Manager
OP DB
Archive extractor
Archive Server
archive
catalog
archive
storage
Copyright SvalTech, Inc., 2009
26
SvalTech
Archive Designer
•
Metadata
–
–
–
–
•
Data
–
–
–
–
–
•
Capture current metadata
Validate it
Enhance it
Design archive storage format
Define business records to be archived
Define source of data
Define data structures within operational system
Define reference data needed to include with it
Define archive format of data
Policies
–
–
–
–
Define extract policy (when a record becomes inactive)
Define operational disposal policy (when to remove from operational database)
Define storage policy (how to protect data in archive)
Define discard policy (when to remove from archive)
Copyright SvalTech, Inc., 2009
27
SvalTech
Archive Extractors
•
Extractor process
–
–
–
–
–
–
–
–
–
•
Verify consistency with design metadata
Extract data as defined in designer
Mark or delete from operational database as defined in designer
Pass data to archive data manager
Keep audit records on everything done
Do not impact operational performance
Support interruptions with transaction level recovery
Support restart
Finish scans within acceptable time periods
Scheduling
– Establish periodic executions
– Find non-disruptive periods
– Be consistent
Copyright SvalTech, Inc., 2009
28
Archive Extractors
SvalTech
Physical vs. Application Extractors
Archive Extractor
Operational System
Application program
OP DB
Archive extractor
Physical Extractor
Application Extractor
Gets/deletes data directly from the database
tables, rows, columns
Gets/deletes data from an application API
virtual tables, rows, columns
application program
Copyright SvalTech, Inc., 2009
29
SvalTech
Archive Data Manager
•
Put data away
–
–
–
–
–
•
Execute Storage policies
–
–
–
–
–
•
Return to accessing programs
Fetch data on request
–
–
•
Encryption/ signatures
Backup copies created and stored
Geographic dispersion of backups
Register archive files with archive catalog
Enter audit trail information
Fetch metadata on request
–
•
Receive data from extractors
Format into archive segment files
Determine metadata version affinity
Format and store metadata files if new
Build or update segment indexes both internal and external
Scan archive segments
Search through indexes
Execute Archive Discard Process
–
–
–
Periodic scheduling
Delete qualifying business records
Update archive catalog
Copyright SvalTech, Inc., 2009
30
SvalTech
Archive Access
•
Query Capability
–
–
–
–
•
Determine applicability based on archive segment versions of metadata
SQL based in best, if possible
Employ external indexes to determine which archive segments to look into
Employ internal indexes to avoid reading all of an archive segment
Support standard access tools
–
–
–
Report generation (such as Crystal Reports)
Generic query tools
JDBC interface
•
Support metadata version browsing
•
Support generation of load files based on query results
•
Support generation of load files based on original data source based on query results
Copyright SvalTech, Inc., 2009
31
SvalTech
Archive Administration
•
Manage Archive Catalog
–
–
–
•
Manage Archive Storage Systems
–
–
•
Application archive designs
Audit trails
Results logs
Ensure periodic readability checks
Maintain access audit trails
Manage Archive Access
–
–
–
Authorizations for users
Authorizations for specific events
• Unloads
Ensure audit records are created for all access
•
Manage e-Discovery requests
•
Ensure Extract and Discard processes are run when they are supposed to
•
Manage Metadata Change Process
Copyright SvalTech, Inc., 2009
32
SvalTech
Solution Comparisons
Copyright SvalTech, Inc., 2009
33
SvalTech
How Archive Data is Stored
Parallel databases
Partitions of operational db
typically homegrown
solutions
Database LOAD Files
Saved image copies
Reformatted archive segments
stored as files
load files
typically vendor
XML files
solutions
special files
Copyright SvalTech, Inc., 2009
34
SvalTech
Storage Comparisons
DB Solutions
parallel
partitioned
db arrays
Don’t get $$$$ savings
Requires database administration
Problems handling metadata changes
Backup Solutions
image copies
unload files
Non-DB Special files
XML
load files plus
proprietary
Copyright SvalTech, Inc., 2009
Requires restaging data to access
Not searchable in archive
Problems handling metadata changes
Indexed
Direct access via SQL
Separated by archive segments
Metadata resolution across archive segments
Can exploit storage subsystem capabilities
Can use hosted storage
35
SvalTech
Data Structure Comparisons
Things to Look for
Is metadata maintained in archive
Is metadata validated
Is metadata enhanced
Is data restructured to achieve source independence
from application programs
from DBMS type
from source OS/ hardware
Is reference information captured in archive
Is data maintained in original form in archive forever
Can user see data form prior to conversions
Copyright SvalTech, Inc., 2009
36
SvalTech
Data Access in the Archive
Things to Look for
Can requests be satisfied directly from the archive
Can common generic tools be used
JDBC
Report generators
Can data be unloaded in forms for re-platforming
Can data be accessed efficiently
Is it indexed
Is representations consistent
Are metadata differences accounted for
Copyright SvalTech, Inc., 2009
37
SvalTech
Administration of the Archive
Things to Look for
Is there a full time administrator
Is there an archive catalog database
what is in the archive
where is it stored
Is security maintained
different from operational
Are actions and events logged
Copyright SvalTech, Inc., 2009
38
SvalTech
A Myth
Homegrown Solutions are good enough.
Truth:
They do solve the problem of getting inactive data out of operational databases
However,
They do not realize maximum cost savings
They generally do not realize any cost savings
They generally cannot be directly accessed
They often require original application environment
They are never indexed
They often compromise data integrity across metadata changes
They often offer less protection from data loss
Copyright SvalTech, Inc., 2009
39
SvalTech
A Myth
Homegrown Solutions are cheaper and faster to implement.
Truth:
A good vendor solution will guide you through the process and get done quickly
Managing the archive is easier and cheaper than managing databases
Copyright SvalTech, Inc., 2009
40
SvalTech
Business Case Basics
Copyright SvalTech, Inc., 2009
41
SvalTech
An Assertion
To Get the Benefits from Database Archiving
improved operational efficiency
better data governance
lower risks
It does not need to cost a penny.
If done properly, database archiving can realize cost benefits
larger than the cost of implementation and maintaining
the archive. In most cases the savings can justify database
archiving by itself.
Copyright SvalTech, Inc., 2009
42
SvalTech
Reason for Archiving
All data in
operational db
Inactive data in
archive db
most expensive system
most expensive storage
most expensive software
least expensive system
least expensive storage
least expensive software
In a typical op db
60-80% of data
is inactive
Size Today
This percentage
is growing
Operational
Copyright SvalTech, Inc., 2009
operational
archive
43
SvalTech
Cost Saving Elements
Look for and compute difference in storage costs
front-line vs archive storage
byte counts differences between operational and archive
Look for and compute difference in system costs
operational vs archive systems
are operational system upgrades avoided
are software upgrades avoided
can systems be eliminated for application
can software be eliminated for application
Look for savings on people costs
can people be eliminated or redirected for retired applications
Potential savings on changes/ application renovations
simplification of design
elimination of data conversions
Copyright SvalTech, Inc., 2009
44
SvalTech
Operational Efficiency Impacts
Will operational performance be enhanced with less data
Will utility time periods be reduced (backup, reorganization)
fewer occurrences needed
less data to process each time
Will recovery times be reduced and what is that worth
interruption recoveries
disaster recoveries
Will implementation of data structure changes be improved
avoided
reduced amount of data to unload/modify/reload
Copyright SvalTech, Inc., 2009
45
SvalTech
Risk Factors
Will the saved data have better authenticity
not changed in archive
shielded from updates or damage
traceable back to original form
Will e-Discovery benefit from archiving
can locate and process data outside of operational environment
can easily create legal-hold archive units
Will exposure of data reduced
fewer authorized users against the archive
complete audit trails of all access
Copyright SvalTech, Inc., 2009
46
SvalTech
Business Case Summary
• Database Archiving solutions generally provide for lower cost software,
can use lower cost storage more efficiently, and run on smaller machines.
• Each business case is different
Many factors can be used in building business case
Seen an application justified on storage costs alone
Seen an application justified on disaster recovery time alone
Seen an application justified on better data security alone
• Each organization will have many potential applications
• Having a database archiving practice can create synergies across many
applications thus adding more value
Copyright SvalTech, Inc., 2009
47
SvalTech
Final Thoughts
• Database Archiving is coming
• Database Archiving is good
• Reduces cost
• Improves operational efficiency
• Reduces Risk
• Need a complete solution to be effective
• Need professional staff
• Educated
• Fulltime
Copyright SvalTech, Inc., 2009
48