and “When” before delving into “How to” options

Download Report

Transcript and “When” before delving into “How to” options

Exploring Database Archival Strategies
With emphasis on “Why” and “When”
before delving into “How to” options
By: Ben Aminnia
President, L.A. SQL Server Professionals Group www.sql.la
Database Architect, Pointer Corporation www.pointercorp.com
Objectives
What “is” and what “is not” covered in this
presentation
The first question we should ask ourselves is
not “How to archive” but rather …
Why do we need to archive and what happens
if we don’t?
And then …
2
Objectives
The “How to” part will encompass multiple
questions …
How do we plan and design the archive from
an architect’s perspective?
How do we look for alternative approaches?
How do we choose among those alternative
approaches?
And more importantly …
Remember that we’re not alone in the decision
process.
3
Management Concerns
Before the archive process starts,
management must approve the approach.
How do we defend and justify our selected
approach?


When talking to IT Management
When talking to Non-IT Management; CEO’s; CFO’s
How do we put ourselves in their shoes?
How do we build a decision matrix to compare
various alternative approaches?
What criteria columns should we put on the
decision matrix?
What’s the cost / benefit summary of different 4
alternatives on our decision matrix?
Management Concerns
How do we measure cost / benefit?



One-time / Initial Costs
Recurring / Annual Costs
What about measuring benefits? Have you thought
about this?
Do we have a Service Level Agreement
(SLA)?
5
Management Concerns
But the challenge of seeking management
approval may go well beyond that;
They may also ask: How did we get here?
Why didn’t we do this and that earlier?
Why can’t we save nothing and just recreate?
Trying to solve one problem while creating
other problems …
 L.A. Traffic is Bad
 Make all highways one-way to Big Bear Lake
 That will solve L.A.’s traffic problem
 Let the mayor of Big Bear Lake worry about
their traffic problem; that’s not my problem!6
Why Do We Archive?
Increasing Cost of Storage / Hardware
Performance Degradation / Response
Time
Regulatory / Government Requirements
Application Requirements
Must show current year only
 Data transfer to disconnected users
 Part of a bigger picture, beyond the scope of
our role in the project

It’s part of SLA!
7
When Do We Archive?
Once a year

During spring cleaning season
When something breaks unexpectedly
and then everyone wakes up and says
“Oops! We forgot to archive.”
When we have budget
When we have nothing else to do
When we are told to get it done by
Monday
8
When Do We Archive?
When DB size approaches a predefined
threshold …
 1 GB
 10 GB
 100 GB
 1 TB
The important point is to understand the issue
and to have a strategy for addressing it.
9
Time for a Quick De-Tour!
10
Using ASP.NET, XSLT, and XML to Take
SQL Server to a New Height
SQL Server as a Document Repository
Part 4 - The Database Archival Challenge
By: Ben Aminnia
President, L.A. SQL Server Professionals Group www.sql.la
Database Architect, Pointer Corporation www.pointercorp.com
Agenda
Part 1 Review Summary: Background and
Overview of the VIP System Architecture
Part 2 Review Summary: Generating Reports &
Graphs with SSRS and MS-Chart
Part 3 Review Summary: The Road Ahead –
Using SQL Server as a Document Repository
Part 4: The Database Archival Challenge!
Questions and Answers
12
Architectural Notes and
Challenge for the DBA



Each record is about 100 KB large;
So it takes about ten thousand records to reach
one GB in DB size;
There’s no physical deletion; deleted records
are only marked for deletion (with
[isdeleted]=1);
13
Some Fundamental Questions




There are many questions on How, What, and
Why to deal with database archival process.
Who’s on 1st?
Should we plan archival when we’re running out
of space? … or performance has gone down? …
or some other company policy mandates it?
OR … as the famous database architect, Julie
Andrews, sings in The Sound of Music …
LET’S START AT THE VERY BEGINNING!
14
The First Question:
HOW?, WHAT?, or WHY?
There are six different possible orders to ask
these questions …
I think the answer is:


•
WHY?
•
WHAT?
•
HOW?
15
Why?



Increasing cost of storage / hardware
Performance Degradation / Response Time
Application requirements (e.g. must show
current year only)
16
What and Where to?
The whole record is moved to another location
and deleted from the main location
Only part of the record is moved
Destination could be …






To another DB
To the file system
No longer online
17
How?



How to archive?
How to retrieve the archived record?
What are the possible alternative from an
architectural perspective?
18
Four Alternative Ways …




Method 1: Store in archived
location (e.g. on the network
file system) from the beginning.
There’s nothing to archive periodically or at a
later time.
This used to be the most common way for
document archival, before the XML technology
which we started to use in the VIP Letters
system.
I stored over 50,000 documents from one of
my applications this way.
19
Four Alternative Ways …






Method 2:
Periodic archive of the
whole record to a different database.
Move last year’s data to a different database
with identical format.
Delete the entire archived record from the main
database.
Main database remains small and portable.
Helps with response time.
Also helps with portability (e.g. when laptop
users need to have a small version of database
on their local drive, while disconnected from
corporate network or the internet).
20
Four Alternative Ways …








Method 3: Partial archive of the
old records …
This is the case of our VIP Letters architecture.
Each record is about 100 KB.
10,000 records are almost 1.0 GB.
Most of it is the XML column data which holds
the saved letter.
The archive process will move the XML part of it
to the network share on the file system …
Keeping the other data columns in the main DB.
You then set the “IsArchived” column to 1.
21
Four Alternative Ways …



Method 4: No longer online …
Very common practice when a government or
regulatory agency mandates only x number of
years to keep records online.
The archived records are then scanned and
stored offline (on a tape or in paper form).
22
Back to Our Discussion
Here
23
In Summary
Archival is done in Phase 4
And that’s when it should be done
Nobody said archival should be done in
Phase 1
But we should have it in sight – on the
horizon – from the beginning
Archival planning / implementation
should not come as a surprise!
Example: DB and Website Size Tracking
(before the archival time)
24
Four Methods to Archive
Again, from my other presentation, we
looked into four methods to archive:
Method 1: Store in archived location (e.g.
on the network file system) from the
beginning.
 Method 2: Periodic archive of the whole
record to a different database.
 Method 3: Partial archive of the old records
 Method 4: No longer online

25
A Closer Look at Method 3
Method 3: Partial archive of the old records
Save a 2nd copy of the document on the file
system and then delete it from the XML
column of the database
What tracking columns do we add to the main
Document Archive table?
How / when to copy document(s) to the file
system …
 During the original creation
 Later; one-by-one; on demand
 Later; in batches (e.g. older than 1/1/2005)
How to retrieve it back from the file system 26
A Closer Look at Method 4
Method 4: No longer online
That is, neither in the XML column nor on the
file system
How do we recreate it later when needed?
Scan the paper copy?
I don’t think so!
Regenerate using original letter’s parameter
values which are still in the database?
What if the original template (XSLT) has
changed and recreate doesn’t look like the
original anymore or it fails?
27
Beyond Four Methods
Method 5: Totally move from primary database
to anther (non-DB) medium (e.g. network
share / tape)
More common in legacy system; Not one of my
options.
Method 6: Don’t move anything; just set the
archived flag and use a VIEW to filter out
archived records.
Makes sense if the objective is “visibility” and
not “space” or “response time” or “data
transmission between network and local for
disconnected mode”
28
Beyond Four Methods
Method 7: Use “The Cloud”
A viable alternative for space limitations in a
hosted environment or cost considerations
It’s really NOT an archival alternative.




Gave it a shot
Asked for a compatibility test
Nice presentation, but not compatible!
Analogy with a personal power-generator vs.
electrical outlet connected to DWP …
29
The “Cloud” - Before
My Personal
Power Generator
30
The “Cloud” - After
X
31
The “Cloud” - After
X
What if I need a 150 V generator?
32
Final Thoughts
Before Meeting with Management
Be Prepared to Answer Questions
Which functions will NOT work on archived
records? (e.g. Full-Text Search)
Have a Decision Matrix, showing all options
with pros and cons of each
User Interface to Retrieve an Archived Record



From the main application
From a secondary application solely for the purpose
of archive retrieval
By sending a request to a service application or email
to a designated contact
Turnaround time and other pros and cons for
each of the above approaches
33
The Solution Matrix
Solution Alternatives
Problems Solved
Cost of
Storage
Performance / Application
Response Time Requirements
1. Store in archive location from beginning ?
?
?
2. Move from primary to history DB
√
√
√
3. Partially move; e.g. keep the record but
empty the XML column
4. Just delete the XML column without
moving anything; rebuild (based on most
recent template) if necessary
√
√
X
√
√
X
5. Move from primary table to anther
(non-DB) medium (e.g. network / tape)
6. Don’t move anything; use filters
√
√
√
X
X
√
7. Use “the cloud”
√
X
X
34
The Solution Matrix
Final Thoughts
Ideally, each solution alternative on the matrix
needs the following:
 Development Costs
 Infrastructure Costs
 Maintenance Costs
 Which problem(s) are being addressed
 Which problem(s) are NOT being addressed
 What new problems might be introduced
Eventually, as the IT architects, we will
be responsible for the outcome!
35
Questions and Answers
36
Contact Information
•
Emails:


•
[email protected]
[email protected]
Websites:

www.sql.la

www.pointercorp.com

www.vipletters.com
37
Thank You!
38