IT 21003 Database Administration
Download
Report
Transcript IT 21003 Database Administration
IT 21003 Database Administration
Section 08
Protecting the Database from Failure
Backup and Recovery
Questions to ask:
Who has the responsibility?
Who does it?
Who plans the procedures?
Who takes the precautions?
Who makes it happen?
Possibilities:
End Users
Operators
Power Users
Application or Systems Programmers
Systems Managers
Tape Librarians
DBA
Protecting the Database from Failure
Backup and Recovery: DBA Activities
The DBA has the responsibility for planning
backup and recovery procedures/strategies to cope
with hardware and software failure
Disasters cause high-stress situtations
The plan must be simple, preferably automated
Should minimize human intervention
Can use the Recovery Manager to help automate these
procedures
The Recovery Plans must be well documented
The recovery procedures must:
Bring the database back in a consistent state at the time of the
crash
Recover the database in a reasonable time period so that
downtime is minimized
Protecting the Database from Failure
Backup and Recovery Requirements
Recovery plans should be discussed in the early
stages of database development
Some key issues:
Is the database volatile?
Can the users afford to lose a day’s work?
Can the database be taken down on a regular basis?
It is important to establish a Service-Level
Agreement (SLA) with the user departments
Backup and recovery strategies are much simpler if
users understand the process
7x24 databases require complex backup and recovery
routines and carefully planned downtime for maintenance
Protecting the Database from Failure
Backup and Recovery Techniques
Dumping and Logging
Most sites cannot allow the lose of any work
File dumps of the database are taken at set intervals, and logs
of the changes between the dumps are produced
These files can then be used to perform full recovery of a
database
If the database (or parts of it) are lost
Restore from the latest dump
Roll Forward in time by applying the logs of all the
changes
All these facilities are available in the Oracle Recoverer
Protecting the Database from Failure
Possible Failures
User Errors – hard to plan for
Transaction Errors
User Process Failures
Network Failures
Instance Failures
CPU (Host) Failures
Media Failures
Protecting the Database from Failure
User Errors
Typical user errors
An unwanted change is committed to the database, but the
mistake is not realized until some time later
Accidental drop of a table (cannot be rolled back)
Performing wholesale deletes or updates without a WHERE clause
Prevention
Users should be trained in database and application principles
Recovery
Committed data cannot simply be backed out
Other transactions may have already used the bad data
Even if the user knows the correct value, user management should
probably be consulted before any change is made
Recovery Manager can be used to recover a tablespace to the
point before the error occured
Protecting the Database from Failure
Transaction Errors
Typical transactions errors
Lack of space when a table needs to expand
Lack of privileges on underlying objects
Deadlock with conflicting transactions
Programming errors within and application
Prevention
Cannot guarantee prevention of any of the above
Careful planning of storage needs will help
Recovery
Intervention by the DBA is not normally required
Recovery performed by Oracles automatic rollback
processes
Protecting the Database from Failure
User Process Failures
Typical causes of user process failures
Abnormal disconnection of user process
Loss of client due to user action
Addressing exception
Prevention
Not much can be done by the DBA
Recovery
Intervention by the DBA is not normally required
Aborted processes are detected by PMON which
performs a rollback of the active transaction and releases
all resources
PMON cleans out all caches and frees up memory occupied by
the failed process
Protecting the Database from Failure
Network Failures
Typical types of network failures
Network or phone connections lost
Client workstation becomes disconnected
Recovery
Intervention by the DBA is not normally required
Loss of the connection is detected automatically by
PMON
PMON forces a rollback of the client’s transaction
Protecting the Database from Failure
Instance Failures
Typical causes of instance failure
Loss of an Oracle background process
Prevention
Not much can be done
Monitor the trace files for clues to the cause
Recovery
Intervention by the DBA is not normally required except
to shutdown and restart the database
SMON process automatically applies redo log
information and rolls back all uncommitted changes on
instance startup
Protecting the Database from Failure
CPU (Host) Failure
Typical causes of CPU failure
Loss of power
Prevention
Utilize a UPS
Recovery
Intervention by the DBA is not normally required except
to have already created scripts that restart the database
on computer bootup
SMON process automatically applies redo log
information and rolls back all uncommitted changes on
instance startup
Protecting the Database from Failure
Media Failure – with no data loss
Typical failures
Loss of disk controller
Prevention
Utilize a UPS
Recovery
Intervention by the DBA is not normally required except
to have already created scripts that restart the database
on computer bootup
SMON process automatically applies redo log
information and rolls back all uncommitted changes on
instance startup
Protecting the Database from Failure
Media Failure – with data loss
Typical failures
Disk Head crash
Disk or data destruction
Fire
Natural disasters
Accidental datafile deletion
Serious Failures
Requires DBA action
Requires plans and procedures to be in place
Recovery
A strategy should be in place to handle all possible losses of data
Data loss requires both rollforward and rollback procedures
Same steps as instance recovery but probably from an earlier point
in time
Need to establish which files are lost
Backup method and type of files lost dictate the recovery method used
Protecting the Database from Failure
Media Failure – with data loss (further protections)
Mirroring
Some sites are able to use mirror disks
Offers fault tolerance against media failure
Cannot protect against user and software errors
Mirrored Systems
Expensive
Whole systems mirrored for full fault tolerance
Hot standby databases
Oracle can keep a standby database in permanent recovery mode
It is a clone of the target database
Archived redo logs generated by the target are applied to the standby
database as they are produced
RAID
Allows the use of additional disks to protect database files
Numerous configurations possible
Protecting the Database from Failure
Backup Strategies
All files need to be protected
Datafiles
Redo Log Files
Control Files
A backup strategy must be developed
A choice of techniques is available to consider
Rehearsals should be carried out periodically of both
backup and recovery procedures
Need to consider:
The volatility of the database?
Can users afford to lose any data?
Can the database have planned downtime?
How large is the database?
Protecting the Database from Failure
Archive Logging
When active online redo log file is full, it is written
out to an archive log file
Requires an additional background process
ARCH – Archiver
This is an optional process
Archiving allows the following features:
Full recovery of the database to a state of consistency
as of the time of the crash
No loss of any committed data
Important for transaction-oriented volatile databases
Oracle databases can run in either
ARCHIVELOG mode or
NOARCHIVELOG mode
Protecting the Database from Failure
Archive Logging
ARCHIVELOG mode
Allows full recovery of database
Hot backups can be performed while the database
remains available
Management overhead concerns because the archive
disk can become full
Writing to tape can be slow so archive to disk first
Highly active systems can fill redo log files faster than
the archive to tape process
This will halt the instance until the archiver catches up
Protecting the Database from Failure
Archive Logging
NOARCHIVELOG mode
Allows the fastest performance of database
Advantages
Redo log files are cyclically reused because no archive logs
are produced
No management overhead
Mass updates possible without fear of filling up the archive disk
Disadvantages
Cold, whole database backup only allowed
Damaged tablespaces stop the instance until dropped or
restored
Recovery from media failures is generally not possible
Recovery Options
Restore from a previous backup
Restore from a previous export using the import utility
Protecting the Database from Failure
Switch to Archive Logging
The database can be switched to ARCHIVELOG
mode after creation
Steps:
Shutdown Normal
Startup Mount db1
Alter Database db1 Archivelog;
Alter Database Open;
The database is placed into ARCHIVELOG mode
while in the MOUNTed state
This does not automatically enable archiving
Protecting the Database from Failure
Switch to Archive Logging
When a database is put into ARCHIVELOG mode
the redo log files are not automatically archived
Two methods to turn on automatic archiving
Use the parameter file to enable permanent automatic
archiving
Log_Archive_Start = True (False is the default)
Or while the instance is running issue:
Alter System Archive Log Start;
This starts automatic archiving but does not survive a
shutdown and startup
Remember this only activates automatic archiving if the
database is already in ARCHIVELOG mode
Stopping automatic archiving:
Alter System Archive Log Stop;
Protecting the Database from Failure
Setting a destination for Archive Log Files
The Log_Archive_Dest parameter
Log_Archive_Dest = d:\archive
This can be a single point of failure so be careful
Log_Archive_Dest_1 = d:\archive
Log_Archive_Dest_2 = e:\archive
Log_Archive_Dest_3 = e:\backup
Log_Archive_Dest_4 = f:\archive
Up to five destinations can be specified
No single point of failure
Protecting the Database from Failure
Archive Summary Information
Archive Log List;
This statement generates the following output:
Database log mode
Automatic archival
Archive destination
Oldest online log sequence
Next log sequence to archive
Current log sequence number
Protecting the Database from Failure
Redo Log Management
How many?
How big?
Where are they stored?
How are they protected/recovered?
What do they contain?
How will they be used?
Which mode of operation?
Protecting the Database from Failure
Redo Log Management
Minimum of two log files are required
Redo log files are used in a cyclic nature
Should use enough log groups to allow archiving to
take place before log files need to be reused
Default size is 500 KB
Minimum size is 50 KB
There is no maximum size
Usual size is 1 - 10 MB
All log files should be the same size
Should be positioned on separate disks
Protecting the Database from Failure
Redo Log Management
Should be duplexed (Mirrored)
Oracle is vulnerable to the loss of redo logs
Database stops and full recovery is impossible
Redo logs contain both before and after images of
changed data
Redo logs provide protection for both data and rollback segments
to allow total recovery
Redo logs are used to recover from instance and media
failures
Can be applied to restored backup copies of the database files
Database changes are written to redo logs on transaction commits
When redo log is full, a log switch and checkpoint occurs
Redo logs can be archived as they are filled
Protecting the Database from Failure
Redo Log File Switches
A checkpoint occurs at each switch and each new
log is assigned a sequence number
Redo Log Group 1
Log 1
Redo Log Group 2
Archive Redo Logs
LGWR
ARCH
Log 1
LGWR
1
Log 2
ARCH
Log 3
LGWR
2
Log 2
ARCH
Log 3
LGWR
1
3
Log 4
2
1
Protecting the Database from Failure
Redo Log Group
Number the groups 1, 2, 3 not 10, 20 , 30
Takes up less space in the control file which references the redo
log files
Use just enough groups to prevent waits for checkpoints
and archive operations
If Oracle loses a member of a group, it is recorded as
STALE and an error is recorded in the ALERT file and
LGWR trace file
System is not halted, LGWR continues to write to the other
members of the group
The archiver can archive any one member of a group
If the log file is unavailable archiver simply chooses an alternative
LGWR cannot reuse a log file currently being archived
Protecting the Database from Failure
Redo Log Group
Adding a new redo log group
Alter Database db_name Add Logfile
Group x ‘filespec’ Size n (K | M);
Adding a new redo log member
Alter Database db_name Add Logfile Member
‘filespec’ To Group n;
Dropping Redo Log Groups
Alter Database db_name Drop Logfile Group n;
At least three log file groups must be present
Must have at least two for LGWR to function
Protecting the Database from Failure
Managing and Protecting Control Files
Control Files
Relatively small binary files that are critical to the
operation of a database
Contain database definitions:
Database name
Names and locations of redo log files and datafiles
Current log sequence numbers and status
Checkpoint timestamp information
Database creation timestamp
Beginning and ending blocks of rollback segments
Information required by the Recovery Manager
All control files must be available while the database is
open
A database cannot be opened if a control file is lost
Protecting the Database from Failure
Control Files: Current Advice
Store at least two copies on separate disks
Better to have a copy on each disk
Oracle mirrors any changes to the control files on all
copies
Control files are updated at each checkpoint with a slight
impact on performance
If you have only one disk available place the control
files in different areas to avoid problems from bad sectors
Backup the control file whenever a change is
made to database or redo log files
If primary control files are lost than a full recovery is
possible from backup control files
Protecting the Database from Failure
Adding a New Control File
Shutdown the database
Copy the existing (good) control file to a file(s) on
another disk(s) using the operating system
Specify the new control file name in the
CONTROL_FILES parameter in the parameter file
Start up the instance and open the database
using the changed parameter file
Protecting the Database from Failure
Safeguarding Control Files
Perform a control file backup each time a
database change is made
Alter Database Backup Controlfile To ‘filespec’;
Trace out the control file to produce a humanreadable file
Alter Database Backup Controlfile To Trace;
This produces a file in the trace directory that can be
read, edited and used to reconfigure the control file if
needed
Protecting the Database from Failure
Loss of a Control File
If any control files are lost, the database cannot
function correctly
Shutdown the database (may require a Shutdown
Abort)
Using the OS make a copy of the good control file but
not in the same location as the problem control file
Specify the location of the new control file in the
parameter file and remove reference to the bad
control file
Start up the database and return to normal
function
Protecting the Database from Failure
Loss of All Control Files
Serious Problem
Call Oracle Support
Media recovery is likely necessary
Can restore from a backup and then use an
appropriate backup control file to recover the
database
It is sometimes possible to re-create a control file
using the CREATE CONTROLFILE statement
Protecting the Database from Failure
The Alert Log File
Written by Oracle and stored in
oraclehome/admin/oracle_sid/bdump directory
Contains information on
Database creation
Startup of each background process
Log switches
Datafile and logfile manipulation
Internal errors
Block corruption
Deadlocks
The file grows without bound
Must decide what to keep online and offline
Protecting the Database from Failure
Questions?