IT 21003 Database Administration

Download Report

Transcript IT 21003 Database Administration

IT 21003 Database Administration
Section 08
Protecting the Database from Failure
Backup and Recovery
 Questions to ask:





Who has the responsibility?
Who does it?
Who plans the procedures?
Who takes the precautions?
Who makes it happen?
 Possibilities:
 End Users
 Operators
 Power Users
 Application or Systems Programmers
 Systems Managers
Tape Librarians
 DBA
Protecting the Database from Failure
Backup and Recovery: DBA Activities
 The DBA has the responsibility for planning
backup and recovery procedures/strategies to cope
with hardware and software failure
 Disasters cause high-stress situtations
 The plan must be simple, preferably automated
 Should minimize human intervention
 Can use the Recovery Manager to help automate these
procedures
 The Recovery Plans must be well documented
 The recovery procedures must:
 Bring the database back in a consistent state at the time of the
crash
 Recover the database in a reasonable time period so that
downtime is minimized
Protecting the Database from Failure
Backup and Recovery Requirements
 Recovery plans should be discussed in the early
stages of database development
 Some key issues:
Is the database volatile?
Can the users afford to lose a day’s work?
Can the database be taken down on a regular basis?
 It is important to establish a Service-Level
Agreement (SLA) with the user departments
 Backup and recovery strategies are much simpler if
users understand the process
 7x24 databases require complex backup and recovery
routines and carefully planned downtime for maintenance
Protecting the Database from Failure
Backup and Recovery Techniques
 Dumping and Logging
 Most sites cannot allow the lose of any work
 File dumps of the database are taken at set intervals, and logs
of the changes between the dumps are produced
 These files can then be used to perform full recovery of a
database
 If the database (or parts of it) are lost
 Restore from the latest dump
 Roll Forward in time by applying the logs of all the
changes
 All these facilities are available in the Oracle Recoverer
Protecting the Database from Failure
Possible Failures
 User Errors – hard to plan for
 Transaction Errors
 User Process Failures
 Network Failures
 Instance Failures
 CPU (Host) Failures
 Media Failures
Protecting the Database from Failure
User Errors
 Typical user errors
An unwanted change is committed to the database, but the
mistake is not realized until some time later
Accidental drop of a table (cannot be rolled back)
Performing wholesale deletes or updates without a WHERE clause
 Prevention
Users should be trained in database and application principles
 Recovery
 Committed data cannot simply be backed out
 Other transactions may have already used the bad data
 Even if the user knows the correct value, user management should
probably be consulted before any change is made
 Recovery Manager can be used to recover a tablespace to the
point before the error occured
Protecting the Database from Failure
Transaction Errors
 Typical transactions errors
Lack of space when a table needs to expand
Lack of privileges on underlying objects
Deadlock with conflicting transactions
Programming errors within and application
 Prevention
 Cannot guarantee prevention of any of the above
 Careful planning of storage needs will help
 Recovery
 Intervention by the DBA is not normally required
 Recovery performed by Oracles automatic rollback
processes
Protecting the Database from Failure
User Process Failures
 Typical causes of user process failures
Abnormal disconnection of user process
Loss of client due to user action
Addressing exception
 Prevention
Not much can be done by the DBA
 Recovery
 Intervention by the DBA is not normally required
 Aborted processes are detected by PMON which
performs a rollback of the active transaction and releases
all resources
 PMON cleans out all caches and frees up memory occupied by
the failed process
Protecting the Database from Failure
Network Failures
 Typical types of network failures
Network or phone connections lost
Client workstation becomes disconnected
 Recovery
 Intervention by the DBA is not normally required
 Loss of the connection is detected automatically by
PMON
 PMON forces a rollback of the client’s transaction
Protecting the Database from Failure
Instance Failures
 Typical causes of instance failure
Loss of an Oracle background process
 Prevention
Not much can be done
Monitor the trace files for clues to the cause
Recovery
 Intervention by the DBA is not normally required except
to shutdown and restart the database
 SMON process automatically applies redo log
information and rolls back all uncommitted changes on
instance startup
Protecting the Database from Failure
CPU (Host) Failure
 Typical causes of CPU failure
Loss of power
 Prevention
Utilize a UPS
Recovery
 Intervention by the DBA is not normally required except
to have already created scripts that restart the database
on computer bootup
 SMON process automatically applies redo log
information and rolls back all uncommitted changes on
instance startup
Protecting the Database from Failure
Media Failure – with no data loss
 Typical failures
Loss of disk controller
 Prevention
Utilize a UPS
Recovery
 Intervention by the DBA is not normally required except
to have already created scripts that restart the database
on computer bootup
 SMON process automatically applies redo log
information and rolls back all uncommitted changes on
instance startup
Protecting the Database from Failure
Media Failure – with data loss
 Typical failures
 Disk Head crash
 Disk or data destruction
 Fire
 Natural disasters
 Accidental datafile deletion
 Serious Failures
Requires DBA action
Requires plans and procedures to be in place
Recovery
 A strategy should be in place to handle all possible losses of data
 Data loss requires both rollforward and rollback procedures
 Same steps as instance recovery but probably from an earlier point
in time
 Need to establish which files are lost
 Backup method and type of files lost dictate the recovery method used
Protecting the Database from Failure
Media Failure – with data loss (further protections)
 Mirroring
 Some sites are able to use mirror disks
 Offers fault tolerance against media failure
 Cannot protect against user and software errors
 Mirrored Systems
Expensive
Whole systems mirrored for full fault tolerance
 Hot standby databases
 Oracle can keep a standby database in permanent recovery mode
 It is a clone of the target database
 Archived redo logs generated by the target are applied to the standby
database as they are produced
 RAID
 Allows the use of additional disks to protect database files
 Numerous configurations possible
Protecting the Database from Failure
Backup Strategies
 All files need to be protected
Datafiles
Redo Log Files
Control Files
A backup strategy must be developed
A choice of techniques is available to consider
Rehearsals should be carried out periodically of both
backup and recovery procedures
 Need to consider:
 The volatility of the database?
 Can users afford to lose any data?
 Can the database have planned downtime?
 How large is the database?
Protecting the Database from Failure
Archive Logging
 When active online redo log file is full, it is written
out to an archive log file
 Requires an additional background process
ARCH – Archiver
This is an optional process
 Archiving allows the following features:
Full recovery of the database to a state of consistency
as of the time of the crash
No loss of any committed data
Important for transaction-oriented volatile databases
 Oracle databases can run in either
 ARCHIVELOG mode or
 NOARCHIVELOG mode
Protecting the Database from Failure
Archive Logging
 ARCHIVELOG mode
 Allows full recovery of database
 Hot backups can be performed while the database
remains available
 Management overhead concerns because the archive
disk can become full
 Writing to tape can be slow so archive to disk first
 Highly active systems can fill redo log files faster than
the archive to tape process
 This will halt the instance until the archiver catches up
Protecting the Database from Failure
Archive Logging
 NOARCHIVELOG mode
 Allows the fastest performance of database
 Advantages
 Redo log files are cyclically reused because no archive logs
are produced
 No management overhead
 Mass updates possible without fear of filling up the archive disk
 Disadvantages
 Cold, whole database backup only allowed
 Damaged tablespaces stop the instance until dropped or
restored
 Recovery from media failures is generally not possible
 Recovery Options
 Restore from a previous backup
 Restore from a previous export using the import utility
Protecting the Database from Failure
Switch to Archive Logging
 The database can be switched to ARCHIVELOG
mode after creation
 Steps:
Shutdown Normal
Startup Mount db1
Alter Database db1 Archivelog;
Alter Database Open;
 The database is placed into ARCHIVELOG mode
while in the MOUNTed state
 This does not automatically enable archiving
Protecting the Database from Failure
Switch to Archive Logging
 When a database is put into ARCHIVELOG mode
the redo log files are not automatically archived
 Two methods to turn on automatic archiving
 Use the parameter file to enable permanent automatic
archiving
 Log_Archive_Start = True (False is the default)
 Or while the instance is running issue:
 Alter System Archive Log Start;
 This starts automatic archiving but does not survive a
shutdown and startup
 Remember this only activates automatic archiving if the
database is already in ARCHIVELOG mode
 Stopping automatic archiving:
 Alter System Archive Log Stop;
Protecting the Database from Failure
Setting a destination for Archive Log Files
 The Log_Archive_Dest parameter
 Log_Archive_Dest = d:\archive
 This can be a single point of failure so be careful
 Log_Archive_Dest_1 = d:\archive
 Log_Archive_Dest_2 = e:\archive
 Log_Archive_Dest_3 = e:\backup
 Log_Archive_Dest_4 = f:\archive
 Up to five destinations can be specified
 No single point of failure
Protecting the Database from Failure
Archive Summary Information
 Archive Log List;
 This statement generates the following output:
 Database log mode
 Automatic archival
 Archive destination
 Oldest online log sequence
 Next log sequence to archive
 Current log sequence number
Protecting the Database from Failure
Redo Log Management
 How many?
 How big?
 Where are they stored?
 How are they protected/recovered?
 What do they contain?
 How will they be used?
 Which mode of operation?
Protecting the Database from Failure
Redo Log Management
 Minimum of two log files are required
Redo log files are used in a cyclic nature
Should use enough log groups to allow archiving to
take place before log files need to be reused
 Default size is 500 KB
Minimum size is 50 KB
There is no maximum size
Usual size is 1 - 10 MB
All log files should be the same size
 Should be positioned on separate disks
Protecting the Database from Failure
Redo Log Management
 Should be duplexed (Mirrored)
 Oracle is vulnerable to the loss of redo logs
 Database stops and full recovery is impossible
 Redo logs contain both before and after images of
changed data
Redo logs provide protection for both data and rollback segments
to allow total recovery
 Redo logs are used to recover from instance and media
failures
Can be applied to restored backup copies of the database files
Database changes are written to redo logs on transaction commits
When redo log is full, a log switch and checkpoint occurs
 Redo logs can be archived as they are filled
Protecting the Database from Failure
Redo Log File Switches
 A checkpoint occurs at each switch and each new
log is assigned a sequence number
Redo Log Group 1
Log 1
Redo Log Group 2
Archive Redo Logs
LGWR
ARCH
Log 1
LGWR
1
Log 2
ARCH
Log 3
LGWR
2
Log 2
ARCH
Log 3
LGWR
1
3
Log 4
2
1
Protecting the Database from Failure
Redo Log Group
 Number the groups 1, 2, 3 not 10, 20 , 30
Takes up less space in the control file which references the redo
log files
 Use just enough groups to prevent waits for checkpoints
and archive operations
 If Oracle loses a member of a group, it is recorded as
STALE and an error is recorded in the ALERT file and
LGWR trace file
System is not halted, LGWR continues to write to the other
members of the group
 The archiver can archive any one member of a group
 If the log file is unavailable archiver simply chooses an alternative
 LGWR cannot reuse a log file currently being archived
Protecting the Database from Failure
Redo Log Group
 Adding a new redo log group
 Alter Database db_name Add Logfile
Group x ‘filespec’ Size n (K | M);
 Adding a new redo log member
 Alter Database db_name Add Logfile Member
‘filespec’ To Group n;
 Dropping Redo Log Groups
 Alter Database db_name Drop Logfile Group n;
 At least three log file groups must be present
 Must have at least two for LGWR to function
Protecting the Database from Failure
Managing and Protecting Control Files
 Control Files
 Relatively small binary files that are critical to the
operation of a database
 Contain database definitions:
 Database name
 Names and locations of redo log files and datafiles
 Current log sequence numbers and status
 Checkpoint timestamp information
 Database creation timestamp
 Beginning and ending blocks of rollback segments
 Information required by the Recovery Manager
 All control files must be available while the database is
open
 A database cannot be opened if a control file is lost
Protecting the Database from Failure
Control Files: Current Advice
 Store at least two copies on separate disks
 Better to have a copy on each disk
 Oracle mirrors any changes to the control files on all
copies
 Control files are updated at each checkpoint with a slight
impact on performance
 If you have only one disk available place the control
files in different areas to avoid problems from bad sectors
 Backup the control file whenever a change is
made to database or redo log files
 If primary control files are lost than a full recovery is
possible from backup control files
Protecting the Database from Failure
Adding a New Control File
 Shutdown the database
 Copy the existing (good) control file to a file(s) on
another disk(s) using the operating system
 Specify the new control file name in the
CONTROL_FILES parameter in the parameter file
 Start up the instance and open the database
using the changed parameter file
Protecting the Database from Failure
Safeguarding Control Files
 Perform a control file backup each time a
database change is made
Alter Database Backup Controlfile To ‘filespec’;
 Trace out the control file to produce a humanreadable file
 Alter Database Backup Controlfile To Trace;
 This produces a file in the trace directory that can be
read, edited and used to reconfigure the control file if
needed
Protecting the Database from Failure
Loss of a Control File
 If any control files are lost, the database cannot
function correctly
 Shutdown the database (may require a Shutdown
Abort)
Using the OS make a copy of the good control file but
not in the same location as the problem control file
 Specify the location of the new control file in the
parameter file and remove reference to the bad
control file
 Start up the database and return to normal
function
Protecting the Database from Failure
Loss of All Control Files
 Serious Problem
Call Oracle Support
 Media recovery is likely necessary
 Can restore from a backup and then use an
appropriate backup control file to recover the
database
 It is sometimes possible to re-create a control file
using the CREATE CONTROLFILE statement
Protecting the Database from Failure
The Alert Log File
 Written by Oracle and stored in
oraclehome/admin/oracle_sid/bdump directory
 Contains information on
 Database creation
 Startup of each background process
 Log switches
 Datafile and logfile manipulation
 Internal errors
 Block corruption
 Deadlocks
 The file grows without bound
 Must decide what to keep online and offline
Protecting the Database from Failure
Questions?