BNL_migration_CERNOV2009


Migration
BNL Conditions DB to new cluster
Carlos Fernando Gamboa
Grid Group, RACF Facility
Brookhaven National Laboratory, US
Distributed Database Operations Workshop
CERN, Geneva. November 2009
Goals
- To minimize Conditions Database service disruption while
performing the migration of hardware.
- To upgrade the existing hardware capacity installed for the
Conditions Database services at BNL.
- To upgrade the underlying operating system from RHEL 4
ES to RHEL 5 Server.
- To test / use Oracle’s Dataguard tool and Transportable
Tablespaces.
Distributed Database Operations Workshop
11/27/09
General considerations
- Only disk backups available.
- Backup set size limited to 500GB (driven by the size of the biggest
data file).
- 1.2TB of data to be migrated.
Scenario 1: Dataguard
Preparation:
- Oracle documentation, CERN DB twiki.
Please see Jacek’s presentation:
http://indicobeta.cern.ch/getFile.py/access?contribId=13&sessionId=2&resId=1&materialId=slides&confId=43856
- Test successfully done with a test database and with LFC and FTS DB
replicas from BNL production (~200GB).
- Installation of the RDBMS on the new cluster was done using cloning.
Scenario 1: Dataguard
- Enable the Dataguard-specific parameters in the initialization file. The backup of the old
DB is restored and recovered on the new hardware.
- Dataguard-specific configuration steps (standby redo logs) need to be done to fully
enable the managed recovery process.
- Once the redo logs have been successfully applied, prepare the network environment and close
database user access to start switching the role of the current primary DB (old) to
standby.
- The database in standby (new) is then switched to the primary role. As soon as the
(new) DB has this role, the database can be opened.
- Clean up the new DB by disabling Dataguard-related parameters that are no longer needed.
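The sequence above can be sketched with standard Data Guard SQL (a minimal sketch for an Oracle 10g physical standby; the `prim`/`stby` names and the log size are illustrative assumptions, not BNL's actual configuration):

```sql
-- Illustrative Data Guard parameters on the primary (names are standard,
-- values are assumptions):
ALTER SYSTEM SET log_archive_config = 'DG_CONFIG=(prim,stby)' SCOPE=BOTH;
ALTER SYSTEM SET log_archive_dest_2 =
  'SERVICE=stby VALID_FOR=(ONLINE_LOGFILES,PRIMARY_ROLE) DB_UNIQUE_NAME=stby';

-- Standby redo logs, required to fully enable managed recovery:
ALTER DATABASE ADD STANDBY LOGFILE GROUP 10 SIZE 512M;

-- On the standby: start the managed recovery process:
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE DISCONNECT FROM SESSION;

-- Role switch once redo apply has caught up:
ALTER DATABASE COMMIT TO SWITCHOVER TO PHYSICAL STANDBY;  -- on the old primary
ALTER DATABASE COMMIT TO SWITCHOVER TO PRIMARY;           -- on the new database
ALTER DATABASE OPEN;
```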
Scenario 1: Dataguard
General considerations
- Only disk backups available (restore and recovery via a standby controlfile).
- 1.2TB of data to be migrated (TAGS, COND. DB).
- CRS+ASM on the new cluster; RDBMS installed by cloning.
[Diagram: old cluster — Server 1 and Server 2 attached over FC to a DS3400 array with DATA (RAID 10, 1.35 TB), FRA (RAID 10, 1.35 TB), plus a 1.7 TB RAID volume used for ad-hoc backups. New cluster — Server 1 and Server 2 attached over FC to a DS3400 array with RAID 10 DATA (5 TB storage size) and RAID 6 FRA (5 TB storage size). Backup strategy: database force logging; incremental L0 taken the day before; incremental L1 on the day of the intervention, plus the standby controlfile.]
Scenario 1: Dataguard
Intervention scheduled between 10:00 AM and 2:00 PM.
Started at 7:00 AM.
Enable the Dataguard-specific parameters in the initialization file. The backup of
the old DB is restored and recovered on the new hardware.
- Problem encountered when restoring the backup: the ASM instance on the new
DB crashed with an error; about 1 hour was spent understanding the problem:
ORA-04031: unable to allocate bytes of shared memory (ASM)
- Troubleshooting: the error had not been observed before. The shared pool size
was increased from 90MB to 500MB.
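The fix amounts to raising the ASM instance's shared pool (a minimal sketch; the parameter name is standard Oracle, the 90MB → 500MB values are the ones quoted on the slide):

```sql
-- Connect to the ASM instance (e.g. +ASM1) as SYSDBA, then:
ALTER SYSTEM SET shared_pool_size = 500M SCOPE=SPFILE SID='*';
-- Restart the ASM instance for the new value to take effect.
```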
- The database disk backup sets were resized from 860GB (biggest backup set size) to
500GB; the 1.2TB backup was done in 2 hours 30 minutes.
- The restore was done in groups of 5 datafiles at a time.
It worked, BUT it took 3 hours 30 minutes to restore the entire DB!
This caused a delay close to the end of the maintenance
window.
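In RMAN terms, capping the backup-set size and restoring datafiles in small groups could look like this (a hedged sketch; the datafile numbers are illustrative, not BNL's actual file list):

```sql
RMAN> CONFIGURE MAXSETSIZE TO 500G;
RMAN> BACKUP INCREMENTAL LEVEL 0 DATABASE;
-- Restore in groups of 5 datafiles at a time:
RMAN> RESTORE DATAFILE 1, 2, 3, 4, 5;
RMAN> RESTORE DATAFILE 6, 7, 8, 9, 10;
```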
No database service disruption
happened during this intervention.
BUT,
LHC (3D, ATLAS, CMS, LHCb, ALICE, IT-DM…) DBAs
Scenario 2: Transportable tablespaces (TT)
A feature that allows moving the datafiles of a tablespace between databases. In addition, an import of the
tablespace metadata is required to enable the tablespace in the database being populated with the
copy.
- For specific details of this technology, please see Eva’s presentation:
http://indicobeta.cern.ch/conferenceOtherViews.pyview=standard&confId=6552
General considerations
- The tablespaces to be migrated can be selected. Only Conditions Database data is migrated,
giving straightforward service isolation between services (TAGS, COND. DB).
- Tablespaces to be used as a source need to be in read-only mode.
- Streams processes are disabled at the source DB, the BNL Conditions DB (production).
- Can be done without affecting the user service; no downtime is required.
- Successfully used with the TAIWAN folks when helping with their recovery.
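The TT flow can be sketched as follows (a minimal sketch using Oracle 10g expdp/impdp; the tablespace, directory and file names are illustrative assumptions, not BNL's actual ones):

```sql
-- On the source: the tablespace must be read-only before the copy:
ALTER TABLESPACE cond_data READ ONLY;
-- Export the tablespace metadata only (run from the OS shell):
--   expdp system DIRECTORY=dpump_dir DUMPFILE=tts.dmp \
--         TRANSPORT_TABLESPACES=cond_data
-- Copy the datafiles to the target storage (scp, RMAN CONVERT, ...), then
-- on the target import the metadata, pointing at the copied files:
--   impdp system DIRECTORY=dpump_dir DUMPFILE=tts.dmp \
--         TRANSPORT_DATAFILES='+DATA/conddb/cond_data_01.dbf'
-- Finally make the tablespace writable on the target:
ALTER TABLESPACE cond_data READ WRITE;
```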
Scenario 2: Transportable tablespaces (TT)
Preparation:
- Re-deployment of the CRS+RDBMS binaries; apply the latest
CRS and PSU patches.
- Isolate the Conditions database service from direct connections
to the DB. DB name changed. Important milestone achieved!
- Disable ASMM.
- Recreation of the COOL schemas, roles, and profiles.
- Database services: keep the current production connect
descriptor.
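Keeping the production connect descriptor means client tnsnames.ora entries stay unchanged while the hosts behind the VIPs move. An illustrative entry (the alias, hostnames and service name are assumptions, not BNL's actual values):

```
CONDDB =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = node1-vip.example.bnl.gov)(PORT = 1521))
    (ADDRESS = (PROTOCOL = TCP)(HOST = node2-vip.example.bnl.gov)(PORT = 1521))
    (LOAD_BALANCE = yes)
    (CONNECT_DATA = (SERVICE_NAME = conddb.example.bnl.gov))
  )
```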
Scenario 2: Transportable tablespaces (TT)
Preparation day intervention and switch over:
- Split Streams replication to BNL (done at Tier0).
- Stop capture and propagation (done at Tier0).
- Stop the apply process (done at BNL).
- Proceed with the TT steps (copy datafiles and import metadata into
the target database).
- Migrate the production VIPs to the new nodes in a rolling
fashion.
- Re-enable replication (capture, propagation and apply).
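The Streams stop/start steps map onto the standard DBMS_*_ADM packages (a hedged sketch; the capture, propagation and apply names are illustrative assumptions):

```sql
-- At Tier0: stop capture and propagation; at BNL: stop the apply process.
BEGIN
  DBMS_CAPTURE_ADM.STOP_CAPTURE(capture_name => 'STRM_CAPTURE');
  DBMS_PROPAGATION_ADM.STOP_PROPAGATION(propagation_name => 'STRM_PROP');
END;
/
BEGIN
  DBMS_APPLY_ADM.STOP_APPLY(apply_name => 'STRM_APPLY');
END;
/
-- ... TT copy, metadata import, VIP migration ...
BEGIN
  DBMS_APPLY_ADM.START_APPLY(apply_name => 'STRM_APPLY');
  DBMS_PROPAGATION_ADM.START_PROPAGATION(propagation_name => 'STRM_PROP');
  DBMS_CAPTURE_ADM.START_CAPTURE(capture_name => 'STRM_CAPTURE');
END;
/
```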
Test results:
The entire procedure was exercised last week using TT.
Pushing the current biggest Conditions datafile (~170GB) over the
network took about 40 minutes; about 90 minutes to move all the
ATLAS_COOL% datafiles.
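As a rough consistency check, the quoted numbers imply a sustained network throughput of roughly

```latex
\frac{170\,\text{GB}}{40\,\text{min}} \approx 4.3\,\text{GB/min} \approx 72\,\text{MB/s}
```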
Conclusion
Two technologies for migrating the current Conditions DB were
exercised and tested.
For this particular case, Transportable Tablespaces will be used to
perform the migration of the Conditions DB to the new
hardware.
Many thanks!
CERN IT-DM
Eva Dafonte, Jacek Wojcieszuk
BNL GCE group