Transcript title

Implementing a logical
Data Guard
Steps to get there and tips & tricks
Mathijs Bruggink
The Netherlands 2012.09.13
Workshop Overview
▶ App 1 Hr
▶ General Information Dataguard Databases
▶ Setting Up a Logical Standby Dataguard
▶ Issues/ Tips and Trics
▶ The End
About Me.
• Oracle Dba since 1998 (since Oracle 7.3.4).
• Working for Atos since 2008.
• Key activities: Migrations; Real Application
Clusters; High Availability; Rman.
• Blogging
• http://jointherebels.nl/blog/rac.html
• http://www.orapro.nl/author/mathijs/
• Twitter : http://twitter.com/MathijsBruggink
• Travelling , Bonsais, Music, Married.
Purposes for a Dataguard Database (aka Standby Db) 1
Purposes for a Dataguard Database 2 nd Attempt
Purposes for a Dataguard For Real…..
▶ High availability for your Database.
▶ Cloning for example for Migration strategies.
▶ Part of Migration.
▶ Reporting (offers near time up to date environment).
▶ Mix of all the above.
The types of Dataguard databases
▶ Physical standby database
Every Logical
SB = born as
Physical SB
▶ Physically identical copy of the primary database.(Disk Structures
identical to the Primary Db on a block-for-block basis).
▶ Synced: A Physical standby database is synced with the primary
database, through Redo Apply, which recovers the redo data
received from the Primary side and applies the redo to the physical
standby database.
▶ Status: MOUNTED (NOTE: as from11GR1 ACTIVE Dataguard (DB
is Open Readonly)
▶ Logical standby database
▶ Logically: SAME Logical Info as the primary side. BUT the
physical organization and structure of the data can be different.
▶ Synced: synchronized through SQL Apply. This transforms the
data in the redo received from the Primary database into SQL
statements and then executes the SQL statements on the standby
database (Logminer).
▶ Status: OPEN
The types of Dataguard databases
▶ Snapshot Standby Database
This is a fully updatable standby database created by converting
a physical standby database into a snapshot standby database.
▶ Status: OPEN
NOTE : Unlike a physical or logical standby database, a snapshot
standby database does not apply the received redo data. It is only
applied after the snapshot standby is converted back into a
physical standby database.
The Future of Dataguard databases Addressing new Challenges:
• Zero Downtime for High
Availability, Migrations.
▶ And then there will be …… • Protecting your (changing)
Oracle Goldengate …………. datamodel by def. rules.
Interfacing with Dataguard
• Sqlplus
• Broker ( cmnd line interface )
• Grid Control
Data Guard Architecture (AND High Available)
Data Guard Architecture
Data Guard Protection Modes
▶ Maximum Performance
– Keyword: P1 Performance (of the Primary Db) over data protect.
It requires ASYNC redo transport, the LGWR process never
waits for acknowledgment from the Standby database.
▶ Maximum Availability
– Keyword: P1 = Availability: Zero data loss protect as a very
close P2. It requires SYNC redo transport, thus Primary
database performance may be impacted by the amount of time
required to receive an acknowledgment from the Standby that
redo has been written to disk.
▶ Maximum Protection
– Keyword: Utmost Prio 1 on data protection. It also requires
SYNC redo transport. The Primary Db will not acknowledge a
commit to the application unless it receives acknowledgment
from at least one Standby Db in the configuration that the data
needed to recover that transaction is safely on disk.
ASYNC redo transport architecture (Maximum Performance)
Log Network Server (LNS) Remote File Server (RFS)
SYNC redo transport architecture (Maximum Availability ; Maximum Protection)
Log Network Server (LNS) Remote File Server (RFS)
Automatic update in a Physical SB
Automatic update of Logical SB
SQL Apply in a Logical SB // Tuning
Bug if set to
4096 ( 4M)
execute dbms_logstdby.apply_set('MAX_SERVERS', 64);
execute dbms_logstdby.apply_set('MAX_SGA', 4095);
execute dbms_logstdby.apply_set('_HASH_TABLE_SIZE',
10000000);
Logical
Change
Record
Redo Merging (scn)
exec dbms_logstdby.apply_set ('PRESERVE_COMMIT_ORDER', 'TRUE');
Automatic gap resolution for Standby Databases
Apply
Missing archives
Not only gap resolution also, protecting needed archives on
primary side ! (RMAN-08137: WARNING: archived log not
deleted as it is still needed). BTW Archives are ALWAYS shipped
(even when up to date)
Best Practice before start 1
▶
Prepare the Primary Database:
– Application and Admin software.
– Are all required users present.
– Data complete as possible (Initial Imports etc)
– Bottomline:
BE as complete and thorough as possible before start with
the Duplication of the Database.
▶
Storage:
Plenty space in ASM on primary side FRA (Flash recovery
Area). NEED even more space on the Standby side.
Recommend: At least 1.5 times archive space on the LSB side
(Backlog is very likely. Sql apply and Oracle will only release
the data (even though automatically after 24 Hrs).
Best Practice before start 2
▶
Success
Backup/restore facilities:
Is the Standby server able to restore backups that are created on the
primary server (remote access list in networker).
▶
Time window:
PLAN your activities well ! Best practice is to work with the backup, the
restore, recover and set up of the Standby in one day, or the very next day
max.
▶
Physical standby:
Recommend : Physical standby in place and up to date then convert to a
logical standby.
▶
Bulk Data(The SQL apply will have trouble with Monster tables):
– Create the duplicate Database (that becomes a Physical standby) AFTER you
have finished working with the monster table
– OR define skip rule before doing that kind of work, Instantiate the table, and
then undo the skip rule.
Preparations for setting up a Logical SB:
Key elements Are :
▶
▶
▶
Password file
Connectivity
Parameters of Interest to Data Guard
▶
Init.ora on both Dataguard environments point to spfile
in asm fully prepared with all settings :
Example :
cat $ORACLE_HOME/dbs/initM2MP1R11.ora =
spfile='+MYDBSB_FRA01/m2mp1r1/spfileM2MP1R1.ora'
Parameters of Interest to Data Guard
Of interest for Call OUT
Preparations for the duplication
▶ Duplication script using Backups:
Duplicate Database Script
Or use .. From Active Database
FromActiveDatabase.rtf
Time line of Actions
1
2
3
BACKLOG
1. Backup
2. Restore
3. Start Physical / Convert to Logical
Restoring extra archives if needed after restore
▶ As long as you still have your Physical SB you can do this on
the SB SIDE !!!!!! ( After converting it to a logical SB the
DBID will change !!!!)
Archive Restores for Physical SB ( If needed)
After duplicate completes
Add standby redologs (SRL)
▶ Best practice: add the standby redo logs to the primary.
▶ During the duplicate SRL already present on the standby
side !!!
▶ Formula: Number of standby redologs needed is :
(4 groups in an instance + 1 extra group) * 6 threads = 30
Standby redologs. (my env. Was 6 node Rac).
▶ DO NOT: add them to a specific thread. Oracle will sort out,
per instance to which SRL it would like to talk on the
standby side (so you should create them without a thread
assigned to it).
Adding standby redologs
▶ alter database add standby logfile
▶ group 25 ('+MYDBSB_FRA01') size
1000m,
<<<<<<<<<<<<<<
Note !
▶ group 26 ('+MYDBSB_FRA01') size 1000m,
Only
one member should be allocated :
▶ group 27 ('+MYDBSB_FRA01') size
1000m,
▶ group 28 ('+MYDBSB_FRA01') size 1000m,
A standby redolog is in no way same as a
▶ group 29 ('+MYDBSB_FRA01') size 1000m,
redolog. So no two members are
……
required.
▶ group 49 ('+MYDBSB_FRA01') size
1000m,
▶ group 50 ('+MYDBSB_FRA01') size 1000m,
▶ group 51 ('+MYDBSB_FRA01') size 1000m,
▶ group 52 ('+MYDBSB_FRA01') size 1000m,
▶ group 53 ('+MYDBSB_FRA01') size 1000m,
▶ group 54 ('+MYDBSB_FRA01') size 1000m;
Start / stop Recovery on the Physical SB side
▶ ALTER DATABASE RECOVER MANAGED STANDBY DATABASE
USING CURRENT LOGFILE DISCONNECT FROM SESSION;
▶ Alertlog shows:
MRP0: Background Managed Standby Recovery process started
(MYDBSB1)
Fast Parallel Media Recovery enabled
Managed Standby Recovery starting Real Time Apply
Wed May 18 15:22:25 2011
parallel recovery started with 64 processes
Note. If you need to stop recovery on a physical:
▶ ALTER DATABASE RECOVER MANAGED STANDBY DATABASE
CANCEL;
Checking the Physical SB database
▶ select * from v$dataguard_status;
▶ select * from v$archive_gap;
▶ Oracle provided script:
– (can be downloaded from metalink)
Physical standby Checks
Physical Standby Output
Checking the Physical SB database
▶ on Primary side:
alter system archive log current;
▶ on Standby side
Check your alertlog on the Standby
NOTE: If this SHOULD show shipment of an Arch for all instances.
Check again your connectivity / password files on the source side.
Checking the Physical SB database
On the Primary side:
Set pagesize 10000
select * from gv$archive_dest
where DEST_NAME='LOG_ARCHIVE_DEST_2‘
order by inst_id;
Log_archive_dest_2_NOT OK
On Physical standby side:
COL NAME FOR A13
COL VALUE FOR A20
COL UNIT FOR A30
SET LINES 122
SELECT NAME, VALUE, UNIT, TIME_COMPUTED
FROM V$DATAGUARD_STATS
WHERE NAME IN ('transport lag', 'apply lag');
Status =
Valid /
ERROR
Checking the Physical SB database
Showed during catchup :
NAME
VALUE
UNIT
TIME_COMPUTED
------- -------------------- ------------------------------ ----------------------------apply lag +00 12:30:48
day(2) to second(0) interval 20-JUN-2011
21:20:36
transport lag +00 14:10:17 day(2) to second(0) interval 20-JUN-2011
21:20:36
.. apply lag means 12:30:48 behind on applying ( so likely an archive is
missing)
.. transport indicates that shipment of redo from the primary to the
physical standby is behind +00 14:10:17
Checking the Physical SB database
SET LINES 132
SET PAGESIZE 9999
COL CLIENT_PID FORMAT A12
set numf 9999999
SELECT PID, PROCESS, STATUS, CLIENT_PROCESS,CLIENT_PID,
THREAD#, SEQUENCE# SEQ#,BLOCK#, BLOCKS
FROM V$MANAGED_STANDBY
Order by thread#;
This showed in specific detail:
PID PROCESS STATUS
CLIENT_P CLIENT_PID THREAD#
SEQ#
BLOCK# BLOCKS
------ ------- ------------ -------- ------------ -------- -------- -------- -------32414 MRP0 WAIT_FOR_GAP N/A
N/A
5
1488
0
0
So I restored 1488 for thread 5 !
Convert a Physical SB to Logical SB - 1
1.
Stop redo apply on the physical standby database:
alter database recover managed standby database cancel;
2.
On the PRIMARY side the LogMiner Dictionary needs to be build
(on the source side):
execute dbms_logstdby.build;
Note: this package will wait for all active transactions to finish.
We have had at least 2 occasions where either a maintenance
window
– Full ( Db down ), or
– Partially ( stopping specific daemons )
Was needed to finish the job… So you need to be aware of that .
Convert a Physical SB to Logical SB - 2
3.
▶
▶
▶
On the SB side following sql:
shutdown immediate;
startup mount ;
alter database recover to logical standby <NEWSID>;
<NEWSID> = Choose new name (db name from the SB db will also change.
If you are using spfile that the change is recorded in the spfile during this process.
AND this recover will also change the DBID!
Example: alter database recover to logical standby MYLSBDB1;
▶
▶
▶
▶
shutdown
startup mount;
alter database open resetlogs;
alter database start logical standby apply immediate;
BTW if you need to stop apply:
ALTER DATABASE STOP LOGICAL STANDBY APPLY;
Health Checking Logical Standby
Primary side
select * from v$dataguard_status;
select * from gv$archive_dest where dest_id = 2 order by INST_ID;
(looking for (no) errors)
Open three windows:
– a sql window to the primary datatabase
– and a unix window with a tail –f on the alertfile on the standby
database
– and a sqlplus window to the standby database
Check current sequence number of the standby redologs on the standby side
select sequence#, first_time, next_time, dict_begin, dict_end from
dba_logstdby_log order by sequence#;
On the primary side force a log switch for all Instance by issuing
alter system archive log current;
(Metalink) DG logical standby diag
Logical Standby Output
Health Checking Logical Standby
On the logical side
Check the alertlog.. the logminer should show activities like these:
….
▶ LOGMINER: Begin mining logfile for session 1 thread 6 …….
▶ LOGMINER: End mining logfile for session 1 thread 3 ………..
▶ LOGMINER: Begin mining logfile for session 1 thread 3……..
▶ Tue Sep 27 08:34:27 2011
▶ RFS LogMiner: Registered logfile
[+MYDBSB_FRA01/m2mp1r1/archivelog/2011_09_27/thread_6_seq_3748.
439.762942763] to LogMiner session id [1]
Conclusion: if all this works fine, then it is a job well done !
ORACLE provided script
/opt/oracle/admin/tools /dg_logical_diag.sql (Metalink)
Additional checks (KPI 1)
Verify that log apply services on the standby are currently
running. If the query against V$LOGSTDBY returns no rows
then logical apply is not running.
column status format a50 wrap
column type format a11
set numwidth 15
SELECT TYPE, STATUS, HIGH_SCN
FROM V$LOGSTDBY;
Additional checks (KPI 1) (output)
TYPE
STATUS
HIGH_SCN
----------- -------------------------------------------------- --------------COORDINATOR ORA-16116: no work available
10982415246160
ANALYZER ORA-16116: no work available
10982415509661
APPLIER
ORA-16121:
with commit
SCN
10982415246168
This applying
tells youtransaction
that the mining
engine
is 0x
waiting
to apply transaction
09fd.0af5a3ac
145.10.1094317 and that 2161 transactions
APPLIER
ORA-16124: transaction 11 23 562627 is waiting on 10982415249574
another transactionneed to be applied (first) before the barrier is satisfied
APPLIER
ORA-16124: transaction 541 2 892602 is waiting on 10982415249570
another transaction
APPLIER
ORA-16124: transaction 11 0 562308 is waiting on a 10982415249585
nother transaction
APPLIER
ORA-16124: transaction 405 20 658443 is waiting on 10982415249590
another transaction
READER
ORA-16127: stalled waiting for additional transact 10982415512190
ions to be applied
BUILDER
ORA-44604: Barrier synchronization on DDL with XID 10982415509700
145.10.1094317 (waiting on 2161 transactions)
PREPARER ORA-16116: no work available
10982415512134
PREPARER ORA-16116: no work available
10982415512123
PREPARER ORA-16116: no work available
10982415512190
PREPARER ORA-16116: no work available
10982415512160
Additional checks (KPI 2)
In order to check if apply is still working
set lines 166
set numf 99999999999999999;
SELECT NEWEST_SCN, NEWEST_TIME,APPLIED_SCN, APPLIED_TIME,
READ_SCN, READ_TIME
FROM DBA_LOGSTDBY_PROGRESS;
NEWEST_SCN NEWEST_TIME
APPLIED_SCN APPLIED_TIME
READ_SCN READ_TIME
------------------ ------------------- ------------------ ------------------- ----------------- ------------------10857770301722 16.06.2011 16:19:38
10857664034502 16.06.2011
03:09:49
10856081424810 15.06.2011 20:41:59
10857770706836 16.06.2011 16:22:11
10857664034502 16.06.2011
03:09:49
10856081424810 15.06.2011 20:41:59
Additional checks (KPI 3)
set lines 166
set numf 99999999999999999;
select thread# trd, sequence#,first_change#, next_change#,dict_begin beg,
dict_end end,to_char(timestamp, 'hh:mi:ss') timestamp,
(case when l.next_change# < p.read_scn then 'YES'
when l.first_change# < p.applied_scn then 'CURRENT'
else 'NO' end) applied
from dba_logstdby_log l, dba_logstdby_progress p
2
order by thread#, first_change#;
TRD
SEQUENCE#
FIRST_CHANGE#
NEXT_CHANGE#
BEG END TIMESTAM APPLIED
------------------ ------------------ ------------------ ------------------ --- --- ------- ------1
1900 10854889739662 10855983809311 NO NO 12:08:40
YES
1
1901 10855983809311 10855986222881 NO NO 01:11:39
YES
1
1902 10855986222881 10855993572127 NO NO 03:31:08
1
YES
1
1903 10855993572127 10856005620352 NO NO 08:19:07
YES
1
1904 10856005620352 10856011936221 NO NO 10:22:09
YES
Customizing a Logical Standby Database to Replicate or
workaround
▶ GOAL: Specify rules For example skip the replication of a
table or a set of tables at the logical standby database.
▶ NOTE: 100 % of the redo is always transferred to the
standby database. The skipping in this case applies to what
SQL Apply will actually process at the standby database
with that redo.
▶ HOW: DBMS_LOGSTDBY.SKIP to Skip Replication of
Table(s).
▶ ……. On the Standby side of course !
Case : Skipping a Table that caused the SQL apply to slow down / stop
▶ Our Case: slow to almost none progress in the SQL applies.
▶ After the duplication a 83 Million row table was created on the primary
side.
▶ SQL apply tried to bring that biggy across but that took forever with
little progress only. We contacted Oracle They came up with the
following approach.
– stop the logical standby apply:
alter database stop logical standby apply;
– execute skip statement:
exec
dbms_logstdby.skip('DML',‘SCOTT','T_USAGE_DATA_ARCHIV
E',null);
– restart the logical standby apply
alter database start logical standby apply;
Identify Skiprules in place
col error format a10
col statement_opt format a20
col owner format a10
col name format a40
col proc format a30
set lines 144
select * from dba_logstdby_skip
order by statement_opt;
Instantiate a Table 1 (cause then we felt sorry for skipping)
▶ Right after the duplication we had issues with a large archive
table that was filled with 80 Million rows just over night , and the
sql apply had hard times catching up with that , slowing the
complete sql apply down dramatically. After consulting with the
Supplier and the Functional department it was decided to define a
skip_rule for thist table. After that the sql apply was able to catch
up easily..
▶ After time it turned out that the table was still needed on the
standby side. So we needed to explore the Instantiate table
procedure that Oracle offers. (Step by Step Approach of How to
Instantiate a Table in Logical Standby Database [ID
842160.1])
Instantiate a Table 2
Best Practice …
▶ MAKE DOUBLE SURE YOUR UNDO TABLESPACE IS LARGE
ENOUGH BEFORE EVEN STARTING.
▶ Even though it is not that documented specifically make sure that
the Schema that will have the instantiated table has following
extra privileges :
▶ SQL> grant SELECT_CATALOG_ROLE to scott;
SQL> grant EXP_FULL_DATABASE to scott;
SQL> grant IMP_FULL_DATABASE to scott;
Instantiate a Table 3
▶ 1. At primary, as sys user grant below roles to the user whose
table(s) is being reinstantiated
▶ SQL> grant SELECT_CATALOG_ROLE to scott;
SQL> grant EXP_FULL_DATABASE to scott;
SQL> grant IMP_FULL_DATABASE to scott;
▶
2. As sys user, create a connected user database link at logical
standby.
▶ SQL> create public database link link_cherry connect to scott
identified by tiger using 'cherry';
3. verify the database link from logical standby to ensure that it gets
info from primary
▶ SQL> select db_unique_name,database_role from
v$database@link_cherry;
DB_UNIQUE_NAME DATABASE_ROLE
------------------------------ --------------cherry PRIMARY
Instantiate a Table 4
▶ 4. stop Sql apply if it running. You can skip this step otherwise.
▶ SQL> select count(*) from v$logstdby; ## will result zero rows if
sql apply is not running
SQL> alter database stop logical standby apply;
if broker is in place
dgmgrl /
edit database 'standby db unique name' set state='LOG-APPLYOFF';
exit
▶
5. Remove the skip rules, if there is any, from the table that is
being instantiated
▶ SQL> select STATEMENT_OPT,NAME from DBA_LOGSTDBY_SKIP
where OWNER='SCOTT';
SQL> exec dbms_logstdby.unskip('SCHEMA_DDL', 'SCOTT','TEST');
SQL> exec dbms_logstdby.unskip('DML', 'SCOTT','TEST');
Instantiate a Table 5
6. From logical standby ,start instantiating the the table as sys user in sqlplus.
▶ exec dbms_logstdby.instantiate_table('SCOTT','TEST','LINK_CHERRY')
▶ Note:- Ensure that the Schema Name, Table Name and dblink name are in
Uppercase. This procedure recreates the table that is being instantiated. It can
be confirmed from dba_objects.created column of logical standby
7. Once you are done with instantiation, resume sql apply
▶ SQL> alter database start logical standby apply;
if broker is in place
dgmgrl /
edit database 'standby db unique name' set state='ONLINE';
exit
▶
8.At primary, Revoke the roles granted to the application user - scott
▶ SQL> revoke SELECT_CATALOG_ROLE from scott;
SQL> revoke EXP_FULL_DATABASE from scott;
SQL> revoke IMP_FULL_DATABASE from scott;
On call Support - 1
▶ Both on the primary side and the standby side we try to
keep up a pool of candidate disks when ASM runs out of
space.
▶ Stop the apply on the standby side:
alter database stop logical standby apply;
▶ Add the disk(s)
▶ Start the apply on the standby side:
alter database start logical standby apply immediate;
On call Support - 2
Create tablespace or adding/Resizing Datafiles in a logical Standby.
▶ Standby side:
▶ Resizing a datafile is always possible
▶ Adding a datafile to a tablespace few extra steps (sqlplus)
– alter database guard standby;
– add the datafile to: (f.i.) ALTER TABLESPACE SYSAUX ADD
DATAFILE '+MYSBDB_DATA01' SIZE 1000 M);
– alter database guard all;
▶ Primary Side:
▶ Creating a tablespace on the primary did work well but caused Side
effects on the logical standby side.
▶ BEST PRACTICE could be:
– To define a skip rule on the standby database for that
tablespace.
– Then create the tablespace on the primary side and then
– Create the same tablespace on the standby side.
Where can we improve..
▶ Monitoring ( automated )
▶ Network traffic ( if needed )
▶ ….. And of course the Application..
Used Information
▶ Oracle® Data Guard Concepts and Administration
11g Release 1 (11.1)
Part Number B28294-03
▶ Oracle Data Guard 11g Handbook
▶ Metalink
▶ DataGuard (11gr1)
▶ DataGuard (11gr2)
And a big Thank you for your TIME!