Looking Under the Hood at the Oracle ClusterWare
OOW 2009
Murali Vallath
[email protected]
About me…
Independent Oracle Consultant - Summersky Enterprises
e-mail: [email protected]
Agenda
• Architecture
• ClusterWare Components
• CSS Startup process
• Oracle ClusterWare
• Debug/Troubleshooting
• OCR
• Q&A
Architecture
[Figure: four-node RAC cluster. Nodes ORADB1 to ORADB4 each host one instance (SSKY1 to SSKY4) and a VIP, with the stack on every node layered as IPC, Comm. Layer, Listeners | Monitors, Clusterware, Operating System. One network switch carries the public network, a second network switch carries the cluster interconnect, and a SAN switch connects the nodes to the shared storage holding the SSKYDB database.]
© Summersky Enterprises LLC | Murali Vallath | Slide: 4
Cluster Manager
• Is a distributed kernel component that monitors whether cluster
members can communicate with each other
• Enforces rules of cluster membership
• Forms a cluster, adds members to a cluster and removes
members from a cluster
• Tracks which members in a cluster are active
• Maintains a cluster membership list that is consistent on all
cluster members
• Provides timely notification of membership changes
• Detects and handles possible cluster partitions
© Summersky Enterprises LLC | OOW 2008 | Murali Vallath | Slide: 5
Oracle Clusterware Components
• Cluster Synchronization Services (CSS)
• Cluster Ready Services (CRS)
• Event Manager (EVM)
• Oracle Cluster Registry (OCR)
• Voting Disk
• Virtual IP (VIP)
• Cluster Interconnect
Oracle Clusterware
[Figure: the same four-node cluster (ORADB1 to ORADB4, instances SSKY1 to SSKY4, one VIP per node) connected through the public network switch and the interconnect switch. Each node runs CSS, EVM and CRS; the diagram also labels NM, GM and RACGIMON. A SAN switch connects the nodes to shared storage holding the OCR (registry) and the CSS voting disk.]
CSSD
• Node Membership (NM)
– Checks the heartbeat across the various nodes
in the cluster every second
– Checks the voting disk to determine if there is a
failure on any other nodes in the cluster
• Group Membership (GM)
– Provides group membership services
– All clients that perform I/O operations register
with the GM, for example LMON, DBWR, etc.
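The node-membership check described above can be pictured with a toy model. This is only a sketch of the idea of a per-second heartbeat with a miss threshold, not Oracle's implementation; the class name, the `misscount` field and the value used in the usage note are illustrative assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class NodeMonitor:
    """Toy node-membership tracker (illustrative, not the CSSD algorithm)."""

    # Assumed threshold: seconds without a heartbeat before a node is suspect.
    misscount: int
    last_seen: dict = field(default_factory=dict)

    def heartbeat(self, node: str, now: int) -> None:
        """Record that a heartbeat from `node` arrived at time `now` (seconds)."""
        self.last_seen[node] = now

    def suspects(self, now: int) -> list:
        """Return nodes whose last heartbeat is at least `misscount` seconds old."""
        return sorted(
            node for node, seen in self.last_seen.items()
            if now - seen >= self.misscount
        )
```

For example, with `misscount=3`, a node last heard from at second 0 is reported by `suspects(3)`, while a node heard from at second 2 is not.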
EVMD
• Event forwarding daemon process
• Propagates events using Oracle Notification Service (ONS)
• Scans node callout directory and invokes callouts
• Started after CSSD is started.
• Communication bridge between CSS and CRS
CRSD
• Defines and manages resources
• Resource profile is stored in OCR
• CRS reads OCR to manage resources
• Manages application resources
– START
– STOP
– Manages failover
– Generates events during cluster state change
• Information from OCR is cached by CRS
• Communicates with RACGIMON
Logging
• $ORA_CRS_HOME/log/<node name> directory contains
– Clusterware alert log, e.g. alert<nodename>.log
– crsd – log files for CRS daemons
– cssd – log files for CSS daemons
– evmd – log files for EVM daemons
– racg – log files for node applications including VIP and ONS
Clusterware log directory structure
crs
  log
    node
      admin
      evmd
      client
      cssd
      racg
      crsd
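As a small aid for scripts that collect these logs, the layout above can be encoded directly. This is a sketch under assumptions: the function name is made up for illustration, and the `/u01/crs` home used in the usage note is a placeholder, not a path from the slides.

```python
from pathlib import PurePosixPath

# Per-node subdirectories shown in the directory structure above.
COMPONENT_DIRS = ("admin", "client", "crsd", "cssd", "evmd", "racg")


def component_log_dir(crs_home: str, node: str, component: str) -> PurePosixPath:
    """Build <crs_home>/log/<node>/<component> for a known component."""
    if component not in COMPONENT_DIRS:
        raise ValueError(f"unknown component directory: {component}")
    return PurePosixPath(crs_home) / "log" / node / component
```

With a hypothetical home of `/u01/crs`, `component_log_dir("/u01/crs", "oradb4", "crsd")` gives `/u01/crs/log/oradb4/crsd`.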
DEBUG
• crsctl debug statedump crs
– Output gets appended to $ORA_CRS_HOME/log/oradb4/crsd/crsd.log
• crsctl debug statedump evm
– Output gets appended to $ORA_CRS_HOME/log/oradb4/evmd/evmd.log
DEBUG CRS Modules
Module      Function / description
CRSUI       User interface module
CRSCOMM     Communication module
CRSRTI      Resource management module
CRSMAIN     Main module/driver
CRSPLACE    CRS placement module
CRSAPP      CRS application
CRSRES      CRS resources
CRSOCR      OCR interface/engine
CRSTIMER    Various CRS related timers
CRSEVT      CRS - EVM/event interface module
DEBUG CRS Modules
crsctl debug log crs "CRSTIMER:2"
crsctl debug log crs "CRSEVT:1"
crsctl debug log crs "CRSAPP:2"
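When several modules need their debug level changed, the command strings can be generated rather than typed. The helper below is a sketch that only reproduces the command form shown above; the function name is hypothetical, and the valid range of levels is not stated in the slides, so only negative values are rejected here.

```python
# CRS module names from the module table in this deck.
CRS_MODULES = {
    "CRSUI", "CRSCOMM", "CRSRTI", "CRSMAIN", "CRSPLACE",
    "CRSAPP", "CRSRES", "CRSOCR", "CRSTIMER", "CRSEVT",
}


def debug_log_command(module: str, level: int) -> str:
    """Return a 'crsctl debug log crs' command line for one module."""
    if module not in CRS_MODULES:
        raise ValueError(f"unknown CRS module: {module}")
    if level < 0:
        raise ValueError("debug level must not be negative")
    # Straight double quotes so the string can be pasted into a shell.
    return f'crsctl debug log crs "{module}:{level}"'
```

`debug_log_command("CRSTIMER", 2)` produces the first command listed above.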
DEBUG EVM Modules
Module Name   Function
EVMD          EVM daemon
EVMDMAIN      EVM main module
EVMCOMM       EVM communication module
EVMEVT        EVM event module
EVMAPP        EVM application module
EVMAGENT      EVM agent module
CRSOCR        OCR interface/engine
CLUCLS        EVM cluster/CSS information
EVMD Check
D:\oracle\product\10.2.0\crs\BIN>evmwatch -A -t "@timestamp @priority @name"
05-Dec-2007 22:06:48 200 sys.ora.evm.msg.user
05-Dec-2007 22:06:50 200 ora.ha.oradb5.ASM2.asm.imcheck
05-Dec-2007 22:06:50 200 ora.ha.oradb5.ASM2.asm.imup
05-Dec-2007 22:07:14 200 sys.ora.evm.msg.user
05-Dec-2007 22:07:21 200 sys.ora.evm.msg.user
05-Dec-2007 22:07:21 200 sys.ora.evm.msg.user
05-Dec-2007 22:08:15 200 ora.ha.SSKY2.SSKY2.inst.imcheck
05-Dec-2007 22:08:15 200 ora.ha.SSKY2.SSKY2.inst.imup
05-Dec-2007 22:09:26 200 ora.ha.oradb4.ASM1.asm.imcheck
05-Dec-2007 22:09:26 200 ora.ha.oradb4.ASM1.asm.imup
05-Dec-2007 22:10:17 200 ora.ha.SSKY.SSKY1.inst.imcheck
05-Dec-2007 22:10:17 200 ora.ha.SSKY.SSKY1.inst.imup
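Output in this three-field template is easy to post-process. The sketch below assumes exactly the `@timestamp @priority @name` layout used above; the `Event` type and `parse_event` function are illustrative names, and the month abbreviation is parsed with `%b`, which assumes an English (or C) locale.

```python
from datetime import datetime
from typing import NamedTuple


class Event(NamedTuple):
    timestamp: datetime
    priority: int
    name: str


def parse_event(line: str) -> Event:
    """Parse one evmwatch line of the form 'DD-Mon-YYYY HH:MM:SS <priority> <name>'."""
    date_part, time_part, priority, name = line.split()
    # %b is locale dependent; this assumes English month abbreviations.
    stamp = datetime.strptime(f"{date_part} {time_part}", "%d-%b-%Y %H:%M:%S")
    return Event(stamp, int(priority), name)
```

Each line of the sample output then becomes an `Event` whose `priority` and `name` can be filtered or grouped.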
EVMD Actions
Action          Priority  Function
error           500       No response is received for the action sent.
transition      300       The event is in a state change process. Normally the action is
                          received when a resource or service is initially started,
                          stopped or failing over.
down            200       Indicates that the resource or service is currently down.
running         300       Indicates that the service or resource is currently in
                          execution state. This state is normally seen in cluster
                          services or applications managed by the Oracle Clusterware,
                          for example ‘crs’.
up              200       Indicates that the service or resource specified is up.
imstop          200       Indicates an HA service stop action.
relocatefailed  300       Indicates that an attempt to relocate a service or resource
                          from one node to another failed. This action normally
                          follows other actions such as ‘imstop’ or ‘stopped’.
stopped         300       Indicates that the application has completely stopped
                          execution.
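The action table lends itself to a simple lookup when scanning event names. This sketch assumes that the action is the last dot-separated component of the event name, which is inferred from names such as `ora.ha.SSKY2.SSKY2.inst.imup` in the evmwatch output and is not stated in the slides; the function name and the event name in the usage note are hypothetical.

```python
# Priorities as listed in the EVMD Actions table.
ACTION_PRIORITY = {
    "error": 500,
    "transition": 300,
    "down": 200,
    "running": 300,
    "up": 200,
    "imstop": 200,
    "relocatefailed": 300,
    "stopped": 300,
}


def describe(event_name: str):
    """Return (action, priority); priority is None for actions not in the table."""
    # Assumption: the action is the final dot-separated component.
    action = event_name.rsplit(".", 1)[-1].lower()
    return action, ACTION_PRIORITY.get(action)
```

For a made-up event name such as `ora.ha.demo.relocatefailed`, `describe` returns `("relocatefailed", 300)`; actions outside the table, such as `imup`, come back with `None`.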
Cluster Verification
D:\oracle\product\10.2.0\crs\BIN>olsnodes -n -p -v -g -i
prlslms: Initializing LXL global
prlsndmain: Initializing CLSS context
prlsmemberlist: No of cluster members configured = 256
prlsmemberlist: Getting information for nodenum = 1
prlsmemberlist: node_name = oradb4
prlsmemberlist: ctx->lsdata->node_num = 1
prls_getnodeprivname: Retrieving the node private name for node = oradb4
prls_getnodeprivname: Private node name = oradb4-priv
prls_getnodevip: Retrieving the virtual IP for node = oradb4
prls_getnodevip: prsr_vpip_key_len = 281
prls_getnodevip: Opening the OCR key DATABASE.NODEAPPS.oradb4.VIP
prls_getnodevip: OCR key value length = 29
prls_getnodevip: Virtual IP = oradb4-vip.sumsky.net
prls_printdata: Printing the node data
oradb4 1 oradb4-priv oradb4-vip.sumsky.net
prlsmemberlist: Getting information for nodenum = 2
prlsmemberlist: node_name = oradb5
prlsmemberlist: ctx->lsdata->node_num = 2
prls_getnodeprivname: Retrieving the node private name for node = oradb5
prls_getnodeprivname: Private node name = oradb5-priv
prls_getnodevip: Retrieving the virtual IP for node = oradb5
prls_getnodevip: prsr_vpip_key_len = 281
prls_getnodevip: Opening the OCR key DATABASE.NODEAPPS.oradb5.VIP
prls_getnodevip: OCR key value length = 29
prls_getnodevip: Virtual IP = oradb5-vip.sumsky.net
prls_printdata: Printing the node data
oradb5 2 oradb5-priv oradb5-vip.sumsky.net
prlsndmain: olsnodes executed successfully
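The verbose trace above repeats the same few fields per node, so it can be reduced to one record per node. The parser below is a sketch that relies only on the line patterns visible in this output (`node_name =`, `node_num =`, `Private node name =`, `Virtual IP =`); the function name and dictionary keys are illustrative choices.

```python
import re


def parse_olsnodes_verbose(text: str) -> list:
    """Collect name, number, private name and VIP for each node in the trace."""
    nodes = []
    for line in text.splitlines():
        if (m := re.search(r"node_name = (\S+)", line)):
            # A node_name line starts a new node record.
            nodes.append({"name": m.group(1)})
        elif not nodes:
            continue  # ignore header lines before the first node
        elif (m := re.search(r"node_num = (\d+)", line)):
            nodes[-1]["number"] = int(m.group(1))
        elif (m := re.search(r"Private node name = (\S+)", line)):
            nodes[-1]["private_name"] = m.group(1)
        elif (m := re.search(r"Virtual IP = (\S+)", line)):
            nodes[-1]["vip"] = m.group(1)
    return nodes
```

Fed the output shown on this slide, it yields two records, one for oradb4 and one for oradb5.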
OCSSD
• Spawned in init.cssd
• Exists in both vendor ClusterWare and non-vendor
ClusterWare environments
• Performs inter-node health monitoring
• Performs RDBMS instance endpoint discovery
OCSSD – Reboot Causes
• Network failure or latency between nodes
• Problems writing or reading from the CSS voting disk
• Lack of CPU resources
• Problem with the executables
• Mis-configuration of CRS
– Wrong network selected as private for CRS
– Placing the CSS vote file on a NetApp that’s shared over an unreliable
or excessively loaded network
• Killing the ‘init.cssd fatal’ process or the ‘ocssd’ process
• Unexpected failure of the OCSSD process
• Oracle bug
OPROCD
• Is a process monitor daemon that provides cluster-level
I/O fencing
• This process is spawned in any non-vendor
ClusterWare environment (Exception: Windows)
• Replaces the hangcheck-timer module on Linux (post
10.2.0.4)
• Runs as root
• Locked in memory
• Failure causes reboot of system
OPROCD
• Accepts two parameters
– -t - timeout value
• OPROCD_DEFAULT_TIMEOUT
• Specifies time between executions (milliseconds)
• Defaults to 10000
– -m – margin
• OPROCD_DEFAULT_MARGIN
• Acceptable margin before reboot
• Defaults to 500
• Both defaults are defined in /etc/init.d/init.cssd
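One way to read the two parameters: the process expects to run again after the timeout, and a wake-up that is later than expected by more than the margin is treated as a hang. The function below is a sketch of that reading only, not OPROCD's code; the function name is hypothetical and the numbers in the usage note are the values quoted on this slide.

```python
def should_fence(expected_wake_ms: int, actual_wake_ms: int, margin_ms: int = 500) -> bool:
    """Return True when the wake-up is later than expected by more than the margin.

    Assumed interpretation of 'acceptable margin before reboot'.
    """
    delay_ms = actual_wake_ms - expected_wake_ms
    return delay_ms > margin_ms
```

With the slide's timeout of 10000 ms and margin of 500 ms, a wake-up at 10400 ms is tolerated, while one at 10600 ms would trigger fencing under this reading.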
OPROCD
• Current values can be obtained using crsctl
– crsctl get css reboottime
– crsctl get css diagwait
OPROCD – Reboot Causes
• OS scheduler issues
• OS locked by another process
• Excessive loads
• Oracle bug
OCLSOMON
• Used in environments with CRS and vendor clusterware
• Helps provide more diagnostic information to
vendors during node evictions by flushing more
information to the log files.
• This process monitors the CSS daemon for hangs or
scheduling issues and can reboot a node if there is a hang.
• Registers with the SKGXN (ClusterWare layer) and CSS.
• Lightweight process that runs every second and ensures CSS is
healthy
• During a CSS hang, it calls LocalFence in init.cssd
OCLSOMON – Reboot Causes
• Reboots because CSS is hung
• When CSS is hung, or fails, clsomon will fail and
call LocalFence in init.cssd
• OS scheduler issues
• Excessive amounts of load
• Oracle bug
OCR
Level  Resource Name
1      SYSTEM
         CSS, EVM, CRS, LANGUAGE, VERSION, ORA_CRS_HOME, OCR
2      DATABASE
         DATABASES, ASM, NODEAPPS, VIP_RANGE, LOG, ONS_HOSTS
3      CRS
         CUR (current), HIS (history), SEC (security)
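The key hierarchy above can be held in a small dictionary for checking dotted key names such as `DATABASE.NODEAPPS.oradb4.VIP`, which appears in the olsnodes trace. This is a sketch: it models only the top two levels listed on this slide, and the function names are illustrative.

```python
# Top-level OCR keys and their immediate subkeys, as listed on the slide.
OCR_KEYS = {
    "SYSTEM": ["CSS", "EVM", "CRS", "LANGUAGE", "VERSION", "ORA_CRS_HOME", "OCR"],
    "DATABASE": ["DATABASES", "ASM", "NODEAPPS", "VIP_RANGE", "LOG", "ONS_HOSTS"],
    "CRS": ["CUR", "HIS", "SEC"],
}


def key_path(*parts: str) -> str:
    """Join key components into the dotted form seen in the olsnodes trace."""
    return ".".join(parts)


def is_known(path: str) -> bool:
    """Check that the first two components of a dotted key match the table."""
    parts = path.split(".")
    if len(parts) < 2:
        return False
    return parts[1] in OCR_KEYS.get(parts[0], [])
```

`is_known("DATABASE.NODEAPPS.oradb4.VIP")` is true under this table, while `is_known("DATABASE.CSS")` is not.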
OCR
[Figure: nodes ORADB1 to ORADB4, joined by the public interface and the cluster interconnect, each run an OCR process with a local OCR cache. Clients shown are the OEM Agent, OUI, srvctl and the Oracle Database. The OCR (repository) is drawn as a single store shared by all nodes.]
Clusterware Not Starting
• Bad voting disk
• Corrupted OCR file
• Log directories full
• Oracle Bug
OCR Corruption
• Check CSSD log files
$ORA_CRS_HOME/log/oradb3/cssd/cssd.log
– Repeated attempts to CSS
– Not able to read OCR file
– OCR file locked by other nodes

[CSSD]2009-06-04 19:30:36.042 [1274124608] >TRACE: clssnmRcfgMgrThread: Local Join
[CSSD]2009-06-04 19:30:36.042 [1274124608] >WARNING:clssnmLocalJoinEvent:takeover aborted due to ALIVE node on Disk

[root@oradb3 bin]# ocrcheck
Status of Oracle Cluster Registry is as follows :
         Version                  :          2
         Total space (kbytes)     :     306968
         Used space (kbytes)      :      12852
         Available space (kbytes) :     294116
         ID                       :  658275539
         Device/File Name         : /dev/raw/ocr1
                                    Device/File integrity check succeeded
         Device/File Name         : /dev/raw/ocr2
                                    Device/File integrity check succeeded
         Cluster registry integrity check succeeded

• Stop CRS
• Repair OCR file
– ocrconfig -repair ocr /dev/raw/ocr1
• Repair mirrored copy of OCR
– ocrconfig -repair ocrmirror /dev/raw/ocr2
• Restore from OCR backup
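For routine checks, the ocrcheck report on this slide can be reduced to a few values. The parser below is a sketch that assumes the field labels appear exactly as in that report ("Total space (kbytes)", "Device/File Name", and so on); the function name and result keys are illustrative.

```python
import re


def parse_ocrcheck(text: str) -> dict:
    """Extract space figures, device names and overall status from ocrcheck output."""
    result = {}
    labels = (
        ("Total space", "total_kb"),
        ("Used space", "used_kb"),
        ("Available space", "available_kb"),
    )
    for label, key in labels:
        m = re.search(rf"{label} \(kbytes\)\s*:\s*(\d+)", text)
        if m:
            result[key] = int(m.group(1))
    # Every configured OCR location is reported on its own Device/File Name line.
    result["devices"] = re.findall(r"Device/File Name\s*:\s*(\S+)", text)
    result["healthy"] = "Cluster registry integrity check succeeded" in text
    return result
```

Applied to the report above, it returns the two device names /dev/raw/ocr1 and /dev/raw/ocr2, and the used and available figures add up to the total space.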
References
• Oracle 10g RAC - Grid Services and
Clustering – Murali Vallath
• Metalink Note #’s 26579.1
QUESTIONS
ANSWERS
Join the RAC-SIG @ www.oracleracsig.org
My Other Presentations
• Session S307890
– 12-OCT-2009 17:30 Room: 236
– Looking Under the Hood of Oracle ClusterWare
• Session S309238
– 13-OCT-2009 14:30 @ Hilton /Franciscan A/B
– Understanding Oracle 11g RAC for Developers
• Session S299961 (Power Session)
– 14-OCT-2009 13:45 Room: 308
– Exploiting Oracle Tools and Utilities to Monitor and
Test Oracle RAC
[email protected]
Thanks for Listening