CRS-4638 CRS-4535 CRS-4530 CRS-4534 CRS-2674: Start of 'ora.cssd' on 'Node2' failed

Error Description
The cluster is up and running on one node, but when I try to start it on the second node it fails with the error messages below.
The crsctl check crs command on the second node reports the following errors.


[root@Node2 ~]# crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4530: Communications failure contacting Cluster Synchronization Services daemon
CRS-4534: Cannot communicate with Event Manager
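
When several nodes have to be checked, it can help to filter the output above down to the components that are not healthy. The following is a minimal sketch, not an Oracle-provided tool: the function name crs_problem_lines is hypothetical, and it assumes that healthy components are reported with a message containing "is online", as in the CRS-4638 line above.

```shell
# Hypothetical helper: read `crsctl check crs` output on stdin and print
# only the CRS-nnnn lines that do not report a component as online.
# Assumption: a healthy component's message contains "is online".
crs_problem_lines() {
    grep '^CRS-[0-9]*:' | grep -v 'is online'
}
```

It could be used as crsctl check crs | crs_problem_lines; empty output would then mean every reported component is online.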

Starting the HAIP resource with crsctl start res ora.cluster_interconnect.haip -init also failed, with CRS-2674: Start of 'ora.cssd' on 'Node2' failed.

[root@Node2 ~]# crsctl start res ora.cluster_interconnect.haip -init
CRS-2672: Attempting to start 'ora.cssd' on 'Node2'
CRS-2672: Attempting to start 'ora.diskmon' on 'Node2'
CRS-2676: Start of 'ora.diskmon' on 'Node2' succeeded
CRS-2674: Start of 'ora.cssd' on 'Node2' failed
CRS-2679: Attempting to clean 'ora.cssd' on 'Node2'
CRS-2681: Clean of 'ora.cssd' on 'Node2' succeeded
CRS-4000: Command Start failed, or completed with errors.

Solution Description
When I checked the ocssd.trc file, I noticed several repeated lines saying that Node1 "has a disk HB, but no network HB". In other words, Node2 could see Node1's disk heartbeat but was not receiving its network heartbeat over the private interconnect, which pointed to a network problem. When I pinged the interconnect/private IPs between the RAC nodes, they were not reachable.

Log/Trace File: /u01/app/grid/diag/crs/Node2/crs/trace/ocssd.trc

2016-09-23 08:22:06.378508 :    CSSD:2643093248: clssscWaitOnEventValue: after CmInfo State  val 3, eval 1 waited 1000 with cvtimewait status 4294967186
2016-09-23 08:22:06.636082 :    CSSD:2620225280: clssnmPollingThread: state(1) clusterState(0) exit
2016-09-23 08:22:06.636092 :    CSSD:2620225280: clssscExit: removeNode() already called
2016-09-23 08:22:06.636095 :    CSSD:2620225280: clssscExit: abort already set 0
2016-09-23 08:22:06.742376 :    CSSD:2623379200: clssnmvDHBValidateNCopy: node 1, Node1, has a disk HB, but no network HB, DHB has rcfg 370075968, wrtcnt, 966100, LATS 16055504, lastSeqNo 966099, uniqueness 1474627759, timestamp 1474633343/1310023344
2016-09-23 08:22:07.378630 :    CSSD:2643093248: clssscWaitOnEventValue: after CmInfo State  val 3, eval 1 waited 1000 with cvtimewait status 4294967186
2016-09-23 08:22:07.743076 :    CSSD:2623379200: clssnmvDHBValidateNCopy: node 1, Node1, has a disk HB, but no network HB, DHB has rcfg 370075968, wrtcnt, 966101, LATS 16056504, lastSeqNo 966100, uniqueness 1474627759, timestamp 1474633344/1310024344
2016-09-23 08:22:08.378747 :    CSSD:2643093248: clssscWaitOnEventValue: after CmInfo State  val 3, eval 1 waited 1000 with cvtimewait status 4294967186
2016-09-23 08:22:08.743849 :    CSSD:2623379200: clssnmvDHBValidateNCopy: node 1, Node1, has a disk HB, but no network HB, DHB has rcfg 370075968, wrtcnt, 966102, LATS 16057504, lastSeqNo 966101, uniqueness 1474627759, timestamp 1474633345/1310025344
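
To see quickly how often this message appears, and for which node, the trace file can be summarized with standard tools. This is a minimal sketch under the assumption that the message format matches the excerpt above; the function name count_missing_nhb is hypothetical.

```shell
# Hypothetical helper: count "has a disk HB, but no network HB" messages
# per remote node in an ocssd.trc-style file passed as $1.
# Assumption: lines look like
#   ... clssnmvDHBValidateNCopy: node 1, Node1, has a disk HB, but no network HB, ...
count_missing_nhb() {
    grep 'has a disk HB, but no network HB' "$1" |
        # Keep only the node name that follows "node <number>, "
        sed -n 's/.*clssnmvDHBValidateNCopy: node [0-9]*, \([^,]*\), has a disk HB.*/\1/p' |
        sort | uniq -c |
        # uniq -c prints "<count> <name>"; reorder to "<name>: <count>"
        awk '{print $2 ": " $1}'
}
```

For example, count_missing_nhb /u01/app/grid/diag/crs/Node2/crs/trace/ocssd.trc prints one line per node name with the number of matching messages. A count that keeps growing while ocssd is trying to start suggests the interconnect is still down.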

I tried to ping Node1's interconnect IP from Node2 and it was not reachable; the cause turned out to be a VLAN problem on the private network. The network team fixed it at my request, and that resolved the issue.

[root@Node2 ~]# ping Node1priv.abnsayrate.net
PING Node1priv.abnsayrate.net (10.188.60.61) 56(84) bytes of data.
From Node2priv.abnsayrate.net (10.188.60.62) icmp_seq=1 Destination Host Unreachable
From Node2priv.abnsayrate.net (10.188.60.62) icmp_seq=2 Destination Host Unreachable
From Node2priv.abnsayrate.net (10.188.60.62) icmp_seq=3 Destination Host Unreachable
From Node2priv.abnsayrate.net (10.188.60.62) icmp_seq=4 Destination Host Unreachable
^C
--- Node1priv.abnsayrate.net ping statistics ---
7 packets transmitted, 0 received, +4 errors, 100% packet loss, time 6000ms
pipe 4

[root@Node2 ~]# ^C
[root@Node2 ~]# ping 10.188.60.61
PING 10.188.60.61 (10.188.60.61) 56(84) bytes of data.
From 10.188.60.61 icmp_seq=1 Destination Host Unreachable
From 10.188.60.61 icmp_seq=2 Destination Host Unreachable
From 10.188.60.61 icmp_seq=3 Destination Host Unreachable
From 10.188.60.61 icmp_seq=4 Destination Host Unreachable
^C^C
--- 10.188.60.61 ping statistics ---
6 packets transmitted, 0 received, +4 errors, 100% packet loss, time 5000ms
pipe 4
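
The manual ping test above can be wrapped in a small check that is easy to rerun after the network team makes a change. This is a sketch only: both function names are hypothetical, and the parser assumes the Linux ping summary format shown above ("N% packet loss" with an integer percentage).

```shell
# Hypothetical helper: read ping output on stdin and print the packet-loss
# percentage from the summary line (for example "100" or "0").
# Assumption: summary looks like "..., 100% packet loss, time 6000ms".
ping_loss_percent() {
    sed -n 's/.* \([0-9][0-9]*\)% packet loss.*/\1/p'
}

# Hypothetical wrapper: send 3 probes to the private IP or hostname in $1
# (waiting up to 2 seconds per reply) and report whether all were answered.
check_interconnect() {
    loss=$(ping -c 3 -W 2 "$1" 2>/dev/null | ping_loss_percent)
    if [ "$loss" = "0" ]; then
        echo "interconnect to $1: OK"
    else
        echo "interconnect to $1: FAILED (packet loss: ${loss:-unknown}%)"
    fi
}
```

For example, check_interconnect 10.188.60.61 run on Node2 would report a failure in the situation shown above. A successful ping only shows basic IP reachability on the private network, so it is a first check rather than a full interconnect validation.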

After the network issue was fixed, I was able to ping the IP and then start the cluster.

[oracle@Node2 ~]$ ping 10.188.60.61
PING 10.188.60.61 (10.188.60.61) 56(84) bytes of data.
64 bytes from 10.188.60.61: icmp_seq=1 ttl=64 time=0.116 ms
64 bytes from 10.188.60.61: icmp_seq=2 ttl=64 time=0.049 ms
64 bytes from 10.188.60.61: icmp_seq=3 ttl=64 time=0.122 ms
^C
--- 10.188.60.61 ping statistics ---

Start the cluster
[root@Node2 ~]# crsctl start cluster
CRS-2672: Attempting to start 'ora.crf' on 'Node2'
CRS-2672: Attempting to start 'ora.cssd' on 'Node2'
CRS-2672: Attempting to start 'ora.diskmon' on 'Node2'
CRS-2676: Start of 'ora.diskmon' on 'Node2' succeeded
CRS-2676: Start of 'ora.crf' on 'Node2' succeeded
CRS-2676: Start of 'ora.cssd' on 'Node2' succeeded
CRS-2672: Attempting to start 'ora.ctssd' on 'Node2'
CRS-2672: Attempting to start 'ora.cluster_interconnect.haip' on 'Node2'
CRS-2676: Start of 'ora.ctssd' on 'Node2' succeeded
CRS-2676: Start of 'ora.cluster_interconnect.haip' on 'Node2' succeeded
CRS-2672: Attempting to start 'ora.asm' on 'Node2'
CRS-2676: Start of 'ora.asm' on 'Node2' succeeded
CRS-2672: Attempting to start 'ora.storage' on 'Node2'
CRS-2676: Start of 'ora.storage' on 'Node2' succeeded
CRS-2672: Attempting to start 'ora.crsd' on 'Node2'
CRS-2676: Start of 'ora.crsd' on 'Node2' succeeded