_____________________________________________________________________________________________________________________
Error Description
The cluster is up and running on one node, but starting the cluster on the second node fails with the following error messages.
The crsctl check crs command failed with these messages:
[root@Node2 ~]# crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4530: Communications failure contacting Cluster Synchronization Services daemon
CRS-4534: Cannot communicate with Event Manager
Starting the HAIP resource with crsctl start res ora.cluster_interconnect.haip -init also failed, with CRS-2674: Start of 'ora.cssd' on 'Node2' failed:
[root@Node2 ~]# crsctl start res ora.cluster_interconnect.haip -init
CRS-2672: Attempting to start 'ora.cssd' on 'Node2'
CRS-2672: Attempting to start 'ora.diskmon' on 'Node2'
CRS-2676: Start of 'ora.diskmon' on 'Node2' succeeded
CRS-2674: Start of 'ora.cssd' on 'Node2' failed
CRS-2679: Attempting to clean 'ora.cssd' on 'Node2'
CRS-2681: Clean of 'ora.cssd' on 'Node2' succeeded
CRS-4000: Command Start failed, or completed with errors.
Solution Description
When I checked the ocssd.trc file, I noticed several repeated lines reporting that node 1 "has a disk HB, but no network HB". That pointed to a network problem, and indeed, pinging the interconnect (private) IPs between the RAC nodes failed.
Log/Trace File: /u01/app/grid/diag/crs/Node2/crs/trace/ocssd.trc
2016-09-23 08:22:06.378508 : CSSD:2643093248: clssscWaitOnEventValue: after CmInfo State val 3, eval 1 waited 1000 with cvtimewait status 4294967186
2016-09-23 08:22:06.636082 : CSSD:2620225280: clssnmPollingThread: state(1) clusterState(0) exit
2016-09-23 08:22:06.636092 : CSSD:2620225280: clssscExit: removeNode() already called
2016-09-23 08:22:06.636095 : CSSD:2620225280: clssscExit: abort already set 0
2016-09-23 08:22:06.742376 : CSSD:2623379200: clssnmvDHBValidateNCopy: node 1, Node1, has a disk HB, but no network HB, DHB has rcfg 370075968, wrtcnt, 966100, LATS 16055504, lastSeqNo 966099, uniqueness 1474627759, timestamp 1474633343/1310023344
2016-09-23 08:22:07.378630 : CSSD:2643093248: clssscWaitOnEventValue: after CmInfo State val 3, eval 1 waited 1000 with cvtimewait status 4294967186
2016-09-23 08:22:07.743076 : CSSD:2623379200: clssnmvDHBValidateNCopy: node 1, Node1, has a disk HB, but no network HB, DHB has rcfg 370075968, wrtcnt, 966101, LATS 16056504, lastSeqNo 966100, uniqueness 1474627759, timestamp 1474633344/1310024344
2016-09-23 08:22:08.378747 : CSSD:2643093248: clssscWaitOnEventValue: after CmInfo State val 3, eval 1 waited 1000 with cvtimewait status 4294967186
2016-09-23 08:22:08.743849 : CSSD:2623379200: clssnmvDHBValidateNCopy: node 1, Node1, has a disk HB, but no network HB, DHB has rcfg 370075968, wrtcnt, 966102, LATS 16057504, lastSeqNo 966101, uniqueness 1474627759, timestamp 1474633345/1310025344
I then pinged the interconnect IP between node 1 and node 2: it was not reachable, and the cause turned out to be a VLAN problem. The network team fixed it at my request, and that resolved the issue.
[root@Node2 ~]# ping Node1priv.abnsayrate.net
PING Node1priv.abnsayrate.net (10.188.60.61) 56(84) bytes of data.
From Node2priv.abnsayrate.net (10.188.60.62) icmp_seq=1 Destination Host Unreachable
From Node2priv.abnsayrate.net (10.188.60.62) icmp_seq=2 Destination Host Unreachable
From Node2priv.abnsayrate.net (10.188.60.62) icmp_seq=3 Destination Host Unreachable
From Node2priv.abnsayrate.net (10.188.60.62) icmp_seq=4 Destination Host Unreachable
^C
--- Node1priv.abnsayrate.net ping statistics ---
7 packets transmitted, 0 received, +4 errors, 100% packet loss, time 6000ms
pipe 4
[root@Node2 ~]# ^C
[root@Node2 ~]# ping 10.188.60.61
PING 10.188.60.61 (10.188.60.61) 56(84) bytes of data.
From 10.188.60.61 icmp_seq=1 Destination Host Unreachable
From 10.188.60.61 icmp_seq=2 Destination Host Unreachable
From 10.188.60.61 icmp_seq=3 Destination Host Unreachable
From 10.188.60.61 icmp_seq=4 Destination Host Unreachable
^C^C
--- 10.188.60.61 ping statistics ---
6 packets transmitted, 0 received, +4 errors, 100% packet loss, time 5000ms
pipe 4
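If the interconnect check is scripted rather than done by hand, the ping summary line is the part worth parsing. Below is a minimal Python sketch, assuming the Linux iputils summary format seen in the sessions above ("N packets transmitted, N received, +N errors, N% packet loss"); the helper names are hypothetical, and each field is matched separately so optional fields such as "+N errors" can be absent.

```python
import re
from typing import NamedTuple, Optional


class PingSummary(NamedTuple):
    transmitted: int
    received: int
    errors: int
    loss_pct: float


def parse_ping_summary(output: str) -> Optional[PingSummary]:
    """Extract the statistics line from iputils-style ping output (assumed format)."""
    tx = re.search(r"(\d+) packets transmitted", output)
    rx = re.search(r"(\d+) received", output)
    loss = re.search(r"([\d.]+)% packet loss", output)
    if not (tx and rx and loss):
        return None  # no summary line, e.g. ping was killed before printing it
    # "+N errors" only appears when ICMP errors were reported.
    err = re.search(r"\+(\d+) errors", output)
    return PingSummary(
        int(tx.group(1)),
        int(rx.group(1)),
        int(err.group(1)) if err else 0,
        float(loss.group(1)),
    )


def interconnect_reachable(output: str) -> bool:
    """Treat the private IP as reachable if at least one reply came back."""
    summary = parse_ping_summary(output)
    return summary is not None and summary.received > 0
```

Applied to the output above, this reports 0 replies and 100% loss, which matches the "Destination Host Unreachable" lines and the missing network heartbeat in ocssd.trc.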
After the network fix, the private IP responds to ping and the cluster can be started.
[oracle@Node2 ~]$ ping 10.188.60.61
PING 10.188.60.61 (10.188.60.61) 56(84) bytes of data.
64 bytes from 10.188.60.61: icmp_seq=1 ttl=64 time=0.116 ms
64 bytes from 10.188.60.61: icmp_seq=2 ttl=64 time=0.049 ms
64 bytes from 10.188.60.61: icmp_seq=3 ttl=64 time=0.122 ms
^C
--- 10.188.60.61 ping statistics ---
Starting the cluster on Node2:
[root@Node2 ~]# crsctl start cluster
CRS-2672: Attempting to start 'ora.crf' on 'Node2'
CRS-2672: Attempting to start 'ora.cssd' on 'Node2'
CRS-2672: Attempting to start 'ora.diskmon' on 'Node2'
CRS-2676: Start of 'ora.diskmon' on 'Node2' succeeded
CRS-2676: Start of 'ora.crf' on 'Node2' succeeded
CRS-2676: Start of 'ora.cssd' on 'Node2' succeeded
CRS-2672: Attempting to start 'ora.ctssd' on 'Node2'
CRS-2672: Attempting to start 'ora.cluster_interconnect.haip' on 'Node2'
CRS-2676: Start of 'ora.ctssd' on 'Node2' succeeded
CRS-2676: Start of 'ora.cluster_interconnect.haip' on 'Node2' succeeded
CRS-2672: Attempting to start 'ora.asm' on 'Node2'
CRS-2676: Start of 'ora.asm' on 'Node2' succeeded
CRS-2672: Attempting to start 'ora.storage' on 'Node2'
CRS-2676: Start of 'ora.storage' on 'Node2' succeeded
CRS-2672: Attempting to start 'ora.crsd' on 'Node2'
CRS-2676: Start of 'ora.crsd' on 'Node2' succeeded
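To confirm that every resource crsctl attempted to start actually came up, each CRS-2672 "Attempting to start" line can be matched against a CRS-2676 "succeeded" line. This is a minimal Python sketch, assuming the message text shown in the outputs in this post; the function name is hypothetical. The `\s+` allows for messages wrapped across lines.

```python
import re
from typing import Set, Tuple

# Assumed message formats, taken from the crsctl output in this post.
ATTEMPT = re.compile(r"CRS-2672: Attempting to start '([^']+)' on\s+'([^']+)'")
SUCCEEDED = re.compile(r"CRS-2676: Start of '([^']+)' on\s+'([^']+)'\s+succeeded")


def unconfirmed_starts(output: str) -> Set[Tuple[str, str]]:
    """Return (resource, node) pairs that were attempted but never reported as started."""
    attempted = set(ATTEMPT.findall(output))
    succeeded = set(SUCCEEDED.findall(output))
    return attempted - succeeded
```

For the successful start above the result is an empty set, while the earlier failing `crsctl start res ora.cluster_interconnect.haip -init` output yields `{('ora.cssd', 'Node2')}`, pointing straight at CSSD, the daemon that depends on the network heartbeat.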
_____________________________________________________________________________________________________________________