Hello all,
I have a few questions about a media server cluster we configured recently.
When we move the application (DB2/SAP in an HACMP cluster) from one node to the other, it takes NetBackup up to 40 minutes to register that the application has moved and to update the media server.
I can see this in the logs under /usr/openv/volmgr/debug/daemon.
The application switch was at 15:05.
node1 (now inactive) noticed the switch at 15:18:
<snip>
15:13:46.321 [8323148] <2> parsePatchVersionString: theRest = ><
15:13:46.327 [8323148] <4> UpdateClusterActiveNode: node1 is not part of a cluster
15:13:46.347 [8323148] <2> SetApplicationClusterStatus: Started
15:13:46.381 [8323148] <4> SetApplicationClusterStatus: application cluster <cluster> is active on <node1>
15:13:46.395 [8323148] <2> SetApplicationClusterStatus: Done
15:18:46.499 [8323148] <2> mm_getnodename: (0) hostname node1 (from cached_hostname)
15:18:46.827 [8323148] <2> retrieveLocalPatchVersion: Reading from /usr/openv/netbackup/version
15:18:46.827 [8323148] <2> parsePatchVersionString: parsing = >7.5.0.3
<
15:18:46.827 [8323148] <2> parsePatchVersionString: theRest = ><
15:18:46.827 [8323148] <4> UpdateClusterActiveNode: node1 is not part of a cluster
15:18:46.827 [8323148] <2> SetApplicationClusterStatus: Started
15:18:46.974 [8323148] <4> SetApplicationClusterStatus: application cluster <cluster> is NOT active on <node1>
15:18:46.983 [8323148] <2> SetApplicationClusterStatus: Done
15:23:47.180 [8323148] <2> mm_getnodename: (0) hostname node1 (from cached_hostname)
15:23:47.208 [8323148] <2> retrieveLocalPatchVersion: Reading from /usr/openv/netbackup/version
15:23:47.212 [8323148] <2> parsePatchVersionString: parsing = >7.5.0.3
<snip>
node2 (now active) noticed the switch at 15:45:
15:40:45.941 [8716342] <2> parsePatchVersionString: theRest = ><
15:40:45.941 [8716342] <4> UpdateClusterActiveNode: node2 is not part of a cluster
15:40:45.941 [8716342] <2> SetApplicationClusterStatus: Started
15:40:45.963 [8716342] <4> SetApplicationClusterStatus: application cluster <cluster> is NOT active on <node2>
15:40:45.974 [8716342] <2> SetApplicationClusterStatus: Done
15:45:46.068 [8716342] <2> mm_getnodename: (0) hostname node2 (from cached_hostname)
15:45:46.068 [8716342] <2> retrieveLocalPatchVersion: Reading from /usr/openv/netbackup/version
15:45:46.068 [8716342] <2> parsePatchVersionString: parsing = >7.5.0.3
<
15:45:46.068 [8716342] <2> parsePatchVersionString: theRest = ><
15:45:46.068 [8716342] <4> UpdateClusterActiveNode: node2 is not part of a cluster
15:45:46.068 [8716342] <2> SetApplicationClusterStatus: Started
15:45:46.149 [8716342] <4> SetApplicationClusterStatus: application cluster <cluster> is active on <node2>
15:45:46.202 [8716342] <2> SetApplicationClusterStatus: Done
15:50:46.292 [8716342] <2> mm_getnodename: (0) hostname node2 (from cached_hostname)
15:50:46.292 [8716342] <2> retrieveLocalPatchVersion: Reading from /usr/openv/netbackup/version
15:50:46.292 [8716342] <2> parsePatchVersionString: parsing = >7.5.0.3
So it took NetBackup 40 minutes to recognise the switch. That seems a little too long.
Or is that normal behavior? What criteria does NetBackup use to determine whether the cluster is active on a node?
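For anyone who wants to check the timing against their own daemon logs, here is a rough Python sketch that picks out the points where a node's status changes. It is only an illustration: the regular expression is inferred from the excerpts above, the names find_transitions and delay_minutes are made up for this example, and the seconds of the 15:05 switch time are assumed to be :00.

```python
import re
from datetime import datetime

# Pattern inferred from the log excerpts in this post (an assumption, not an official format).
STATUS_RE = re.compile(
    r"^(\d{2}:\d{2}:\d{2})\.\d{3} \[\d+\] <\d+> SetApplicationClusterStatus: "
    r"application cluster <[^>]*> is (NOT )?active on <([^>]+)>"
)


def find_transitions(lines):
    """Return (time, node, active) for each line where a node's status
    differs from the previous status seen for that same node."""
    last_seen = {}
    transitions = []
    for line in lines:
        match = STATUS_RE.match(line.strip())
        if not match:
            continue
        stamp, negated, node = match.groups()
        active = negated is None
        if node in last_seen and last_seen[node] != active:
            transitions.append((stamp, node, active))
        last_seen[node] = active
    return transitions


def delay_minutes(switch_time, noticed_time):
    """Minutes between two HH:MM:SS times on the same day."""
    fmt = "%H:%M:%S"
    delta = datetime.strptime(noticed_time, fmt) - datetime.strptime(switch_time, fmt)
    return delta.total_seconds() / 60


# Status lines copied from the excerpts above.
sample = [
    "15:13:46.381 [8323148] <4> SetApplicationClusterStatus: application cluster <cluster> is active on <node1>",
    "15:18:46.974 [8323148] <4> SetApplicationClusterStatus: application cluster <cluster> is NOT active on <node1>",
    "15:40:45.963 [8716342] <4> SetApplicationClusterStatus: application cluster <cluster> is NOT active on <node2>",
    "15:45:46.149 [8716342] <4> SetApplicationClusterStatus: application cluster <cluster> is active on <node2>",
]

SWITCH_TIME = "15:05:00"  # switch time from this post; the seconds are assumed

for stamp, node, active in find_transitions(sample):
    state = "active" if active else "NOT active"
    minutes = delay_minutes(SWITCH_TIME, stamp)
    print(f"{node} became {state} at {stamp}, {minutes:.1f} min after the switch")
```

Judging by the timestamps in the excerpts (15:13:46, 15:18:46, 15:23:47), the status check itself seems to run roughly every five minutes, so the long delay on node2 does not look like a simple polling gap.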
In addition, I get an error message in these logs right after the switch:
<8> file_to_cache_item: [vnet_addrinfo.c:6555] fopen() failed ERRNO=112 FILE=/usr/openv/var/host_cache/073/73053c73+0,1,41,0,1,0+10.1.30.190.txt
<8> file_to_cache_item: [vnet_addrinfo.c:6555] fopen() failed ERRNO=112 FILE=/usr/openv/var/host_cache/0a8/747370a8+0,1,41,0,1,0+10.10.30.190.txt
Is this something to worry about? The IP address 10.1.30.190 is the cluster IP, which switched over with the cluster application.
Maybe there is a way to include the NetBackup UpdateClusterActiveNode commands in the scripts that run during the switch, to speed things up?
Thanks in advance
Volker