VCS ERROR V-16-2-13067. Resource became OFFLINE unexpectedly

Christian_Mark
Level 3
Partner Accredited Certified

Hi

 

The system is a two-node cluster running Windows Server 2003 R2 SP2 Standard Edition with SFWHA 5.0.

 

For some time now, our Generic-Service service group has been failing and then trying to migrate to the other server. My question is why it fails. The failures come out of the blue and at different times, mostly at night or early in the morning.

 

Here is a snapshot of engine_A.txt:

  

2008/07/31 06:38:16 VCS ERROR V-16-2-13067 (NAVISIONSRV2) Agent is calling clean for resource(SG-Navision-Mount-P) because the resource became OFFLINE unexpectedly, on its own.
2008/07/31 06:38:16 VCS ERROR V-16-2-13067 (NAVISIONSRV2) Agent is calling clean for resource(SG-Navision-Mount-L) because the resource became OFFLINE unexpectedly, on its own.
2008/07/31 06:38:16 VCS ERROR V-16-2-13067 (NAVISIONSRV2) Agent is calling clean for resource(SG-Navision-Mount-O) because the resource became OFFLINE unexpectedly, on its own.
2008/07/31 06:38:16 VCS ERROR V-16-2-13067 (NAVISIONSRV2) Agent is calling clean for resource(SG-Navision-Mount-J) because the resource became OFFLINE unexpectedly, on its own.
2008/07/31 06:38:16 VCS ERROR V-16-2-13067 (NAVISIONSRV2) Agent is calling clean for resource(SG-Navision-Mount-I) because the resource became OFFLINE unexpectedly, on its own.
2008/07/31 06:38:16 VCS ERROR V-16-2-13067 (NAVISIONSRV2) Agent is calling clean for resource(SG-Navision-Mount-M) because the resource became OFFLINE unexpectedly, on its own.
2008/07/31 06:38:16 VCS ERROR V-16-2-13067 (NAVISIONSRV2) Agent is calling clean for resource(SG-Navision-Mount-K) because the resource became OFFLINE unexpectedly, on its own.
2008/07/31 06:39:33 VCS INFO V-16-2-13068 (NAVISIONSRV2) Resource(SG-Navision-Mount-O) - clean completed successfully.
2008/07/31 06:39:33 VCS INFO V-16-1-10307 Resource SG-Navision-Mount-O (Owner: unknown, Group: SG-Navision) is offline on NAVISIONSRV2 (Not initiated by VCS)
2008/07/31 06:39:33 VCS NOTICE V-16-1-10300 Initiating Offline of Resource Service-Navision (Owner: unknown, Group: SG-Navision) on System NAVISIONSRV2
2008/07/31 06:39:33 VCS INFO V-16-6-15004 (NAVISIONSRV2) hatrigger:Failed to send trigger for resfault; script doesn't exist
2008/07/31 06:39:34 VCS INFO V-16-2-13068 (NAVISIONSRV2) Resource(SG-Navision-Mount-P) - clean completed successfully.
2008/07/31 06:39:34 VCS INFO V-16-1-10307 Resource SG-Navision-Mount-P (Owner: unknown, Group: SG-Navision) is offline on NAVISIONSRV2 (Not initiated by VCS)
2008/07/31 06:39:34 VCS INFO V-16-6-15004 (NAVISIONSRV2) hatrigger:Failed to send trigger for resfault; script doesn't exist
2008/07/31 06:39:34 VCS INFO V-16-2-13068 (NAVISIONSRV2) Resource(SG-Navision-Mount-M) - clean completed successfully.
2008/07/31 06:39:34 VCS INFO V-16-1-10307 Resource SG-Navision-Mount-M (Owner: unknown, Group: SG-Navision) is offline on NAVISIONSRV2 (Not initiated by VCS)
2008/07/31 06:39:35 VCS INFO V-16-2-13068 (NAVISIONSRV2) Resource(SG-Navision-Mount-J) - clean completed successfully.
2008/07/31 06:39:35 VCS INFO V-16-6-15004 (NAVISIONSRV2) hatrigger:Failed to send trigger for resfault; script doesn't exist
2008/07/31 06:39:35 VCS INFO V-16-1-10307 Resource SG-Navision-Mount-J (Owner: unknown, Group: SG-Navision) is offline on NAVISIONSRV2 (Not initiated by VCS)
2008/07/31 06:39:35 VCS INFO V-16-6-15004 (NAVISIONSRV2) hatrigger:Failed to send trigger for resfault; script doesn't exist
2008/07/31 06:39:35 VCS INFO V-16-2-13068 (NAVISIONSRV2) Resource(SG-Navision-Mount-L) - clean completed successfully.
2008/07/31 06:39:35 VCS INFO V-16-2-13068 (NAVISIONSRV2) Resource(SG-Navision-Mount-K) - clean completed successfully.
2008/07/31 06:39:35 VCS INFO V-16-2-13068 (NAVISIONSRV2) Resource(SG-Navision-Mount-I) - clean completed successfully.
2008/07/31 06:39:35 VCS INFO V-16-1-10307 Resource SG-Navision-Mount-L (Owner: unknown, Group: SG-Navision) is offline on NAVISIONSRV2 (Not initiated by VCS)
2008/07/31 06:39:35 VCS INFO V-16-1-10307 Resource SG-Navision-Mount-I (Owner: unknown, Group: SG-Navision) is offline on NAVISIONSRV2 (Not initiated by VCS)
2008/07/31 06:39:35 VCS INFO V-16-1-10307 Resource SG-Navision-Mount-K (Owner: unknown, Group: SG-Navision) is offline on NAVISIONSRV2 (Not initiated by VCS)
2008/07/31 06:39:36 VCS INFO V-16-6-15004 (NAVISIONSRV2) hatrigger:Failed to send trigger for resfault; script doesn't exist
2008/07/31 06:39:36 VCS INFO V-16-6-15004 (NAVISIONSRV2) hatrigger:Failed to send trigger for resfault; script doesn't exist
2008/07/31 06:39:36 VCS INFO V-16-6-15004 (NAVISIONSRV2) hatrigger:Failed to send trigger for resfault; script doesn't exist
2008/07/31 06:40:11 VCS ERROR V-16-2-13064 (NAVISIONSRV2) Agent is calling clean for resource(Service-Navision) because the resource is up even after offline completed.
2008/07/31 06:40:11 VCS ERROR V-16-2-13069 (NAVISIONSRV2) Resource(Service-Navision) - clean failed.
2008/07/31 06:41:11 VCS ERROR V-16-2-13077 (NAVISIONSRV2) Agent is unable to offline resource(Service-Navision). Administrative intervention may be required.
2008/07/31 06:41:11 VCS INFO V-16-6-15004 (NAVISIONSRV2) hatrigger:Failed to send trigger for resnotoff; script doesn't exist
2008/07/31 06:47:17 VCS INFO V-16-1-50133 User admin has logged in from 127.0.0.1
2008/07/31 06:48:22 VCS INFO V-16-1-50135 User admin fired command: hares -online SG-Navision-IP  NAVISIONSRV3  from 127.0.0.1
2008/07/31 06:48:51 VCS INFO V-16-1-50135 User admin fired command: hagrp -flush ClusterService  NAVISIONSRV3  from 127.0.0.1
2008/07/31 06:49:11 VCS ERROR V-16-2-13079 (NAVISIONSRV2) Resource(Service-Navision): The last 10 invocations of the clean procedure have failed.
2008/07/31 06:49:12 VCS INFO V-16-1-50135 User admin fired command: hares -online SG-Navision-IP  NAVISIONSRV3  from 127.0.0.1
2008/07/31 06:54:28 VCS INFO V-16-1-50135 User admin fired command: hares -online Service-Navision  NAVISIONSRV3  from 127.0.0.1
2008/07/31 06:56:30 VCS INFO V-16-1-50135 User admin fired command: MSG_RES_PROBE Service-Navision  NAVISIONSRV2  from 127.0.0.1
2008/07/31 06:57:05 VCS INFO V-16-1-50135 User admin fired command: hares -offline SG-Navision-Mount-N  NAVISIONSRV2  from 127.0.0.1
2008/07/31 06:57:51 VCS ERROR V-16-1-11274 System shutting down on NAVISIONSRV2, HAD exiting..
2008/07/31 07:02:47 VCS NOTICE V-16-1-11022 VCS engine (had) started
2008/07/31 07:02:47 VCS INFO V-16-1-10196 Cluster logger started
2008/07/31 07:02:47 VCS NOTICE V-16-1-11050 VCS engine version=5.0
2008/07/31 07:02:47 VCS NOTICE V-16-1-11051 VCS engine join version=5.0.00.0
2008/07/31 07:02:47 VCS NOTICE V-16-1-11052 VCS engine pstamp=Veritas-5.0-12/07/2006-20:35:04
2008/07/31 07:02:47 VCS NOTICE V-16-1-10114 Opening GAB library
2008/07/31 07:02:47 VCS NOTICE V-16-1-10619 'HAD' starting on: NAVISIONSRV2
2008/07/31 07:02:48 VCS INFO V-16-1-10125 GAB timeout set to 15000 ms
2008/07/31 07:02:57 VCS INFO V-16-1-10077 Received new cluster membership
2008/07/31 07:02:57 VCS NOTICE V-16-1-10112 System (NAVISIONSRV2) - Membership: 0x3, DDNA: 0x0
2008/07/31 07:02:57 VCS NOTICE V-16-1-10322 System  (Node '1') changed state from UNKNOWN to INITING
2008/07/31 07:02:57 VCS NOTICE V-16-1-10086 System NAVISIONSRV2 (Node '0') is in Regular Membership - Membership: 0x3
2008/07/31 07:02:57 VCS NOTICE V-16-1-10086 System  (Node '1') is in Regular Membership - Membership: 0x3
2008/07/31 07:02:57 VCS WARNING V-16-1-50129 Operation 'haclus -modify' rejected as the node is in CURRENT_DISCOVER_WAIT state
2008/07/31 07:02:57 VCS WARNING V-16-1-50129 Operation 'haclus -modify' rejected as the node is in CURRENT_DISCOVER_WAIT state
2008/07/31 07:02:57 VCS WARNING V-16-1-50129 Operation 'hasys -modify' rejected as the node is in CURRENT_DISCOVER_WAIT state
2008/07/31 07:02:57 VCS NOTICE V-16-1-10453 Node: 1 changed name from: '' to: 'NAVISIONSRV3'
2008/07/31 07:02:57 VCS NOTICE V-16-1-10322 System NAVISIONSRV3 (Node '1') changed state from INITING to RUNNING
2008/07/31 07:02:57 VCS NOTICE V-16-1-10322 System NAVISIONSRV2 (Node '0') changed state from CURRENT_DISCOVER_WAIT to REMOTE_BUILD
2008/07/31 07:02:57 VCS NOTICE V-16-1-10464 Requesting snapshot from node: 1
2008/07/31 07:02:57 VCS NOTICE V-16-1-10465 Getting snapshot.  snapped_membership: 0x3 current_membership: 0x3 current_jeopardy_membership: 0x0
2008/07/31 07:02:58 VCS NOTICE V-16-1-10181 Group ClusterService AutoRestart set to 1
2008/07/31 07:02:58 VCS NOTICE V-16-1-10181 Group SG-Navision AutoRestart set to 1

8 REPLIES

Captain_Veritas
Level 3
Employee

Christian,

 

Unfortunately, there are times that the Engine Log doesn't tell all.

In the same directory as the engine log, there are a number of other log files.

If I understood your preamble, NAVISIONSRV2 is a Generic Service resource?

There should be a GenericService_A.txt file.

Please post this up to the forum, and we'll see if it can shed any further light on the matter.

 

As an aside, most, if not all, agents will have their <Agent Name>_A.txt in the log directory.

On Unix, they are all named <Agent Name>_A.log.

These are the output logs from the agents themselves, and they sometimes contain revealing, if rather obscure, messages that can help determine what's going on. It can take a little practice and experience to understand what the log file is trying to tell you, but it's at least one more place you can look when things go wonky and you can't figure out why.
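Since the engine log and the agent logs share the same timestamp prefix, one way to read them together is to interleave them in time order. The following is only a minimal sketch in Python, not a Veritas tool: the function names and the labels are made up for illustration, and it assumes every line of interest starts with a `YYYY/MM/DD HH:MM:SS` timestamp as in the excerpts in this thread.

```python
from datetime import datetime

# Timestamp layout assumed from the log excerpts in this thread.
TS_FORMAT = "%Y/%m/%d %H:%M:%S"


def parse_line(line):
    """Return (timestamp, text) for a log line, or None if it has no leading timestamp."""
    try:
        stamp = datetime.strptime(line[:19], TS_FORMAT)
    except ValueError:
        return None
    return stamp, line.rstrip("\n")


def merge_logs(named_logs):
    """Interleave lines from several logs in time order.

    named_logs maps a label (hypothetical, e.g. "engine") to an iterable of lines.
    Lines without a timestamp are skipped. The sort is stable, so lines with the
    same timestamp keep the order in which they were read.
    """
    entries = []
    for label, lines in named_logs.items():
        for line in lines:
            parsed = parse_line(line)
            if parsed is not None:
                entries.append((parsed[0], label, parsed[1]))
    entries.sort(key=lambda entry: entry[0])
    return [f"[{label}] {text}" for _, label, text in entries]
```

To use it, read each file into a list of lines (for example with `open(path).readlines()`) and pass them in a dict keyed by a label of your choice. The merged output puts the agent's own warnings next to the engine's view of the same second.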

 

Christian_Mark
Level 3
Partner Accredited Certified

It looks like the error starts with the MountV resources. Here is the output from the log files for GenericService and MountV:

 

This is from the GenericService_A.txt

 

2008/07/31 06:40:11 VCS WARNING V-16-10051-6025 GenericService:Service-Navision:offline:The service 'NAVISIONSRV' did not stop within the specified timeout. Error = 258.
2008/07/31 06:40:11 VCS ERROR V-16-2-13064 Thread(3848) Agent is calling clean for resource(Service-Navision) because the resource is up even after offline completed.
2008/07/31 06:40:11 VCS WARNING V-16-10051-6023 GenericService:Service-Navision:clean:The service 'NAVISIONSRV' is not in running state. Attempt to stop it might be unsuccessful.
2008/07/31 06:40:11 VCS ERROR V-16-10051-6024 GenericService:Service-Navision:clean:The service 'NAVISIONSRV' did not stop. Error = 1061.
2008/07/31 06:40:11 VCS ERROR V-16-2-13069 Thread(3848) Resource(Service-Navision) - clean failed.
2008/07/31 06:41:11 VCS ERROR V-16-2-13077 Thread(3844) Agent is unable to offline resource(Service-Navision). Administrative intervention may be required.
2008/07/31 06:41:11 VCS WARNING V-16-10051-6023 GenericService:Service-Navision:clean:The service 'NAVISIONSRV' is not in running state. Attempt to stop it might be unsuccessful.
2008/07/31 06:41:11 VCS ERROR V-16-10051-6024 GenericService:Service-Navision:clean:The service 'NAVISIONSRV' did not stop. Error = 1061.
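The `Error = 258` and `Error = 1061` values above look like standard Windows system error codes. That is an assumption, since the agent log itself does not say so. If they are, 258 is `WAIT_TIMEOUT` and 1061 is `ERROR_SERVICE_CANNOT_ACCEPT_CTRL`, which would fit a service that was already stopping when it was asked to stop again. A small Python sketch for annotating such lines follows. The lookup table is deliberately tiny and hand-written, and the function name is hypothetical.

```python
import re

# Hand-written subset of Windows system error codes, covering only the two
# values seen in the GenericService log above. Treating the agent's "Error = N"
# as a Win32 code is an assumption.
WIN32_ERRORS = {
    258: "WAIT_TIMEOUT: the wait operation timed out",
    1061: "ERROR_SERVICE_CANNOT_ACCEPT_CTRL: the service cannot accept control messages at this time",
}

ERROR_RE = re.compile(r"Error = (\d+)")


def explain(line):
    """Return (code, description) for a line containing 'Error = <decimal>', else None."""
    match = ERROR_RE.search(line)
    if match is None:
        return None
    code = int(match.group(1))
    return code, WIN32_ERRORS.get(code, "unknown (not in this small table)")
```

Hexadecimal values such as `Error = FFFFFFFF` are deliberately not matched, because they do not map to a decimal code in this table.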

 

 

From the MountV_A.txt:

 

2008/07/31 06:38:16 VCS WARNING V-16-10051-9008 MountV:SG-Navision-Mount-P:monitor:Failed to get volume properties [19:1]
2008/07/31 06:38:16 VCS WARNING V-16-10051-9008 MountV:SG-Navision-Mount-L:monitor:Failed to get volume properties [19:1]
2008/07/31 06:38:16 VCS WARNING V-16-10051-9008 MountV:SG-Navision-Mount-O:monitor:Failed to get volume properties [19:18]
2008/07/31 06:38:16 VCS WARNING V-16-10051-9008 MountV:SG-Navision-Mount-J:monitor:Failed to get volume properties [19:18]
2008/07/31 06:38:16 VCS WARNING V-16-10051-9008 MountV:SG-Navision-Mount-I:monitor:Failed to get volume properties [19:18]
2008/07/31 06:38:16 VCS WARNING V-16-10051-9008 MountV:SG-Navision-Mount-M:monitor:Failed to get volume properties [19:1]
2008/07/31 06:38:16 VCS WARNING V-16-10051-9008 MountV:SG-Navision-Mount-K:monitor:Failed to get volume properties [19:1]
2008/07/31 06:38:16 VCS ERROR V-16-2-13067 Thread(4000) Agent is calling clean for resource(SG-Navision-Mount-P) because the resource became OFFLINE unexpectedly, on its own.
2008/07/31 06:38:16 VCS ERROR V-16-2-13067 Thread(5580) Agent is calling clean for resource(SG-Navision-Mount-L) because the resource became OFFLINE unexpectedly, on its own.

 

 

Christian_Mark
Level 3
Partner Accredited Certified

 

Further to the MountV_A.txt:

 

2008/07/31 06:38:16 VCS ERROR V-16-2-13067 Thread(4604) Agent is calling clean for resource(SG-Navision-Mount-O) because the resource became OFFLINE unexpectedly, on its own.
2008/07/31 06:38:16 VCS ERROR V-16-2-13067 Thread(5264) Agent is calling clean for resource(SG-Navision-Mount-J) because the resource became OFFLINE unexpectedly, on its own.
2008/07/31 06:38:16 VCS ERROR V-16-2-13067 Thread(3744) Agent is calling clean for resource(SG-Navision-Mount-I) because the resource became OFFLINE unexpectedly, on its own.
2008/07/31 06:38:16 VCS ERROR V-16-2-13067 Thread(6020) Agent is calling clean for resource(SG-Navision-Mount-M) because the resource became OFFLINE unexpectedly, on its own.
2008/07/31 06:38:16 VCS ERROR V-16-2-13067 Thread(4160) Agent is calling clean for resource(SG-Navision-Mount-K) because the resource became OFFLINE unexpectedly, on its own.
2008/07/31 06:38:23 VCS WARNING V-16-10051-9023 MountV:SG-Navision-Mount-P:clean:Failed to lock volume [2:5]
2008/07/31 06:38:23 VCS WARNING V-16-10051-9023 MountV:SG-Navision-Mount-M:clean:Failed to lock volume [2:5]
2008/07/31 06:38:23 VCS WARNING V-16-10051-9023 MountV:SG-Navision-Mount-J:clean:Failed to lock volume [2:5]
2008/07/31 06:38:23 VCS WARNING V-16-10051-9023 MountV:SG-Navision-Mount-L:clean:Failed to lock volume [2:5]
2008/07/31 06:38:23 VCS WARNING V-16-10051-9023 MountV:SG-Navision-Mount-I:clean:Failed to lock volume [2:5]
2008/07/31 06:38:23 VCS WARNING V-16-10051-9023 MountV:SG-Navision-Mount-K:clean:Failed to lock volume [2:5]
2008/07/31 06:38:23 VCS WARNING V-16-10051-9023 MountV:SG-Navision-Mount-O:clean:Failed to lock volume [2:5]
2008/07/31 06:39:33 VCS WARNING V-16-2-13140 Thread(4604) Could not find timer entry with id 82144
2008/07/31 06:39:33 VCS ERROR V-16-2-13068 Thread(4604) Resource(SG-Navision-Mount-O) - clean completed successfully.
2008/07/31 06:39:34 VCS WARNING V-16-2-13140 Thread(4000) Could not find timer entry with id 82142
2008/07/31 06:39:34 VCS ERROR V-16-2-13068 Thread(4000) Resource(SG-Navision-Mount-P) - clean completed successfully.
2008/07/31 06:39:34 VCS WARNING V-16-2-13140 Thread(6020) Could not find timer entry with id 82146
2008/07/31 06:39:34 VCS ERROR V-16-2-13068 Thread(6020) Resource(SG-Navision-Mount-M) - clean completed successfully.
2008/07/31 06:39:34 VCS WARNING V-16-2-13140 Thread(5264) Could not find timer entry with id 82145
2008/07/31 06:39:34 VCS ERROR V-16-2-13068 Thread(5264) Resource(SG-Navision-Mount-J) - clean completed successfully.
2008/07/31 06:39:35 VCS WARNING V-16-2-13140 Thread(5580) Could not find timer entry with id 82143
2008/07/31 06:39:35 VCS ERROR V-16-2-13068 Thread(5580) Resource(SG-Navision-Mount-L) - clean completed successfully.
2008/07/31 06:39:35 VCS WARNING V-16-2-13140 Thread(4160) Could not find timer entry with id 82148
2008/07/31 06:39:35 VCS ERROR V-16-2-13068 Thread(4160) Resource(SG-Navision-Mount-K) - clean completed successfully.
2008/07/31 06:39:35 VCS WARNING V-16-2-13140 Thread(3744) Could not find timer entry with id 82147
2008/07/31 06:39:35 VCS ERROR V-16-2-13068 Thread(3744) Resource(SG-Navision-Mount-I) - clean completed successfully.
2008/07/31 06:57:52 VCS ERROR V-16-2-13120 Thread(4544) Error receiving from the engine. Agent(MountV) is exiting.
2008/07/31 07:03:03 VCS ERROR V-16-10051-9002 MountV:Could not initialize Volume Manager connection. Error = FFFFFFFF
2008/07/31 07:03:03 VCS ERROR V-16-10051-9002 MountV:SG-Navision-Mount-N:monitor:Could not initialize Volume Manager connection. Error = FFFFFFFF
2008/07/31 07:03:03 VCS ERROR V-16-10051-9002 MountV:SG-Navision-Mount-P:monitor:Could not initialize Volume Manager connection. Error = FFFFFFFF
2008/07/31 07:03:03 VCS ERROR V-16-10051-9002 MountV:SG-Navision-Mount-J:monitor:Could not initialize Volume Manager connection. Error = FFFFFFFF

Captain_Veritas
Level 3
Employee

Hrm, you have something wacky going on here. At one point, you have a resource that WON'T go offline, and elsewhere in the log you have the same resource that goes offline unexpectedly.

I also noticed that the MountV resource could not get Volume information at one point.

This is a fairly serious situation, so you should call Tech Support and have them help you run a vxexplorer to determine what's going on with your storage. Was the SAN team working on the fabric when this failure occurred?
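When sorting out which symptom belongs to which resource, it can help to tabulate the message IDs logged against each resource name. In the excerpts posted in this thread, V-16-2-13067 (became OFFLINE unexpectedly) appears against the SG-Navision-Mount-* resources, while V-16-2-13064, V-16-2-13069 and V-16-2-13077 appear against Service-Navision. A rough Python sketch of that tabulation is below. The function name is made up, and the regular expressions are inferred from the log lines quoted here rather than from any documented log format.

```python
import re
from collections import defaultdict

# Patterns inferred from the excerpts in this thread (an assumption, not a spec):
# message IDs look like V-16-2-13067, resource names appear as resource(NAME).
MSG_ID_RE = re.compile(r"\bV-\d+-\d+-\d+\b")
RESOURCE_RE = re.compile(r"[Rr]esource\(([^)]+)\)")


def ids_by_resource(lines):
    """Map each resource name to the sorted list of message IDs logged against it."""
    table = defaultdict(set)
    for line in lines:
        msg = MSG_ID_RE.search(line)
        res = RESOURCE_RE.search(line)
        if msg and res:
            table[res.group(1)].add(msg.group(0))
    return {name: sorted(ids) for name, ids in table.items()}
```

Lines that name the resource in a different way, such as `Resource SG-Navision-Mount-O (Owner: unknown, ...)`, are not picked up by this pattern and would need their own expression.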

Hywel_Mallett
Level 6
Certified

I see you have SFWHA 5.0. Have you applied MP1a at all? When 5.0 came out we had a terrible time with MountV resources going offline unexpectedly (when they weren't).

MP1a cured it (eventually)

Christian_Mark
Level 3
Partner Accredited Certified

Thanks for your response. I did patch the systems to 5.0 RP1a, which seems to have solved the issues with the MountV resources going offline.

 

However, after RP1a was installed, the Java Cluster Manager GUI suddenly showed too many attributes. This was solved by applying http://support.veritas.com/docs/302583.

 

Everything seems to be working fine now.

Hywel_Mallett
Level 6
Certified
I'd forgotten about that small extra fix that was needed. At least that's an easy one that doesn't affect stability. HTH :)

Pranali
Level 3
Hi all,

This morning I got the following error messages in the engine log file for VCS 4.0 MP2:
2009/09/25 10:49:45 VCS ERROR V-16-2-13067 (sbimum66) Agent is calling clean for resource(oracle_sbi) because the resource became OFFLINE unexpectedly, on its own.
2009/09/25 10:49:45 VCS WARNING V-16-20002-23 (sbimum66) Oracle:oracle_sbi:clean:Oracle database bvsbi1 not running
2009/09/25 10:49:46 VCS INFO V-16-2-13068 (sbimum66) Resource(oracle_sbi) - clean completed successfully.
2009/09/25 10:49:46 VCS INFO V-16-1-10307 Resource oracle_sbi (Owner: unknown, Group: sbivcsfs-rg) is offline on sbimum66 (Not initiated by VCS)
2009/09/25 10:49:46 VCS NOTICE V-16-1-10300 Initiating Offline of Resource Netlsnr_bvsbi1 (Owner: unknown, Group: sbivcsfs-rg) on System sbimum66
2009/09/25 10:49:46 VCS INFO V-16-6-15004 (sbimum66) hatrigger:Failed to send trigger for resfault; script doesn't exist
2009/09/25 10:50:01 VCS INFO V-16-20002-40 (sbimum66) Netlsnr:Netlsnr_bvsbi1:offline:lsnrctl returned the following output


I suspect the problem is on the Oracle side, because I got the following errors in the oracle.log.A file:
2009/09/25 10:49:45 VCS WARNING V-16-20002-207 Oracle:oracle_sbi:monitor:Open for ora_pmon failed, setting cookie to null
2009/09/25 10:55:13 VCS NOTICE V-16-20002-210 Oracle:oracle_sbi:monitor:Setting cookie for proc = ora_pmon_bvsbi1, PID = /proc/23227/psinfo
2009/09/25 10:55:13 VCS WARNING V-16-20002-207 Oracle:oracle_sbi:monitor:Open for ora_smon failed, setting cookie to null
2009/09/25 10:55:13 VCS NOTICE V-16-20002-210 Oracle:oracle_sbi:monitor:Setting cookie for proc = ora_smon_bvsbi1, PID = /proc/23261/psinfo
2009/09/25 10:55:13 VCS WARNING V-16-20002-207 Oracle:oracle_sbi:monitor:Open for ora_lgwr failed, setting cookie to null
2009/09/25 10:55:13 VCS NOTICE V-16-20002-210 Oracle:oracle_sbi:monitor:Setting cookie for proc = ora_lgwr_bvsbi1, PID = /proc/23257/psinfo
2009/09/25 10:55:13 VCS WARNING V-16-20002-207 Oracle:oracle_sbi:monitor:Open for ora_dbw0 failed, setting cookie to null
2009/09/25 10:55:13 VCS NOTICE V-16-20002-210 Oracle:oracle_sbi:monitor:Setting cookie for proc = ora_dbw0_bvsbi1, PID = /proc/23251/psinfo
2009/09/25 10:55:13 VCS WARNING V-16-20002-207 Oracle:oracle_sbi:monitor:Open for ora_lmon failed, setting cookie to null


Does anybody have any idea?