Forum Discussion

Ryan_H_
15 years ago

Primary server hangs with replication operations...

Hello,

I have SF 5.1 SP1 with the VVR option installed on Windows 2008 (2 nodes, primary and secondary). Replication had been working fine for the last few days, but since last weekend I have been facing an issue with my primary server.

Whenever I try to pause, stop, or start the replication, my server hangs and I have to do a hard reset by powering off the system. There is no issue with the replication itself, but we are unable to perform replication operations from the primary server.

If anyone has an idea, please share it with me. Thanks.

Regards,
Syed.
  • Have you had a look at Event Viewer Application Log as well as System log yet?
  • Yes, I checked on both servers, but nothing is there.

    Whenever I try to pause the replication, my primary server hangs.

    Regards,
    Ryan

  • Seems this is a 'Known Issue'. Extract from Release Notes:
    Known Issues -> Veritas Volume Replicator
    p.96:
    Pause and Resume commands take a long time to complete (495192)
    At times, the pause and resume operation can take a long time to complete due to which it appears to be hung.

    Workaround: Wait for some time till the operation completes, or manually disconnect and reconnect the network that is used for communication to enable the operation to complete.

  • Hello,

    Thanks for the reply.

    Manually disconnecting means removing the network cable from the server. This is a live server, and doing that would stop users from working.

  • It didn't help. Anyway, this is not a solution at all. How can you disconnect the cable on a live server?
  • It is good practice (not required) to have dedicated NICs for replication.
    The alternative in the suggested workaround is to "Wait for some time till the operation completes".

    Maybe log a call with Symantec Support and tell them that you're not happy with their documented workaround?
  • Hi,

    Once the server hangs, you can't do anything. I even waited for 30 minutes, but nothing could be done.

    BTW, you can still ping the server, but the screen is hung and all desktop icons and open windows have disappeared.

    BR,
    R. H.
  • Please log a support call. Symantec will look at the explorer output and might find something more serious than the 'known issue' that's documented in the Release Notes.
  • Does this known issue also apply when you choose to "pause secondary from primary" when logged onto the secondary site?

     

    Thanks - Mark

  • If you are experiencing this issue you will see events in the system event log.  The command eventually succeeds but is waiting for the rlink to disconnect (hence the "workaround" of disconnecting the network).

    "Some time" (the incident shows 30 mins) after trying the command you will see vxio events in the system event log to say the RLINK was disconnected.

    such as:

    WARNING      Event 99 vxio <server> RLINK <your_rlink_name> disconnected from remote

    and at the same time there will also be a retry event

    such as:

    WARNING Event 134 vxio <server> Disconnecting RLINK <your_rlink_name> as retry count exceeded 200
     

    If you can wait to see whether these messages appear, and they do, then you are hitting this known issue.  If they do not appear, or you get a failure message after 1-2 mins, then you have a different issue.

    I can't see how this issue would hang the server as described above, so I'd suspect that this is not the same issue.

    This issue would apply to any VVR operation (e.g., expand volume, pause, resume), so if you are executing from the secondary site you may hit this issue too.

    James.
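
The check described above can be sketched in Python for anyone who has saved the System event log as text. This is only a rough sketch under some assumptions: the line layout is copied from the two example lines in the reply (a real Event Viewer export may look different), the function names `find_vxio_events` and `looks_like_known_issue` are made up for illustration, and requiring both event 99 and event 134 is one reading of "these messages appear".

```python
import re

# Line shape assumed from the two examples quoted in the reply above;
# a real Event Viewer export may be formatted differently.
EVENT_RE = re.compile(
    r"^\s*WARNING\s+Event\s+(?P<event_id>\d+)\s+vxio\s+"
    r"(?P<server>\S+)\s+(?P<message>.*)$"
)

DISCONNECTED = 99      # "RLINK <name> disconnected from remote"
RETRY_EXCEEDED = 134   # "Disconnecting RLINK <name> as retry count exceeded 200"


def find_vxio_events(lines):
    """Return (event_id, server, message) for each vxio warning line."""
    events = []
    for line in lines:
        match = EVENT_RE.match(line)
        if match:
            events.append(
                (int(match["event_id"]), match["server"], match["message"].strip())
            )
    return events


def looks_like_known_issue(lines):
    """True if both the disconnect (99) and retry (134) events are present.

    Requiring both is an assumption based on the description in the reply.
    """
    seen = {event_id for event_id, _, _ in find_vxio_events(lines)}
    return {DISCONNECTED, RETRY_EXCEEDED} <= seen
```

To use it, pass the lines of the saved log, for example `looks_like_known_issue(open("system_log.txt"))`, where `system_log.txt` is a placeholder file name. A `False` result, or a failure message within 1-2 minutes of running the command, would point towards a different problem.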