Cbhatt
Level 3
8 years ago

VVR stuck in Activating with blocks pending - need guidance please on how to resolve

Hi,
I have inherited a VVR solution running in a very old environment. Many of its primaries have a replication status that is stuck in "Activating" with many blocks pending. Unfortunately, no one on the current IT team knows VVR: the people who implemented it have left, and no maintenance has been done since, which is why we now have these problems.

Some information about the configuration is below. Because the project is coming to an end in the near term, upgrading to MP2 is not an option.

Primaries: Windows 2003 Sp2 x86, VSFW 4.3 MP1 - ROBO locations with 2Mb links

Secondaries: Windows 2003 Sp2 x86 VSFW 4.3 MP1 - Located in a Datacentre - is secondary for 32 primaries.

I have been tasked with getting every Primary that reports its replication status as "Activating" back to "Active", and with getting the blocks pending below 1000 across the board. To that end, I am looking for guidance on where to start and what to do, so that I can pass this information on to the rest of the team.

From what I understand, there are two types of buffer, SRL and DCM. This particular primary has a DCM of 33% with 28,000,000 blocks pending.
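To get a feel for what 28,000,000 pending blocks means on a 2Mb link, here is a rough back-of-the-envelope sketch. It rests on assumptions that are not stated in this thread: that a "block" is 512 bytes, that "2Mb" means 2 megabits per second, and that the whole link is available to replication. The function name and parameters are made up for illustration.

```python
def drain_time_hours(blocks_pending, block_size_bytes, link_bits_per_second,
                     efficiency=1.0):
    """Estimate the hours needed to send the pending blocks over the link.

    efficiency is the fraction of the link actually usable for replication
    (1.0 is the best case; real links will be lower).
    """
    total_bits = blocks_pending * block_size_bytes * 8
    seconds = total_bits / (link_bits_per_second * efficiency)
    return seconds / 3600


# ASSUMPTIONS: 512-byte blocks, 2,000,000 bit/s link, no protocol overhead.
hours = drain_time_hours(28_000_000, 512, 2_000_000)
print(f"{hours:.1f} hours")  # prints "15.9 hours"
```

Under those assumptions the backlog is roughly 14.3 GB and would take about 16 hours at best to drain, before counting new writes or the other primaries sharing the secondary's link. A different block size changes the result proportionally.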

I appreciate that more information will be needed before anyone can advise. If someone could tell me what details are required to understand the problem, and help me understand how to resolve it, I would be grateful.

Many thanks,

Chris

  • You need to sync DCM:

    First get name of RVG and Diskgroup using:

    vxprint -V

    Then try:

    vradmin -g <diskgroup> fbsync <rvg_name>

    If this doesn't work on one node, try it on the other. If it doesn't work on either node, then try:

    vradmin -g <diskgroup> resync <rvg_name>

    Mike

    • Cbhatt
      Level 3

      Thanks Mike and apologies for the delay in responding.

      Just to be certain (since no one here really knows much about VVR), could I confirm which command to use? I don't think we need to fail back; we just need to get the secondary synced up to date with the primary.

      As background:

      VXPrint -VPl output from Primary:


      Diskgroup = BasicGroup
      Diskgroup = EXC001_DG
      Rvg : EXC001_DG_RVG
      state : state=ACTIVE kernel=ENABLED
      assoc : datavols=E:
      srl=\Device\HarddiskDmVolumes\EXC001_DG\EXC001_SRL
      rlinks=rlk_VVR459_8443
      att : rlinks=rlk_VVR459_8443
      checkpoint :
      flags : primary enabled attached read write autosync resync_paused

      Rlink : rlk_VVR459_8443
      info : timeout=500 packet_size=1400
      latency_high_mark=10000 latency_low_mark=9950
      bandwidth_limit=none
      state : state=ACTIVE
      synchronous=off latencyprot=off srlprot=autodcm
      assoc : rvg=EXC001_DG_RVG
      remote_host=VVR459
      remote_dg=EXC001_DG
      remote_rlink=rlk_EXC001_18182
      local_host=EXC001
      protocol : UDP/IP
      flags : write attached consistent disconnected autosync resync_paused

      VXPrint -VPl output from Secondary:

      Diskgroup = EXC001_DG
      Rvg : EXC001_DG_RVG
      state : state=ACTIVE kernel=ENABLED
      assoc : datavols=\Device\HarddiskDmVolumes\EXC001_DG\Data
      srl=\Device\HarddiskDmVolumes\EXC001_DG\EXC001_SRL
      rlinks=rlk_EXC001_18182
      att : rlinks=rlk_EXC001_18182
      checkpoint :
      flags : secondary enabled attached read write

      Rlink : rlk_EXC001_18182
      info : timeout=500 packet_size=1400
      latency_high_mark=10000 latency_low_mark=9950
      bandwidth_limit=none
      state : state=ACTIVE
      synchronous=off latencyprot=off srlprot=off
      assoc : rvg=EXC001_DG_RVG
      remote_host=EXC001
      remote_dg=EXC001_DG
      remote_rlink=rlk_VVR459_8443
      local_host=VVR459
      protocol : UDP/IP
      flags : write attached consistent disconnected
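With 32 primaries to check, reading this output by eye gets tedious. The sketch below is one possible way to pull the `flags` line for each Rvg and Rlink out of text in the layout shown above and report the ones carrying `resync_paused` or `disconnected`. It only parses text that has already been captured; the function names and the choice of flags to watch are my own, and the sample is trimmed from the Primary output above.

```python
import re

# Object headers look like "Rvg : NAME" or "Rlink : NAME" in the output above.
OBJECT_RE = re.compile(r"^(Rvg|Rlink)\s*:\s*(\S+)", re.IGNORECASE)
FLAGS_RE = re.compile(r"^flags\s*:\s*(.*)$", re.IGNORECASE)


def parse_flags(vxprint_text):
    """Return {object_name: set_of_flags} for each Rvg/Rlink in the text."""
    flags_by_object = {}
    current = None
    for raw in vxprint_text.splitlines():
        line = raw.strip()
        header = OBJECT_RE.match(line)
        if header:
            current = header.group(2)
            continue
        flags = FLAGS_RE.match(line)
        if flags and current is not None:
            flags_by_object[current] = set(flags.group(1).split())
    return flags_by_object


def problem_flags(flags_by_object, watch=("resync_paused", "disconnected")):
    """Keep only objects that carry at least one watched flag."""
    result = {}
    for name, flags in flags_by_object.items():
        hits = sorted(f for f in flags if f in watch)
        if hits:
            result[name] = hits
    return result


# Trimmed copy of the Primary output quoted in this post.
SAMPLE = """\
Rvg : EXC001_DG_RVG
state : state=ACTIVE kernel=ENABLED
rlinks=rlk_VVR459_8443
flags : primary enabled attached read write autosync resync_paused

Rlink : rlk_VVR459_8443
state : state=ACTIVE
assoc : rvg=EXC001_DG_RVG
flags : write attached consistent disconnected autosync resync_paused
"""

if __name__ == "__main__":
    for name, hits in problem_flags(parse_flags(SAMPLE)).items():
        print(name, hits)
```

On the sample it reports `resync_paused` on the RVG and both `disconnected` and `resync_paused` on the Rlink, which matches what is visible in the Primary output. The output layout on other versions may differ, so treat the regular expressions as a starting point.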

      We have tried dissociating the Primary's SRL (Replicator Log) via VEA on the Secondary, waiting 30 seconds, re-associating the Replicator Log (pointing it at the SRL volume), and then starting replication. However, we don't see any "Link for secondary VVR459 disconnected" message in the console. We only see "Removed log from RVG EXC001_DG_RVG successfully" and "Replication stopped on Secondary VVR459", followed by "Added Volume EXC001_SRL as Replicator Log to RVG EXC001_DG_RVG" and "Secondary VVR459 is ready to receive data". The icon still shows a "pause" symbol for the secondary RVG.

      Stopping and starting replication does not clear the "resync_paused" flag, and the link remains in "Activating".

      I don't want to fail anything over - I just want to get the secondary to catch up with the primary and to get the link to go from "Activating" to "Active" so that blocks pending reduces to 0.

      Would the command you suggested, "vradmin -g <diskgroup> resync <rvg_name>", fix this in our case? And should it be run on the primary, the secondary, both, or either?

      Many thanks,
      Chris