
VVR stuck in Activating with blocks pending - need guidance please on how to resolve

Hi,
I have inherited a VVR solution running on a very old environment. Many of its primaries have a replication status that is now stuck in Activating, with many blocks pending. Unfortunately, no one in the current IT team knows anything about VVR since the people who implemented it left, so no maintenance work has been done to resolve these issues, which is why we now have these problems.

Some information about the configuration. Since the project is coming to an end in the near term, there is no possibility of upgrading to MP2.

Primaries: Windows 2003 Sp2 x86, VSFW 4.3 MP1 - ROBO locations with 2Mb links

Secondaries: Windows 2003 Sp2 x86 VSFW 4.3 MP1 - Located in a Datacentre - is secondary for 32 primaries.

I have been tasked with getting all the Primaries that are reporting their Replication status as "Activating" back to Active and getting the blocks pending to below 1000 across the board. To that end, I am looking for some guidance on where to start and what to do, so that I can pass this information on to the rest of the team.

From what I understand, there are two types of buffer, SRL and DCM. This particular primary has a DCM of 33% with 28,000,000 blocks pending.
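As a rough sanity check on a backlog of that size, the time needed to push it over one of the 2Mb links can be estimated. This is only a sketch under assumptions that are not confirmed anywhere in this thread: that a "block" is 512 bytes, that "2Mb" means 2 megabits per second, and that the whole link is available to replication with no protocol overhead. The function name is made up for illustration.

```python
def drain_time_hours(blocks_pending: int, block_bytes: int, link_bits_per_sec: float) -> float:
    """Hours to send the pending blocks if the entire link is available.

    ASSUMPTIONS (not from the thread): block_bytes is the size of one
    pending block, and the link carries no other traffic or overhead.
    """
    total_bits = blocks_pending * block_bytes * 8
    return total_bits / link_bits_per_sec / 3600


# 28,000,000 blocks pending, assumed 512-byte blocks, assumed 2 Mbit/s link
hours = drain_time_hours(28_000_000, 512, 2_000_000)
print(f"{hours:.1f} hours")  # prints "15.9 hours"
```

Under those assumptions the backlog is roughly 14 GB, so even a healthy resync would take most of a day per site, and longer if the link is shared with other traffic.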

I appreciate that more information will be needed before anyone can advise. If someone can tell me what details they need to understand the problem, and then help me understand how to resolve it, I would be grateful.

Many thanks,

Chris

2 Replies

Re: VVR stuck in Activating with blocks pending - need guidance please on how to resolve

You need to sync the DCM.

First get the names of the RVG and diskgroup using:

vxprint -V

Then try:

vradmin -g <diskgroup> fbsync <rvg_name>

If this doesn't work on one node, try it on the other, and if it doesn't work on either node, then try:

vradmin -g <diskgroup> resync <rvg_name>

Mike

Re: VVR stuck in Activating with blocks pending - need guidance please on how to resolve

Thanks Mike and apologies for the delay in responding.

Just to be certain (since no one here really knows much at all about VVR), could I confirm the command to be used? I don't think we need to fail back - we just need to get the secondary synced up to date with the primary.

As background:

VXPrint -VPl output from Primary:


Diskgroup = BasicGroup
Diskgroup = EXC001_DG
Rvg : EXC001_DG_RVG
state : state=ACTIVE kernel=ENABLED
assoc : datavols=E:
srl=\Device\HarddiskDmVolumes\EXC001_DG\EXC001_SRL
rlinks=rlk_VVR459_8443
att : rlinks=rlk_VVR459_8443
checkpoint :
flags : primary enabled attached read write autosync resync_paused

Rlink : rlk_VVR459_8443
info : timeout=500 packet_size=1400
latency_high_mark=10000 latency_low_mark=9950
bandwidth_limit=none
state : state=ACTIVE
synchronous=off latencyprot=off srlprot=autodcm
assoc : rvg=EXC001_DG_RVG
remote_host=VVR459
remote_dg=EXC001_DG
remote_rlink=rlk_EXC001_18182
local_host=EXC001
protocol : UDP/IP
flags : write attached consistent disconnected autosync resync_paused

VXPrint -VPl output from Secondary:

Diskgroup = EXC001_DG
Rvg : EXC001_DG_RVG
state : state=ACTIVE kernel=ENABLED
assoc : datavols=\Device\HarddiskDmVolumes\EXC001_DG\Data
srl=\Device\HarddiskDmVolumes\EXC001_DG\EXC001_SRL
rlinks=rlk_EXC001_18182
att : rlinks=rlk_EXC001_18182
checkpoint :
flags : secondary enabled attached read write

Rlink : rlk_EXC001_18182
info : timeout=500 packet_size=1400
latency_high_mark=10000 latency_low_mark=9950
bandwidth_limit=none
state : state=ACTIVE
synchronous=off latencyprot=off srlprot=off
assoc : rvg=EXC001_DG_RVG
remote_host=EXC001
remote_dg=EXC001_DG
remote_rlink=rlk_VVR459_8443
local_host=VVR459
protocol : UDP/IP
flags : write attached consistent disconnected

We have tried dissociating the Primary's SRL (Replicator Log) via VEA on the Secondary, waiting 30 seconds, re-associating the Replicator Log (pointing it to the SRL volume) and then starting replication. However, we don't see any "Link for secondary VVR459 disconnected" message in the console - just messages saying "Removed log from RVG EXC001_DG_RVG successfully" and "Replication stopped on Secondary VVR459", followed by "Added Volume EXC001_SRL as Replicator Log to RVG EXC001_DG_RVG" and "Secondary VVR459 is ready to receive data". The icon still shows a "pause" symbol for the secondary RVG.

Stopping and starting replication doesn't seem to remove the "resync_paused" flag, and the link remains in "Activating".
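With 32 primaries to check, it may help to pull the flags out of saved vxprint output rather than reading each listing by eye. The following is a minimal sketch that assumes the output has the same "flags : ..." layout as the listings above; the function name is hypothetical and nothing here calls VVR itself.

```python
from typing import List, Set


def parse_flags(vxprint_text: str) -> List[Set[str]]:
    """Return one set of flag names per 'flags :' line, in order of appearance.

    ASSUMPTION: the text looks like the vxprint -VPl listings in this
    thread, i.e. each flags line is 'flags : name name name ...'.
    """
    result = []
    for line in vxprint_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            result.append(set(value.split()))
    return result


# The two flags lines from the Primary listing (RVG first, then Rlink)
sample = """\
flags : primary enabled attached read write autosync resync_paused
flags : write attached consistent disconnected autosync resync_paused
"""

for flags in parse_flags(sample):
    # Show only the flags that are relevant to the stuck resync
    print(sorted(flags & {"resync_paused", "disconnected", "autosync"}))
```

Run against the Primary listing, this reports resync_paused on both the RVG and the Rlink, and disconnected on the Rlink only.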

I don't want to fail anything over - I just want to get the secondary to catch up with the primary and to get the link to go from "Activating" to "Active" so that blocks pending reduces to 0.

Would the command you suggested "vradmin -g <diskgroup> resync <rvg_name>" fix this in this example and would that command need to be run on the primary / secondary / both / either?
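For reference, filling the diskgroup and RVG names from the vxprint output into the two commands suggested earlier in the thread could be sketched as below. This only builds and prints the command strings; it does not run anything, and the helper name is made up for illustration. Whether these commands are the right fix, and on which node they belong, is still the open question.

```python
from typing import List


def vradmin_commands(diskgroup: str, rvg: str) -> List[str]:
    """Build the fbsync and resync command lines quoted earlier in this thread.

    The command strings are formatted only, never executed.
    """
    return [f"vradmin -g {diskgroup} {op} {rvg}" for op in ("fbsync", "resync")]


# Names taken from the vxprint output for this primary
for cmd in vradmin_commands("EXC001_DG", "EXC001_DG_RVG"):
    print(cmd)
```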

Many thanks,
Chris