
DRL not working on mirrored volumes in VVR - RVG (Mirrored volume doing full resync of plexes)

martinfrancis
Level 3

I think I am hitting a major issue here with a mirrored volume in RVG.
The SRL is supposed to provide the DRL functionality; hence DRL logging is explicitly disabled when a volume is added to an RVG. However, my testing shows that DRL is not working: when a mirror plex is out of sync due to a server crash etc., a full resync of the mirror plexes happens (not just of the dirty regions).
Here is a quick and easy way to recreate the issue:

My configuration: InfoScale 8 on Red Hat 8.7
I have a mirrored volume sourcevol2 (2 plexes), which I created as below:
#vxassist -g dg1 make sourcevol2 1g logtype=dco drl=on dcoversion=20 ndcomirror=1 regionsz=256 init=active
#vxassist -b -g dg1 mirror sourcevol2
I wait for the synchronization to complete
#/opt/VRTS/bin/mkfs -t vxfs -o nomaxlink /dev/vx/rdsk/dg1/sourcevol2
# mount /dev/vx/dsk/dg1/sourcevol2 /sourcevol2

I create SRL as below:
#vxassist -g dg1 make dg1_srl 1g layout=concat init=active

I create primary rvg as below:
#vradmin -g dg1 createpri dg1_rvg sourcevol2 dg1_srl
Verified the dcm_in_dco flag is on:
#vxprint -g dg1 -VPl dg1_rvg |grep flag
flags: closed primary enabled attached bulktransfer dcm_in_dco
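As a small aside, the flag check above can be scripted. This is a minimal sketch, not an official tool: the `flags_line` variable below is just the sample `vxprint` output quoted above; on a live system you would pipe `vxprint -g dg1 -VPl dg1_rvg` into the grep instead.

```shell
# Hypothetical helper: confirm the dcm_in_dco flag on an RVG by parsing
# the flags line of `vxprint -VPl` output. Sample text hard-coded here.
flags_line='flags: closed primary enabled attached bulktransfer dcm_in_dco'
if printf '%s\n' "$flags_line" | grep -qw dcm_in_dco; then
  echo "dcm_in_dco is set"
else
  echo "dcm_in_dco is NOT set"
fi
```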

Added secondary 
#vradmin -g dg1 addsec dg1_rvg primarynode1 primarynode2
Started initial replication:
#vradmin -g dg1 -a startrep dg1_rvg primarynode2
Verified replication is up to date:
#vxrlink -g dg1 -T status rlk_primarynode2_dg1_rvg
VxVM VVR vxrlink INFO V-5-1-4467 Rlink rlk_primarynode2_dg1_rvg is up to date

Here is the actual scenario to simulate mirror plexes going out of sync:
On primary:
Run a dd command to put some I/O on sourcevol2:
#dd if=/dev/zero of=/sourcevol2/8krandomreads.0.0 bs=512 count=1000 oflag=direct
In another terminal, force-stop sourcevol2 while the dd is running:
#vxvol -g dg1 -f stop sourcevol2
#umount /sourcevol2
Start sourcevol2:
#vxvol -g dg1 start sourcevol2
#vxtask -g dg1 list -l
Task: 160 RUNNING
Type: RDWRBACK
Operation: VOLSTART Vol sourcevol2 Dg dg1
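The task type is the tell-tale here: RDWRBACK (read-writeback) means the whole plex is being resynced. A throwaway sketch to spot this in a script; the `vxtask_output` variable is just the sample output quoted above, and on a live system you would pipe `vxtask -g dg1 list -l` instead:

```shell
# Hypothetical check: did starting the volume kick off a read-writeback
# (RDWRBACK) recovery task, i.e. a full plex resync rather than a
# DRL-limited one? Sample `vxtask list -l` output hard-coded here.
vxtask_output='Task: 160 RUNNING
Type: RDWRBACK
Operation: VOLSTART Vol sourcevol2 Dg dg1'
if printf '%s\n' "$vxtask_output" | grep -q '^Type: RDWRBACK'; then
  echo "RDWRBACK recovery in progress"
fi
```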

Even though I changed only a few regions on sourcevol2 (sequential writes of 512 b), the volume goes through a full plex resync (as indicated by the time taken to start the volume).
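To put a number on "a few regions": assuming regionsz=256 from the vxassist command above is in 512-byte sectors (so 128 KiB per region), the dd workload should have dirtied only a handful of regions out of the whole 1 GB volume. The exact count on a live system depends on alignment and filesystem metadata writes, so treat this as back-of-the-envelope arithmetic only:

```shell
# Rough estimate of DRL regions dirtied by the dd workload, versus the
# total regions in the volume. Assumes regionsz=256 is in 512-byte sectors.
bytes_written=$((1000 * 512))          # dd: bs=512 count=1000
region_bytes=$((256 * 512))            # regionsz=256 sectors = 128 KiB
vol_bytes=$((1024 * 1024 * 1024))      # 1 GB volume
dirty_regions=$(( (bytes_written + region_bytes - 1) / region_bytes ))
total_regions=$(( vol_bytes / region_bytes ))
echo "dirty regions: $dirty_regions of $total_regions"
# -> dirty regions: 4 of 8192
```

So a working DRL should resync roughly 4 of 8192 regions; a multi-minute volume start instead points at a full plex resync.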

Summary:
DRL on a volume added to an RVG is not working; hence mirrored volumes go through a full plex resync as opposed to a resync of only the dirty regions.

5 REPLIES

martinfrancis
Level 3

any updates or ideas?

frankgfan
Moderator

What was the mirror volume log plex state after the volume sourcevol2 was added to the RVG?

A mirrored volume can use the DRL feature either through a dedicated log plex or through DCO-based DRL logging.
So, when using DCO for DRL, there will not be a separate log plex; the DRL bitmaps are included in the DCO maps.
Please see below the steps I followed to create the mirrored volume, enable DRL, and add it to the RVG.

Create the data volume, specifying drl=on so that DRL is created in the DCO maps:
#vxassist -g dg1 make sourcevol2 1g logtype=dco drl=on dcoversion=20 regionsz=256 init=active alloc=disk1,disk2
#vxassist -g dg1 mirror sourcevol2 disk3
At this point DRL is working: I was able to simulate an unclean stop of the volume, and vxvol start synced up only the dirty regions. So all good until this point. However, once the volume is added to the RVG, the DRL function stops working.
Create the SRL:
#vxassist -g dg1 make dg1_srl 1g layout=concat init=active alloc=disk4

Create the primary RVG:
#vradmin -g dg1 createpri dg1_rvg sourcevol2 dg1_srl
The vradmin command output a message saying DRL will be explicitly turned off. This is well documented in the InfoScale replication admin guide. Quote: "In addition to the replication functionality, the SRL provides the functionality provided by the DRL (Dirty Region Log). Therefore, VxVM DRL logging is explicitly disabled when a volume is added to an RVG."

However, the DRL functionality simply does not work. In the same unclean volume-stop test I did previously, upon startup the volume syncs up the entire mirror plex (not just the dirty regions).

sdighe1
Level 3
Employee

Hi Martin,

This is expected behaviour. You already saw that DRL logging is disabled when a volume is added to an RVG. Since the volume was written to (marked DIRTY) and closed uncleanly, the code assumes that the volume needs a recovery.

Please note that the SRL essentially replaces the DRL recovery functionality, but in your case you have forcefully stopped the data volume while the SRL/RVG is still in the active state. This means no RVG recovery is going to happen here, and hence no SRL-to-data-volume recovery to make the pending data blocks consistent. That is why the code assumed that an RWBK recovery is needed on the volume, and without DRL tracking the code has to perform a full recovery.

Ideally, the only case in which DRL recovery is not needed here is when RVG recovery happens, i.e. all pending data from the SRL to the data volume is flushed and the data volume states can be changed. But in the above testing I don't think you tested exactly that, and hence a full sync seems to be expected behaviour.

I will still check more on the design and implementation side to double-confirm.

Do you think DRL-style recovery (only dirty regions synced up) will happen if I simulate an actual crash scenario of the Linux host?

-- Can you please confirm that, as per design, an unclean server shutdown/crash will NOT do a full resync of plexes?

I am not sure how a crash/unclean shutdown would be different, though. As part of bringing up the OS, the startup scripts do a vxrecover and vxvol start before any RVG recovery/bring-up (unless there is a flag somewhere that says this volume is part of an RVG and normal volume start and recovery should be skipped).

My production volumes will be 2-3 TB mirrored volumes in an RVG, and the last thing I want is a full mirror-plex resync after an unclean shutdown. This is why I am building my test case around this scenario.