07-21-2011 11:15 PM
Hi,
I have one simple question. My VCS setup has no disk array. I changed the two heartbeat IP addresses to new ones; VCS works properly, but I can't check the disk status. It shows:
#vradmin -g ligdg repstatus ligrvg
VxVM VVR vradmin ERROR V-5-52-82 Cannot communicate with vradmind server
When I changed the two heartbeat IP addresses back, it worked again. I think there must be a relation between the heartbeat IPs and the vradmin program.
My question is: how can I change the two heartbeat IPs and still have vradmin work?
Thanks.
07-22-2011 01:12 AM
Are you changing a VCS heartbeat IP and if so are you using LLT over UDP? If you are not using LLT over UDP, then you don't need IPs on the VCS heartbeats.
My guess is that this is a routing issue so that when you change heartbeat IP, the replication network no longer works - to test this, change heartbeats again and then test you can ping VVR secondary replication IP from the VVR primary and vice versa.
Mike
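The reachability test described above can be scripted roughly as follows. This is only a sketch: `check_reach` and `PING_CMD` are hypothetical names, `ping -c 2` assumes a Linux-style ping, and the addresses in the usage note are the replication IPs that appear in the vxprint output later in this thread.

```shell
#!/bin/sh
# Probe command is overridable so the loop can be exercised without a network.
# Assumption: a Linux-style ping that accepts -c <count>.
PING_CMD="${PING_CMD:-ping -c 2}"

# check_reach IP...  : report whether each replication IP answers the probe.
# Returns non-zero if any address is unreachable.
check_reach() {
    status=0
    for ip in "$@"; do
        if $PING_CMD "$ip" >/dev/null 2>&1; then
            echo "$ip reachable"
        else
            echo "$ip UNREACHABLE"
            status=1
        fi
    done
    return "$status"
}
```

Run it on each node against the other side's replication IP after changing the heartbeats, for example `check_reach 192.168.12.1` from zmss2ligB and `check_reach 192.168.12.2` from zmss2ligA.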
07-22-2011 01:32 AM
Mike is right if you're referring to LLT heartbeats; from what you've said it sounds like you might have two clusters and you're trying to change the heartbeat/Cluster IP addresses used to communicate between the two clusters?
If you need to check the addresses used by VVR to verify if they need to be changed, use the following on each side to display the rlink information:
# vxprint -PVl ### uppercase P, uppercase V, lowercase L
see the following technote for example output: http://www.symantec.com/business/support/index?page=content&id=TECH52571
If the VVR IPs do need to be changed, refer to "VVR Administrator's Guide -> Administering Replication -> Changing the IP addresses used for replication" for the procedure/steps.
Link to VVR 5.1 (Solaris) version of the document here: https://sort.symantec.com/public/documents/sfha/5.1/solaris/productguides/html/vvr_admin/ch06s03s04.htm [Click Next at bottom right to go through to the prerequisites/steps/examples]
or if you're using a different version/platform, see https://sort.symantec.com/documents and select the Product Guide for the relevant platform/version
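To compare the addresses on both sides quickly, the long listing can be filtered down to the host and IP fields. The following is a minimal sketch, not part of the product: `rlink_ips` is a hypothetical helper that only reads `vxprint -PVl` output on stdin, and it assumes the `Rlink:`, `remote_host=`, `local_host=` and `IP_addr=` fields look the way they do in the listings posted in this thread.

```shell
#!/bin/sh
# rlink_ips: print "<rlink> <side> <host> <ip>" for every host/IP pair
# found in `vxprint -PVl` output supplied on stdin.
rlink_ips() {
    awk '
        $1 == "Rlink:" { rlink = $2 }          # remember the current rlink name
        {
            for (i = 1; i <= NF; i++) {
                if ($i ~ /^(remote_host|local_host)=/) {
                    split($i, kv, "=")
                    side = kv[1]; host = kv[2]
                }
                if ($i ~ /^IP_addr=/) {
                    split($i, kv, "=")
                    print rlink, side, host, kv[2]
                }
            }
        }
    '
}
```

Typical use would be `vxprint -PVl | rlink_ips` on each node; the remote address on one side should match the local address on the other.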
07-22-2011 02:07 AM
Thanks very much!
One more question: how do I recover hostA? When I check from hostB, it tells me hostA needs recovery, but when I run the recovery from hostA, it still fails. See the following:
[root@zmss2ligB dsk]# vradmin -g ligdg repstatus ligrvg
Replicated Data Set: ligrvg
Primary:
Host name: zmss2ligB
RVG name: ligrvg
DG name: ligdg
RVG state: enabled for I/O
Data volumes: 2
SRL name: srl
SRL size: 25.00 G
Total secondaries: 1
Secondary:
Host name: zmss2ligA
RVG name: ligrvg
DG name: ligdg
Data status: N/A (needs recovery)
Replication status: not replicating (secondary needs recovery)
Current mode: asynchronous
Logging to: DCM (contains 112448 Kbytes) (failback logging)
Timestamp Information: N/A
[root@zmss2ligA ~]# vxprint -v
Disk group: ligdg
TY NAME ASSOC KSTATE LENGTH PLOFFS STATE TUTIL0 PUTIL0
v ligvol ligrvg ENABLED 125829120 - ACTIVE - -
v oradatavol ligrvg ENABLED 41943040 - ACTIVE - -
v srl ligrvg ENABLED 52428800 SRL ACTIVE - -
[root@zmss2ligA ~]# vxrecover -g ligdg -sb
VxVM VVR vxrvg ERROR V-5-1-5268 RVG ligrvg cannot be recovered because SRL is not accessible. Try recovering the RVG after the SRL becomes available using vxrecover -s command
VxVM VVR vxrlink ERROR V-5-1-3370 Can not recover rlk_zmss2ligB_ligrvg until ligrvg is recovered
07-22-2011 02:11 AM
You need to use
vradmin -g ligdg fbsync ligrvg
I think you need to run from primary, but if it doesn't work, run from secondary
Mike
07-22-2011 02:39 AM
Neither hostA nor hostB can sync; see the following. I think there may be another, stronger sync command that would work. Can you give some advice?
[root@zmss2ligA ~]# vradmin -g ligdg fbsync ligrvg
Message from Primary:
VxVM VVR vradmin ERROR V-5-52-491 Cannot perform the operation: zmss2ligA is not a Primary (acting secondary).
[root@zmss2ligB dsk]# vradmin -g ligdg fbsync ligrvg
Message from Primary:
VxVM VVR vradmin ERROR V-5-52-491 Cannot perform the operation: zmss2ligA is not a Primary (acting secondary).
07-22-2011 02:52 AM
If fbsync is not working then you can try resync (again I think from Primary, but you can try secondary as well) - so command is:
vradmin -g ligdg resync ligrvg
Note that resync does NOT resync from scratch; both fbsync and resync play back the writes recorded in the DCM.
If this doesn't work, please provide output of
vxprint -VPl
from both nodes
Mike
07-22-2011 03:02 AM
Running resync from hostB works, but hostA still needs recovery. Please see the following:
[root@zmss2ligB dsk]# vxrlink -g ligdg -r ligrvg -i 1 status rlk_zmss2ligA_ligrvg
Fri Jul 22 17:55:04 CST 2011
VxVM VVR vxrlink INFO V-5-1-4348 DCM is in use on rlink rlk_zmss2ligA_ligrvg. DCM contains 112448 Kbytes.
VxVM VVR vxrlink INFO V-5-1-4348 DCM is in use on rlink rlk_zmss2ligA_ligrvg. DCM contains 112448 Kbytes.
VxVM VVR vxrlink INFO V-5-1-4348 DCM is in use on rlink rlk_zmss2ligA_ligrvg. DCM contains 112448 Kbytes.
[root@zmss2ligB dsk]# vxprint -VPl
Disk group: ligdg
Rlink: rlk_zmss2ligA_ligrvg
info: timeout=500 packet_size=1400 rid=0.1069
latency_high_mark=10000 latency_low_mark=9950
bandwidth_limit=none
state: state=ACTIVE
synchronous=off latencyprot=off srlprot=autodcm
assoc: rvg=ligrvg
remote_host=zmss2ligA IP_addr=192.168.12.1 port=4145
remote_dg=ligdg
remote_dg_dgid=1309289382.6.zmss2ligA
remote_rvg_version=21
remote_rlink=rlk_zmss2ligB_ligrvg
remote_rlink_rid=0.1073
local_host=zmss2ligB IP_addr=192.168.12.2 port=4145
protocol: UDP/IP
flags: write enabled attached consistent disconnected asynchronous dcm_logging resync_started
Rvg: ligrvg
info: rid=0.1066 version=0 rvg_version=21 last_tag=2
state: state=ACTIVE kernel=ENABLED
assoc: datavols=oradatavol,ligvol
srl=srl
rlinks=rlk_zmss2ligA_ligrvg
att: rlinks=rlk_zmss2ligA_ligrvg
flags: closed primary enabled attached dcm_logging resync_started
device: minor=65531 bdev=199/65531 cdev=199/65531 path=/dev/vx/dsk/ligdg/ligrvg
perms: user=root group=root mode=0600
[root@zmss2ligA ~]# vxprint -VPl
Disk group: ligdg
Rlink: rlk_zmss2ligB_ligrvg
info: timeout=500 packet_size=1400 rid=0.1073
latency_high_mark=10000 latency_low_mark=9950
bandwidth_limit=none
state: state=ACTIVE
synchronous=off latencyprot=off srlprot=autodcm
assoc: rvg=ligrvg
remote_host=zmss2ligB IP_addr=192.168.12.2 port=4145
remote_dg=ligdg
remote_dg_dgid=1309282492.6.zmss2ligB
remote_rvg_version=unknown
remote_rlink=rlk_zmss2ligA_ligrvg
remote_rlink_rid=0.1069
local_host=zmss2ligA IP_addr=192.168.12.1 port=4145
protocol: UDP/IP
flags: write disabled attached consistent disconnected needs_recovery
Rvg: ligrvg
info: rid=0.1066 version=0 rvg_version=21 last_tag=2
state: state=CLEAN kernel=RECOVER
assoc: datavols=oradatavol,ligvol
srl=srl
rlinks=rlk_zmss2ligB_ligrvg
att: rlinks=rlk_zmss2ligB_ligrvg
flags: closed secondary enabled detached needs_recovery log_access_err
device: minor=65531 bdev=199/65531 cdev=199/65531 path=/dev/vx/dsk/ligdg/ligrvg
perms: user=root group=root mode=0600
[root@zmss2ligA ~]# vradmin -g ligdg repstatus ligrvg
Replicated Data Set: ligrvg
Primary:
Host name: zmss2ligB
RVG name: ligrvg
DG name: ligdg
RVG state: enabled for I/O
Data volumes: 2
SRL name: srl
SRL size: 25.00 G
Total secondaries: 1
Secondary:
Host name: zmss2ligA
RVG name: ligrvg
DG name: ligdg
Data status: N/A (needs recovery)
Replication status: not replicating (secondary needs recovery)
Current mode: asynchronous
Logging to: DCM (contains 112448 Kbytes) (failback logging)
Timestamp Information: N/A
07-22-2011 03:11 AM
From the VVR 5.1 Administrator's Guide - vradmin error messages p367:
V-5-52-491: Cannot perform the operation: host is not a Primary (acting secondary).
The vradmin fbsync command requires that the specified host be an acting Secondary.
Action: Check the vradmin printrvg output to see whether the host host is an acting Secondary.
ie: please also include the output of
vradmin -g ligdg -l printrvg ligrvg
07-22-2011 03:11 AM
Can you also send the output of "vxprint -VP" (i.e. without -l, the long listing) from both nodes?
Mike
07-22-2011 03:15 AM
[root@zmss2ligB dsk]# vxprint -VP
Disk group: ligdg
TY NAME ASSOC KSTATE LENGTH PLOFFS STATE TUTIL0 PUTIL0
rl rlk_zmss2ligA_ligrvg ligrvg ENABLED - - ACTIVE - -
rv ligrvg - ENABLED - - ACTIVE - -
[root@zmss2ligA ~]# vxprint -VP
Disk group: ligdg
TY NAME ASSOC KSTATE LENGTH PLOFFS STATE TUTIL0 PUTIL0
rl rlk_zmss2ligB_ligrvg ligrvg RECOVER - - ACTIVE - -
rv ligrvg - RECOVER - - CLEAN - -
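Listings like the two above can be scanned for objects that are not in the ENABLED kernel state. This is a minimal sketch under stated assumptions: `not_enabled` is a hypothetical helper, it only reads `vxprint -VP` output on stdin, and it relies on the column order shown in the header (TY NAME ASSOC KSTATE LENGTH PLOFFS STATE).

```shell
#!/bin/sh
# not_enabled: from `vxprint -VP` output on stdin, print
# "<type> <name> <kstate> <state>" for every rlink (rl) or RVG (rv)
# whose KSTATE column is anything other than ENABLED.
not_enabled() {
    awk '($1 == "rl" || $1 == "rv") && $4 != "ENABLED" { print $1, $2, $4, $7 }'
}
```

Used as `vxprint -VP | not_enabled`, it prints nothing for a node like zmss2ligB above and lists the rlink and RVG in RECOVER for a node like zmss2ligA.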
07-22-2011 03:19 AM
[root@zmss2ligB dsk]# vradmin -g ligdg -l printrvg ligrvg
Replicated Data Set: ligrvg
Primary:
HostName: zmss2ligB <localhost>
RvgName: ligrvg
DgName: ligdg
datavol_cnt: 2
srl: srl
RLinks:
name=rlk_zmss2ligA_ligrvg, detached=off, synchronous=off
Secondary:
HostName: zmss2ligA
RvgName: ligrvg
DgName: ligdg
datavol_cnt: 2
srl: srl
RLinks:
name=rlk_zmss2ligB_ligrvg, detached=off, synchronous=off
[root@zmss2ligA ~]# vradmin -g ligdg -l printrvg ligrvg
Replicated Data Set: ligrvg
Primary:
HostName: zmss2ligB
RvgName: ligrvg
DgName: ligdg
datavol_cnt: 2
srl: srl
RLinks:
name=rlk_zmss2ligA_ligrvg, detached=off, synchronous=off
Secondary:
HostName: zmss2ligA <localhost>
RvgName: ligrvg
DgName: ligdg
datavol_cnt: 2
srl: srl
RLinks:
name=rlk_zmss2ligB_ligrvg, detached=off, synchronous=off
07-22-2011 03:21 AM
Ok - try following:
From secondary (zmss2ligA) run:
vxrvg-g ligdg recover ligrvg
Then send output of "vxprint -VP" from both nodes again.
Mike
07-22-2011 03:24 AM
ligrvg on zmss2ligA is offline. I tried to bring it online, but it became faulted.
[root@zmss2ligA ~]# vxrvg-g ligdg recover ligrvg
bash: vxrvg-g: command not found
[root@zmss2ligA ~]# vxrlink -g ligdg recover rlk_zmss2ligB_ligrvg
VxVM VVR vxrlink ERROR V-5-1-3370 Can not recover rlk_zmss2ligB_ligrvg until ligrvg is recovered
[root@zmss2ligA ~]#
07-22-2011 03:29 AM
Sorry, there should have been a space between vxrvg and "-g", so try:
vxrvg -g ligdg recover ligrvg
vxrlink -g ligdg recover rlk_zmss2ligB_ligrvg
07-22-2011 03:32 AM
[root@zmss2ligA ~]# vxrvg -g ligdg recover ligrvg
VxVM VVR vxrvg ERROR V-5-1-5268 RVG ligrvg cannot be recovered because SRL is not accessible. Try recovering the RVG after the SRL becomes available using vxrecover -s command
07-22-2011 03:38 AM
Looks like the SRL is not started. Run "vxprint -v" to confirm, and if srl is not ENABLED then run:
vxrecover -sg ligrvg
Which will start all volumes in the diskgroup and then run "vxprint -v" again to check all volumes are ENABLED - then try vxrvg and vxrlink recover again.
Mike
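The "is the SRL volume ENABLED?" check can be pulled out of the `vxprint -v` listing with a small filter. This is a sketch only: `srl_state` is a hypothetical helper that reads `vxprint -v` output on stdin, and it assumes the SRL volume is the row marked `SRL` in the PLOFFS column, as in the listings posted in this thread.

```shell
#!/bin/sh
# srl_state: from `vxprint -v` output on stdin, print
# "<volume> <kstate> <state>" for the volume flagged as the SRL.
srl_state() {
    awk '$1 == "v" && $6 == "SRL" { print $2, $4, $7 }'
}
```

For example, `vxprint -g ligdg -v | srl_state` would be expected to show the srl volume with its kernel state and state, which can then be checked before retrying the vxrvg and vxrlink recover steps.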
07-22-2011 03:43 AM
[root@zmss2ligA ~]# vxprint -v
Disk group: ligdg
TY NAME ASSOC KSTATE LENGTH PLOFFS STATE TUTIL0 PUTIL0
v ligvol ligrvg ENABLED 125829120 - ACTIVE - -
v oradatavol ligrvg ENABLED 41943040 - ACTIVE - -
v srl ligrvg ENABLED 52428800 SRL ACTIVE - -
[root@zmss2ligA ~]# vxrecover -sg ligrvg
VxVM vxrecover ERROR V-5-1-607 Diskgroup ligrvg not found
[root@zmss2ligA ~]# vxdisk list
DEVICE TYPE DISK GROUP STATUS
sda auto:none - - online invalid
sdb auto:sliced ligdg01 ligdg online
[root@zmss2ligA ~]# vxrecover -sg ligdg
VxVM VVR vxrvg ERROR V-5-1-5268 RVG ligrvg cannot be recovered because SRL is not accessible. Try recovering the RVG after the SRL becomes available using vxrecover -s command
VxVM VVR vxrlink ERROR V-5-1-3370 Can not recover rlk_zmss2ligB_ligrvg until ligrvg is recovered
07-22-2011 03:50 AM
I don't understand why the SRL is not accessible, as it is in an ENABLED ACTIVE state. You could possibly try:
vxvol -g ligdg stop srl
vxvol -g ligdg start srl
vxrvg -g ligdg recover ligrvg
vxrlink -g ligdg recover rlk_zmss2ligB_ligrvg
07-22-2011 04:26 AM
It's the same result. I can delete the rlink and do a whole-disk recovery, since hostB is OK. Can you give some advice or steps? Thanks a lot.
[root@zmss2ligA ~]# vxrvg -g ligdg recover ligrvg
VxVM VVR vxrvg ERROR V-5-1-5268 RVG ligrvg cannot be recovered because SRL is not accessible. Try recovering the RVG after the SRL becomes available using vxrecover -s command
[root@zmss2ligA ~]# vxprint -v
Disk group: ligdg
TY NAME ASSOC KSTATE LENGTH PLOFFS STATE TUTIL0 PUTIL0
v ligvol ligrvg ENABLED 125829120 - ACTIVE - -
v oradatavol ligrvg ENABLED 41943040 - ACTIVE - -
v srl ligrvg ENABLED 52428800 SRL ACTIVE - -